Every astronomer knows the critical importance of good data, but few observatories have recognized the importance of applying good data science to their operations.
Credit: ALMA (ESO/NAOJ/NRAO)
Observatories must be at the forefront of data acquisition and storage, because astronomical research depends on large volumes of data and observatories are often in remote, relatively inaccessible locations. Operating an infrastructure that supports data acquisition, data storage, and facilities (e.g., power generation) is paramount.
But without basic analytics and data science architectures to support these endeavors, astronomers, engineers, and technicians lack easy access to, and understanding of, their data, and must instead rely on instinct and acquired expertise. While this works, it leaves little room for improvement or efficiency gains, and it makes collaboration with colleagues challenging.
The Atacama Large Millimeter/submillimeter Array (ALMA) faces the same data challenges as other observatories, but on a larger scale. ALMA is a radio telescope composed of 66 high-precision antennas located on the Chajnantor Plateau, at an altitude of 5,000 meters in northern Chile, and it is recognized as "the most complex astronomical observatory ever built on Earth."
This massive array has a collecting surface of over 6,492 square meters, and because its antennas can be spread across baselines of up to 16 kilometers, it can achieve about 10 times the resolution of the famous Hubble Space Telescope. To do so, ALMA produces up to 5 TB of scientific data every day, roughly equivalent to 2,500 hours of video.
Credit: EHT Collaboration
ALMA observes the oldest and coldest objects in the Universe, and it was a key contributor to the first image of a black hole. The image on the left is the full image, while the center image was made without any of the Chilean (ALMA-APEX) data. The image on the right shows what the result would look like without the second-largest contributor. ALMA's observations were clearly critical to this groundbreaking image, which brought us "the donut."
ALMA's groundbreaking research, such as its statistical surveys of planet formation in disks like HL Tau (see image below), owes part of its success to state-of-the-art equipment, but it also relies on the statistical approach of ALMA scientists. These statistical methods can be applied not only to the science results themselves but also to the process of collecting science data.
Integrating Data Science
Over the last two years, ALMA has worked with Dataiku to create a “modern data science architecture, that will enable us to use advanced analytics and data science, first to improve our operations efficiency, and later over larger, more complex scientific data sets,” said Ignacio Toledo, ALMA Data Science Initiative Lead.
ALMA has shifted from tracking basic Key Performance Indicators (KPIs) to asking bigger questions about the processes surrounding its data initiatives. Open collaboration has enabled scientists and engineers to establish best practices and reduce duplicated work.
"ALMA has eight years of housekeeping data collected and every action on the telescope is logged. This gold mine of information enables us to progress in our goal to develop sophisticated prescriptive analytics for our operations," points out Jorge Ibsen, Head of Computing at ALMA. Now that the full data pipeline, with all its varied technical elements and processes, is connected through Dataiku, different stakeholders can add their expertise where it's needed, whether they are data scientists, engineers, or astronomers.
ALMA's high-resolution images of nearby protoplanetary disks, produced by the Disk Substructures at High Angular Resolution Project (DSHARP).
Credit: ALMA (ESO/NAOJ/NRAO), S. Andrews et al.; N. Lira
This culture shift has led to major efficiency gains at ALMA and prompted the observatory to add a new core value to its mission statement: "As a scientific organization, we make our decisions, create solutions and organize our work on data-driven analysis and the facts that support them."
"Dataiku is allowing us to explore new optimization avenues," said astronomer Sergio Martin, manager of the Program Management Group at ALMA. "It provided us with a tool that helped a lot of people start talking the same language, and understand a way to work with data beyond just software consumption," added Ignacio Toledo.