31-03-2025

Supercomputing. Astronomy and Big Databases

Miguel Ángel Aragón
Astronomy as a discipline has been historically limited by observations. Our ancestors made detailed records of a sky where thousands of stars could be identified—this number is decreasing as a consequence of light pollution [see UNAM Internacional 3, p. 108]. Later on, we were able to reach very faint and distant objects thanks to the invention of the telescope in the 17th century. It was now possible to observe not only bright stars but fainter stars, nebulae, and also incomprehensibly distant galaxies.

However, even with the development of more powerful telescopes and the construction of large observatories around the world during the 20th century—including several in Mexico—the volume of astronomical observations remained relatively small. Astronomy was still a science of little data. This changed in recent decades with the development of high-sensitivity, high-resolution detectors and of robotic telescopes capable of observing automatically, which made large observing campaigns possible.

Modern astronomy faces an unprecedented challenge—the handling, curation, storage, and analysis of the huge amounts of data generated by ground- and space-based telescopes, as well as by numerical simulations. Supercomputing has emerged as a vital tool for meeting this challenge: it makes it possible to process these data efficiently through parallel analyses that run simultaneously on many processors. Supercomputers—which can be as large as a storage room—receive and analyze data from telescopes and simulations, automatically identifying, characterizing, and cataloguing celestial objects. This is a monumental task that would be impossible to do by hand.
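
As a rough illustration of what parallel analysis means in practice, the sketch below is a toy example, not any observatory's actual pipeline: it splits a simulated image into tiles and lets every available processor core search its own tile for bright pixels at the same time. The tile size and the simple 5-sigma threshold are invented for the example.

```python
# Toy sketch of parallel analysis: a large image is split into tiles and
# each tile is searched for bright pixels on a separate processor core.
import numpy as np
from multiprocessing import Pool

def find_sources(tile):
    """Count pixels brighter than a crude 5-sigma threshold in one tile."""
    threshold = tile.mean() + 5 * tile.std()
    return int((tile > threshold).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(4000, 4000))       # stand-in for a telescope image
    tiles = [image[i:i + 1000, j:j + 1000]      # split into 16 tiles
             for i in range(0, 4000, 1000)
             for j in range(0, 4000, 1000)]
    with Pool() as pool:                        # one worker per available core
        counts = pool.map(find_sources, tiles)  # tiles analyzed simultaneously
    print("candidate bright pixels per tile:", counts)
```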

One example is the Sloan Digital Sky Survey (SDSS) [see p. XX in this issue], which involves about a hundred institutions, with UNAM as a very active member in both technological development and scientific analysis. This project has mapped a third of the sky with a robotic telescope, producing a catalog of 530 million celestial objects, including over a million galaxies with measured distances.
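
The SDSS catalog is publicly accessible, and readers can query it directly. The snippet below is only a sketch of how that might look, assuming the community-developed astroquery Python package and the SDSS SpecObj table of spectroscopic objects; it retrieves a handful of galaxies together with their redshifts, the quantity from which their distances are derived.

```python
# Illustrative only: fetch a few galaxies with measured redshifts from the
# public SDSS database using the astroquery package.
from astroquery.sdss import SDSS

query = """
SELECT TOP 10 ra, dec, z
FROM SpecObj
WHERE class = 'GALAXY' AND z > 0
"""
galaxies = SDSS.query_sql(query)   # runs the SQL query on the SDSS server
print(galaxies)                    # astropy Table with position and redshift
```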

The Dark Energy Survey (DES) [see p. XX in this issue]—in which UNAM collaborates with dozens of institutions from around the world by developing data analysis techniques—consists of a very deep map covering roughly an eighth of the sky. The image of the sky that DES took contains ten trillion pixels (equivalent to a million pictures taken with a mobile phone), amounting to a petabyte—one thousand trillion bytes—of data. Its initial analysis required a hundred million CPU hours, where one CPU hour corresponds to one hour of computation on a single processor, roughly a personal computer working for an hour. When complete, DES will have generated a catalog of 35 million galaxies.
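
To put those hundred million CPU hours in perspective, a rough back-of-the-envelope conversion, assuming purely for illustration a supercomputer with 10,000 cores running without pause, is:

```latex
\[
\frac{10^{8}\ \text{CPU hours}}{10^{4}\ \text{cores}}
  = 10^{4}\ \text{hours} \approx 14\ \text{months}.
\]
```

On a single personal computer, the same hundred million hours would amount to more than ten thousand years of uninterrupted computation.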

On the computing front, modern cosmological simulations can recreate synthetic universes that are later “observed” with virtual instruments. These simulations run on supercomputers with tens or hundreds of thousands of processing units, each roughly equivalent to a personal computer. Cosmological simulations emulate the evolution of the Universe from shortly after the Big Bang to the present day. They provide a detailed history of the formation and evolution of hundreds of thousands of galaxies, generating several petabytes of data along the way.
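
At the heart of these simulations is the N-body technique: particles that stand in for matter attract one another gravitationally, and the computer advances their positions in small time steps. The toy sketch below, with only a hundred particles and made-up units, shows that basic loop; production codes do the same for billions of particles and add the expansion of space, gas physics, and far more sophisticated algorithms.

```python
# Toy N-body sketch: a handful of particles move under their mutual gravity.
import numpy as np

rng = np.random.default_rng(1)
n = 100                                    # number of particles
pos = rng.uniform(-1.0, 1.0, size=(n, 3))  # positions (arbitrary units)
vel = np.zeros((n, 3))                     # start at rest
G, mass, dt, softening = 1.0, 1.0 / n, 1e-3, 0.05

for step in range(1000):
    # pairwise separations and softened inverse-cube distances
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # shape (n, n, 3)
    dist2 = (diff ** 2).sum(axis=2) + softening ** 2
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)                          # no self-force
    acc = G * mass * (diff * inv_d3[:, :, np.newaxis]).sum(axis=1)
    vel += acc * dt                                        # update velocities
    pos += vel * dt                                        # update positions

print("final spread of the particles:", pos.std(axis=0))
```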

UNAM researchers recently simulated in great detail a synthetic galaxy similar to the Milky Way, as part of the international project Assembling Galaxies Of Resolved Anatomy (AGORA), in which 60 institutions participated. The simulation ran on UNAM’s Miztli supercomputer and took 1.5 million CPU hours.

The amount and complexity of astronomical data require smart forms of storage. Current storage systems are distributed: large volumes of data are split and saved across multiple servers. This allows efficient, fast access, since many servers can read and analyze different parts of the data simultaneously. A promising combination of supercomputing and distributed storage is the grid, in which computing and storage systems that may be physically separated by long distances interconnect to form a single network. UNAM recently started a grid that connects three of its academic entities through a high-speed network: the General Office of Information and Communication Technologies (DGTIC), the Laboratory for Models and Data (LAMOD) of the institutes of Astronomy (IA) and of Nuclear Sciences (ICN), and the Institute of Atmospheric Sciences and Climate Change (ICACC). In the future this grid will be able to connect with institutions abroad, increasing the shared computing capacity.
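
The gain from distributed storage comes from fetching the pieces concurrently rather than one after another. The sketch below is a simplified stand-in for that idea; the server names and the fetch_chunk function are hypothetical placeholders, not part of any UNAM system.

```python
# Simplified illustration of distributed storage: pieces of a catalog live on
# different servers and are fetched concurrently instead of sequentially.
# The server names and fetch_chunk() are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

CHUNKS = {
    "server-a.example.org": "catalog_part_1.fits",
    "server-b.example.org": "catalog_part_2.fits",
    "server-c.example.org": "catalog_part_3.fits",
}

def fetch_chunk(server, filename):
    """Placeholder for a real network transfer of one data chunk."""
    # A real client would open a connection to `server` and stream `filename`.
    return f"{filename} retrieved from {server}"

with ThreadPoolExecutor(max_workers=len(CHUNKS)) as pool:
    futures = [pool.submit(fetch_chunk, s, f) for s, f in CHUNKS.items()]
    results = [f.result() for f in futures]    # all transfers run in parallel

print("\n".join(results))
```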

As these examples show, astronomy today is dominated by data. The increasing amount of available information will bring new challenges, demanding the development of innovative techniques for its analysis and handling. In this sense, recent developments in artificial intelligence promise a new era in supercomputing and databases, in which autonomous artificial intelligence agents will continuously scan databases to curate information, interact with researchers, perform complex analyses, and potentially discover new patterns that lead to scientific findings.
Miguel Ángel Aragón obtained his PhD in astrophysics at the University of Groningen, the Netherlands. He then held research positions at Johns Hopkins University and the University of California, Riverside, in the United States. He is a senior researcher at UNAM’s Institute of Astronomy. His research focuses on galaxy formation and evolution, the cosmic web, artificial intelligence, and new technologies for human-machine interaction.