Tuesday, September 29, 2009

Statistical Frontiers of Astrophysics



This week IPMU (Institute for the Physics and Mathematics of the Universe), in the spirit of bringing together mathematicians and physicists, is hosting the focus week Statistical Frontiers of Astrophysics. Even though I am neither an astrophysicist nor a statistician, I am attending some of the lectures. (After all, looking at the stars was a favorite pastime of mine when I was little.) Whereas in my field we are forced to come up with theoretical models in the absence of experimental data, the astrophysicist's problem is to extract meaningful information from a giant collection of data.

Astronomy is an observational science. Unlike in my own field, an enormous amount of data is available to the astrophysicist.

The times when astronomers pointed their telescopes at the night sky and cataloged by hand what they saw are over. In recent years, very powerful instruments for measuring what’s up in the sky have become available. The one best known to the public is probably the Hubble Space Telescope.

WMAP 5-year full survey. Image: NASA/WMAP Science Team

But the Wilkinson Microwave Anisotropy Probe (WMAP) has also been collecting invaluable data for cosmologists. Another mammoth project is the Sloan Digital Sky Survey (SDSS). Its final dataset includes 230 million celestial objects.

Of course, the goals of the astrophysics community go beyond capturing amazing images of faraway galaxies and nebulae.

Redshift 5.74 Quasar. Image: Stephen Kent, SDSS Collaboration

A question they might ask when presented with an image of the sky is, for example, “How many of the visible points of light are quasars (a type of highly energetic distant galaxy)?” They have to decide whether a bright point on their image is just a normal star, a normal galaxy, a cosmic ray that struck the sensor by chance, or the quasar they were looking for. And they have to answer this question for millions of objects! It is clear that powerful automated methods are needed for such a task.

Statistics is a tool for extracting information from data. In most measurements, the thing you are looking for (the signal) is drowned in a sea of noise. You somehow have to extract the information that is meaningful to you and discard the rest. Many measurements are also fraught with errors. Measurements from Earth-bound telescopes, for example, are affected by air movements in the atmosphere. A fluctuation you have recorded might simply be the result of a gust of wind.

One commonly used statistical method is Bayesian inference. It is a method for updating the probability you assign to a hypothesis in light of a new measurement, starting from a prior assumption about how plausible that hypothesis is. It essentially corresponds to using common sense in assessing your hypothesis based on new evidence. The problem with this method is that the result depends on the assumptions you started with. If they are far off from reality, Bayesian inference might lead you to wrong conclusions.
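To make the dependence on the starting assumption concrete, here is a minimal sketch of Bayes' rule applied to the quasar question from above. All numbers are made up for illustration, and the function name `posterior` is my own; nothing here comes from an actual survey pipeline.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a yes/no hypothesis H given one observation.

    prior              -- P(H) before looking at the data
    p_data_given_h     -- P(data | H)
    p_data_given_not_h -- P(data | not H)
    """
    evidence = prior * p_data_given_h + (1.0 - prior) * p_data_given_not_h
    return prior * p_data_given_h / evidence


# Hypothetical likelihoods: how often an object shows a certain
# "quasar-like" colour if it is a quasar, and if it is not.
P_COLOUR_IF_QUASAR = 0.9
P_COLOUR_IF_OTHER = 0.05

# Same observation, three different prior beliefs about how common quasars are.
for prior in (0.001, 0.01, 0.5):
    p = posterior(prior, P_COLOUR_IF_QUASAR, P_COLOUR_IF_OTHER)
    print(f"prior = {prior:<6} posterior = {p:.3f}")
```

The observation is identical in all three cases, yet the conclusion changes a lot with the prior. If quasars are assumed to be rare, a quasar-like colour alone is weak evidence; if they are assumed to make up half the objects, the same colour looks almost decisive.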

And unfortunately, there are examples of bad practice in the use of statistics in data evaluation. It has been remarked repeatedly at this conference that most wrong results in astrophysics can be traced to misuse of statistics.

This is part of the reason this conference exists. Here, statisticians, i.e. pure mathematicians, meet with the people who actually have to use the methods mathematicians come up with to extract information from real-life data with all its problems. The hope is that this workshop will spark discussions and initiate new collaborations to tackle these problems.

One very cool instance of automated processing of astronomical images is given by Astrometry.net. It is an engine that takes any image of the night sky and returns its world coordinates, along with an identification of all the known astronomical objects that are in the picture. The engine uses a powerful kd-tree data structure along with a huge astronomical catalog and Bayesian decision making to identify the portion of the sky pictured. In this way, the picture is calibrated and thus becomes usable as a basis for scientific research.
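For readers who have not met a kd-tree before, here is a toy version in Python. It is only a sketch of the general data structure, not the code Astrometry.net uses: the tiny catalog is invented, the function names `build` and `nearest` are my own, and positions are treated as flat 2D points, whereas a real engine has to deal with the geometry of the sphere.

```python
import math


def build(points, depth=0):
    """Build a kd-tree as nested tuples (point, left_subtree, right_subtree)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the space
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return the point in the tree closest to target (Euclidean distance)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Search the other side only if the splitting plane is closer than
    # the best match found so far.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


# Invented "catalog" of object positions and one detected bright point.
catalog = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(catalog)
print(nearest(tree, (9.0, 2.0)))
```

The point of the tree is that whole regions of the catalog can be skipped during a lookup, so matching a detected point against millions of cataloged objects does not require comparing it with every single one.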

This project makes use of Flickr, the popular photo sharing site. Any picture of the sky submitted to the pool of the astrometry group on Flickr gets run through the engine, which posts a comment to the picture with the astronomical data and adds notes directly on the image to identify the visible objects. Take a picture of the sky and have it calibrated by astrometry.net and thus turned into a potential resource for research. How cool is this!

Top picture: Star-Forming Region LH 95 in the Large Magellanic Cloud. Source: Hubble Space Telescope, D. Gouliermis
