Work Areas and Projects

In GAVO’s pilot phase, work concentrates on four main areas: archive technology and publication, data mining and knowledge discovery in federated astronomical archives, theory in the Virtual Observatory, and Grid-computing. The overarching goal is to support the process of scientific discovery in the era of huge distributed astronomical databases.

Archive technology and publication

The first step in publishing an archive is to make the data available on-line. In its simplest form, this means offering one or more downloadable files.

As a next step, relevant data may be stored in a database. For large data volumes, the database management system needs to be augmented with special indices that allow rapid retrieval. For instance, the database containing the 100 million RASS photon events, published by GAVO, internally uses the HEALPix indexing algorithm.
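
The idea of a spatial index can be illustrated with a minimal Python sketch. It is not HEALPix and not GAVO's implementation: it substitutes a much simpler equal-angle grid, whose cells are not equal in area, only to show how bucketing positions by cell avoids scanning every event. The function names and the 1-degree cell size are assumptions made for this example.

```python
import math
from collections import defaultdict


def cell_of(ra_deg, dec_deg, cell_deg=1.0):
    """Map a sky position to an integer (ra_cell, dec_cell) key.

    Simplified stand-in for a real pixelization such as HEALPix.
    """
    ra_cell = int(math.floor((ra_deg % 360.0) / cell_deg))
    dec_cell = int(math.floor((dec_deg + 90.0) / cell_deg))
    return (ra_cell, dec_cell)


def build_index(events, cell_deg=1.0):
    """Group event numbers by the grid cell their position falls into."""
    index = defaultdict(list)
    for i, (ra_deg, dec_deg) in enumerate(events):
        index[cell_of(ra_deg, dec_deg, cell_deg)].append(i)
    return index


def events_in_cell(index, ra_deg, dec_deg, cell_deg=1.0):
    """Return the event numbers stored in the cell containing a position."""
    return index.get(cell_of(ra_deg, dec_deg, cell_deg), [])


# Hypothetical (RA, Dec) positions in decimal degrees.
events = [(10.2, -5.3), (10.7, -5.9), (200.0, 45.0)]
index = build_index(events)
nearby = events_in_cell(index, 10.5, -5.5)
```

A lookup then touches only the events in one cell. A production index would also have to inspect neighbouring cells for queries near a cell border.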

To be accessed conveniently, an archive needs to be equipped with a suitable query mechanism. GAVO has implemented, for example, an IVOA standard cone search service for the ROSAT archive and a more versatile SQL-based Web service for a simulation archive.
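
As an illustration of what a cone search request looks like, the following Python sketch assembles a query URL. In the IVOA Simple Cone Search convention, the position and search radius are passed as the parameters RA, DEC and SR in decimal degrees. The base URL used here is a placeholder, not the address of the GAVO service, and the helper name is invented for this example.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search query URL (RA, DEC, SR in decimal degrees)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must lie in [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must lie in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("SR must not be negative")

    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})

    # Choose the separator depending on how the base URL already ends.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return f"{base_url}{separator}{query}"


# Placeholder endpoint; a real service address would come from a registry.
url = cone_search_url("https://example.org/scs", 10.5, -5.5, 0.1)
```

Fetching this URL from a compliant service would return a table of the sources inside the cone. The sketch stops at building the request.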

Finally, the existence of a queryable archive needs to be advertised to the public, both through informal human-readable descriptions that can be found via general search engines and through a formal entry submitted to a registry that can be interrogated by computers. The registry is one of the places where metadata describing an archive is stored.

Data Mining

Once archives are published (i.e. on-line), equipped with query services, and findable through a registry or by other means, they need to be "federated." For photometric catalogues, federation is accomplished via cross-matching. A typical next step is the assembly of spectral energy distributions (SEDs), possibly followed by a classification process. The analytical software tools that accomplish these tasks may have to be complemented with visualization tools.
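
Positional cross-matching can be sketched in a few lines of Python. This is a brute-force nearest-neighbour match under a fixed tolerance, given only as an illustration. It is not the method used by the GAVO tools, and the catalogue entries and the 2-arcsecond tolerance are invented for the example.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, max_sep_deg):
    """For each source in cat_a, find the nearest cat_b source within range.

    Returns a list of (index_a, index_b, separation_deg) tuples.
    Brute force, O(len(cat_a) * len(cat_b)); large catalogues would
    need a spatial index instead.
    """
    matches = []
    for i, (ra_a, dec_a) in enumerate(cat_a):
        best_j, best_sep = None, max_sep_deg
        for j, (ra_b, dec_b) in enumerate(cat_b):
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            matches.append((i, best_j, best_sep))
    return matches


# Hypothetical (RA, Dec) positions in decimal degrees.
cat_a = [(10.0, 20.0), (150.0, -30.0)]
cat_b = [(150.0002, -30.0001), (10.0001, 20.0001), (300.0, 0.0)]
matches = cross_match(cat_a, cat_b, max_sep_deg=2.0 / 3600.0)
```

Each matched pair links the measurements of one object in two catalogues, which is the starting point for assembling its spectral energy distribution.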

GAVO is active in all these work areas and has produced along the way the ClassX matching and classification suite and the SED classification suite. The latter is also an example of a tool that allows the astronomer to compare theory with observations, which is the work area discussed next.

Theory in the Virtual Observatory

While most VO efforts concentrate on observational archives, GAVO is especially interested in the theoretical component. This comprises the publication of theoretical datasets in ways similar to their observational counterparts, as well as the creation of services with a more theoretical flavour.

The ultimate goal is to create an environment in which, on the one hand, theoretical results can be used for the interpretation of observations and, on the other hand, observations can be used to constrain theoretical models. A case in point for the former is the SED classification suite mentioned above.

GAVO is pursuing a number of concrete projects and, through collaborations, is exploring techniques for publishing theoretical datasets. GAVO is also actively involved in the IVOA theory interest group, which aims, among other things, at channeling the requirements of the theory community into the IVOA standards process.

Grid-computing

Grid-computing has both a community aspect and a technology aspect. Above all, it will allow the easy formation of "virtual organizations" across multiple institutions. In practice, these are the distributed research groups commonly found in modern astronomy. To these virtual organizations, grid-computing promises to offer coordinated, secure, seamless, and transparent access to scalable, distributed "autonomous" computer resources, such as compute clusters, storage space, services and instruments, all connected via high-bandwidth network links. This is supposed to be accomplished "hassle-free", i.e. without having to set up personal accounts on all the computers used.

Grid-computing promises considerable advantages for all sorts of compute- and data-intensive tasks, most notably data analysis workflows. A case in point is the ClusterFinder application, already running on the GAVO-Grid. Grid-computing will also be useful for generating large-scale simulations, and for comparisons between theory and observations.