German Astrophysical Virtual Observatory
 
 

Astronomical archives

One of the main tasks of the GAVO project is to assist the German community in making its data archives available in a way that is compatible with efforts in other countries, and thus to help establish a truly global VO. Many of these archives have been, or are being, created as the result of large-scale observational projects in which the German community has been or is involved. A special aspect of the GAVO project is the attention it gives to the results of theoretical work, for example large-scale numerical simulations. The type of data differs from that in observational archives, but the goal is to find ways to incorporate these archives into the VO in a manner compatible with observational results, which is a prerequisite for a successful implementation of the theory/observation interface. There are various aspects to "making archives available", and GAVO has investigated a number of them. In particular, GAVO has worked on archive implementation, modeling, publication, access and federation. Below we define in more detail what we mean by these terms and how GAVO has approached them.

Archive implementation

Many astronomical datasets are still stored in files, often in a format designed specifically for the particular archive. The disadvantage is that files offer only limited access methods: large parts of a dataset often have to be transferred to a local machine before users can access them, and custom code must then be written to extract the data one is actually interested in. This method becomes unwieldy when datasets become very large, and it requires users to understand the details of the specific formats used. It does not lend itself well to automated discovery and retrieval. GAVO has explored alternative implementations of archives, concentrating on relational database technology. The great advantage of this is that there is a uniform access protocol through the query language SQL. Relational database management systems (RDBMSs) offer advanced methods for performance tuning, updates, backups, links between datasets, etc. Relational technology also comes with a well-developed theory and support for structuring datasets in ways that go beyond a single tabular structure.
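As a minimal sketch of the difference, the following Python fragment uses the standard sqlite3 module to select only the rows of interest from a catalogue table, rather than reading a whole file and filtering it with custom code. The table name, column names and values are hypothetical and serve only as illustration; they are not taken from any GAVO archive, and a production archive would use a full RDBMS rather than an in-memory database.

```python
import sqlite3

# Hypothetical catalogue: the table, columns and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sources (id INTEGER PRIMARY KEY, ra REAL, dec REAL, mag REAL)"
)
conn.executemany(
    "INSERT INTO sources (id, ra, dec, mag) VALUES (?, ?, ?, ?)",
    [
        (1, 10.68, 41.27, 3.4),
        (2, 83.82, -5.39, 4.0),
        (3, 201.37, -43.02, 6.8),
    ],
)
# An index is one of the performance-tuning tools an RDBMS offers.
conn.execute("CREATE INDEX idx_sources_mag ON sources (mag)")


def bright_sources(connection, mag_limit):
    """Return (id, ra, dec, mag) rows brighter than mag_limit, brightest first."""
    cursor = connection.execute(
        "SELECT id, ra, dec, mag FROM sources WHERE mag < ? ORDER BY mag",
        (mag_limit,),
    )
    return cursor.fetchall()


selected = bright_sources(conn, 5.0)
for row in selected:
    print(row)
```

Only the selected rows leave the database, and the same SQL statement would work unchanged against any table with these columns, whatever the storage layout underneath.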

Archive publication

By publication we mean the process of making an archive available to the astronomical community in a VO-compatible manner. This includes creating web-based services for accessing the database as well as making it possible to discover these services. The process of discovery is researched within the IVOA registry working group. GAVO has been involved in the registry and data modeling working groups.

Archive federation in the VO

One of the important goals of the Virtual Observatory is to enable the federation of distributed astronomical archives. This means that users of the VO should be able to query for all kinds of astronomical data in a uniform way, without requiring intimate knowledge of the way data is stored and organized in individual archives. It must become possible to extract and compare data from different archives in a single query, just as data from a single archive can be queried today.

There are already many astronomical archives that provide online access to their data. The problem for the VO effort is that each of these archives does so in a different way. Data products that are in principle quite similar, as far as content is concerned, are stored in different formats. Access to the individual archives is provided mainly through web interfaces that are by nature confined to the particular archive as well.

The situation presented to astronomers wishing to extract and combine data from multiple archives is somewhat like that illustrated in Figure 1.


Figure 1: Babylonian confusion. The current situation leads to a total effort scaling as the product of the number of users (N) and the number of archives (M).


Each archive can be thought of as having defined its own language for describing the data it contains. A prospective user must learn all the different languages to be able to query the archives and translate the data into a single format for further analysis. The total amount of work involved therefore scales with the product of the number of users and the number of archives.

The VO aims to provide a solution to this problem that can be compared to defining an "Esperanto" for astronomy. As illustrated in Figure 2, the VO aims to provide a common interface for archives to publish their data and for users to query this data.


Figure 2: The VO as "Esperanto". By defining a single standard for querying and publishing data the total effort scales linearly, as N+M.


The archives must go through the effort of translating from their own language to the common "Esperanto", and users need to learn only this one language for posing their queries and interpreting the results.
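The scaling argument can be made concrete with a few lines of Python. This is a sketch that counts one unit of effort per "language" to be learned or translated; the function names and the example numbers are assumptions chosen for illustration.

```python
def effort_without_standard(n_users, m_archives):
    """Every user learns every archive's own language: N * M units of effort."""
    return n_users * m_archives


def effort_with_standard(n_users, m_archives):
    """Each user and each archive learns the common language once: N + M units."""
    return n_users + m_archives


# Illustrative numbers only.
n_users, m_archives = 100, 20
print(effort_without_standard(n_users, m_archives))
print(effort_with_standard(n_users, m_archives))
```

With these example numbers the product gives 2000 units against 120 for the sum, and the gap widens as either N or M grows.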

The IVOA has defined various working groups to define this common interface. The ones that have so far created the most tangible results, and which GAVO has been working with in its publication efforts, are the Data Access Layer (DAL) and the VOTable groups.

The DAL group is concerned with the way archives can be accessed. It has defined a small number of simple, HTTP-based query protocols that can easily be implemented by the archives. These are the Simple Cone Search (SCS), the Simple Image Access (SIA) and, more recently, the Simple Spectrum Access (SSA) protocols. Using only a few query parameters these protocols allow users to retrieve sources, images or spectra from a particular part of the sky, identified by a position in J2000 equatorial coordinates and a search radius.
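To illustrate how little is needed to pose such a query, the sketch below builds a Simple Cone Search request from a position and a search radius, using the RA, DEC and SR parameters (all in decimal degrees) defined by that protocol. The service address is a made-up placeholder, and the helper names are hypothetical. The second function shows, as an assumption about what a service does internally, the great-circle test that decides whether a source lies inside the cone.

```python
import math
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search request URL; all angles are in decimal degrees."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must lie between 0 and 360 degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must lie between -90 and +90 degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append to base URLs with or without an existing query string.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return f"{base_url}{separator}{query}"


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation of two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2.0) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2
    )
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


# Placeholder address, not a real service.
url = cone_search_url("http://archive.example.org/scs", 83.82, -5.39, 0.1)
print(url)
```

A client would fetch this URL with any HTTP library and receive the matching sources as a table; no archive-specific code is involved beyond the base address.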

The protocols stipulate that the returned data should be delivered in VOTable format. This is an XML document following a formal XML Schema specification, which allows the description of the data (names, types, meaning of columns) to be stored together with the data itself. Such a formal specification makes it possible to create tools for interpreting, displaying and manipulating the data without requiring knowledge of how the data is actually stored in the archive. Note that the VOTable format was designed as an ASCII/XML representation of the FITS Binary Table format. The FITS format is not made obsolete by VOTable; in fact, FITS files can be embedded into a VOTable document, either in an ASCII-encoded form or as a link from the document.
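The following sketch shows how the column descriptions and the data travel together. It parses a small, hand-written VOTable-like document with Python's standard XML library and uses the FIELD metadata to convert each cell to the right type. The sample document is simplified for illustration: it omits the XML namespace and most optional elements of a real VOTable, and its values are invented. In practice one would use a dedicated VOTable library rather than this hypothetical read_votable helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; a real VOTable declares a namespace.
VOTABLE_SNIPPET = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="results">
      <FIELD name="id" datatype="int"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1</TD><TD>10.68</TD><TD>41.27</TD></TR>
          <TR><TD>2</TD><TD>83.82</TD><TD>-5.39</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# Map the datatypes used in the sample to Python types; others stay strings.
CONVERTERS = {"int": int, "double": float, "float": float}


def read_votable(xml_text):
    """Return (columns, rows) for the first table in the document."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    columns = [
        {"name": f.get("name"), "datatype": f.get("datatype"), "unit": f.get("unit")}
        for f in fields
    ]
    converters = [CONVERTERS.get(col["datatype"], str) for col in columns]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(
            {col["name"]: conv(cell)
             for col, conv, cell in zip(columns, converters, cells)}
        )
    return columns, rows


columns, rows = read_votable(VOTABLE_SNIPPET)
for row in rows:
    print(row)
```

The reader needs no prior knowledge of the table: names, types and units are all taken from the document itself, which is what allows generic tools to work across archives.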

Data modeling

GAVO has been an active participant in the IVOA data modeling standardization process. In particular, GAVO has been involved in the discussion and publication of a data model on Observation and Quantity, published on the data modeling pages of the IVOA website. Participating in a standards process does not mean that we have to wait until standards are defined before using a data model in GAVO's own software. GAVO is therefore pursuing work on a model that can already support the GAVO-specific requirements, while at the same time taking care not to stray from the IVOA roadmap (see link).

GAVO's approach to the model is aimed at a somewhat more generic set of applications than the IVOA models have been so far. Pursuing the "Esperanto" metaphor from a previous section somewhat further, the data model defines the syntax and vocabulary of the common language underlying all applications of the VO. The approach we have taken is that we need to build what is often called an ontology.

It also corresponds to what many software development methodologies call an analysis model. Most standard software development methodologies, such as those described in the references, introduce an analysis phase in the development cycle, during which a conceptual model is developed for the problem domain. We will call such a model a domain model, to distinguish it from more specialized data models designed for particular applications or for implementation purposes.

To build the VO, considerable amounts of software will have to be written for connecting users to astronomical archives, for federating archives and for providing services on top of these federated archives. In this effort an analysis phase resulting in a conceptual model is therefore not only appropriate but in fact required.

In [5] a proposal for such a model is presented. Part of that model is used in a prototype for publishing theoretical simulation data through GAVO, which is described elsewhere on this site.

References

  1. Booch, G., Object-Oriented Analysis and Design. 2nd ed. 1994: Addison-Wesley.
  2. Fowler, M., Analysis Patterns. 1997: Addison-Wesley.
  3. Halpin, T., Information Modeling and Relational Databases. 2001: Morgan Kaufmann.
  4. Meyer, B., Object-Oriented Software Construction. 2nd ed. 1997: Prentice-Hall, Inc.
  5. Lemson, G., P. Dowler, and A. Banday, A unified domain model for astronomy. In ADASS XIII. 2004. Strasbourg, France: ASP Conf. Series.