Purpose and scope
Purpose
This is the Data Product Description Document (DPDD). It aims to provide a comprehensive list of all the data products that are created and manipulated in the context of the Science Ground Segment data processing pipeline (or “SGS pipeline” for short), together with information on the data products themselves.
In the present context a Data Product, or product, is a complex data structure that may contain different types of values, keyword-based information, and/or metadata. While no strict definition of a product can really be made, products are defined by the developers of the processing functions (PF), and they are the basic objects that the EAS will handle, store and reference.
The DPDD should be viewed as an encyclopedia of Euclid products. There is a single entry per product, and the products are grouped by the PF that produces them. We envision two ways to use the DPDD:
As an encyclopedia, where a user finds a product in the EAS and looks up information about it in the DPDD.
As a manual, where a user browses through the products of a given PF to determine which best suits their research needs.
Historically, this document was first used as a support for the developers of the SGS pipeline, to help them understand the products coming from the other PFs that they use in their own PF. It is now also used as a reference document about products for the general science users of Euclid. While developers work on the DPS, science users access the archive through the SAS. The SAS contains only a subset of the many data products generated by the SGS. Moreover, these products are often stripped of much of their metadata, keeping only the core FITS tables or images. Nevertheless, the DPDD is meant to be comprehensive, so every part of a product should be documented here, especially if it is in the SAS. Also, if users believe that some products not included in the SAS should be distributed, please let the SGS scientist know so that their inclusion can be discussed with ESA for future releases of the SAS.
The DPDD is a companion document to the SGS Common Data Model, a software component that defines the structure of the data products used by the SGS Pipeline. It is also an important companion to RD5 (see Related documents). RD5 provides a comprehensive yet synthetic view of the interfaces between the different elements of the data processing SGS, while the present document details the products that form these interfaces.
Scope
Ultimately this document will contain a section for every product of the SGS Pipeline. Version 1.1 contains all the main SOC and IOT products, as well as most of the LE1, LE2 and LE3 products. Some products are still undocumented; they are listed in the section on missing products.
For each product, we provide a detailed description of its content, to allow in-depth understanding, as well as information on where the product is created and where it will be used. We present this in the form of data product “cards” that standardize the organization of the information.
As will be evident when browsing this document, a large number of data products are created and handled by the SGS pipeline. This is expected for such a deep processing system. Given that the development of the pipeline is distributed (i.e. the different Processing Functions are developed by independent groups of scientists and developers), there is an intrinsic heterogeneity in the system. The Euclid Consortium, as it is built, does not have the means to fully control this heterogeneity (for instance by providing centralized development of given classes of data products). However, we have put in place some measures to try to keep it at an acceptable level.
For instance, the implementation of a common data model allows a central team (the System Team) to maintain an overview of all the products that are used by the pipeline. This way the System Team can define harmonization rules on the general structure of products as well as on their names. This guarantees that a user of Euclid data only has to master a single naming convention to navigate and understand the collection of Euclid data products. We regularly review the compliance of data product names with the convention. This supervision of product creation and naming also results in a classification of the products into generic families (namely: RawData, CalibratedData, StackedData, SourceCatalog, CalibrationData, PSF, PFConfiguration, and Miscellaneous). These families provide another entry point for making sense of the vast collection of Euclid data products. The distribution of data products involved in Science Challenges 4, 5 and 6 is presented in Table 1 to illustrate this point.
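As an illustration of how the family classification can be used to summarize the product collection, the following is a minimal sketch that tallies products per family and processing function. The eight family names come from the text above; the product names, the `tally` helper and the tuple layout are hypothetical and do not reflect the actual Common Data Model.

```python
from collections import Counter
from enum import Enum


class Family(Enum):
    """The generic product families named in the text."""
    RAW_DATA = "RawData"
    CALIBRATED_DATA = "CalibratedData"
    STACKED_DATA = "StackedData"
    SOURCE_CATALOG = "SourceCatalog"
    CALIBRATION_DATA = "CalibrationData"
    PSF = "PSF"
    PF_CONFIGURATION = "PFConfiguration"
    MISCELLANEOUS = "Miscellaneous"


def tally(products):
    """Count products per (family, processing function) pair.

    `products` is an iterable of (name, family, pf) tuples.
    """
    return Counter((family, pf) for _name, family, pf in products)


# Hypothetical entries: these names are placeholders, not real DPDD products.
EXAMPLE_PRODUCTS = [
    ("exampleRawFrame", Family.RAW_DATA, "VIS"),
    ("exampleCalibratedFrame", Family.CALIBRATED_DATA, "VIS"),
    ("exampleStack", Family.STACKED_DATA, "MER"),
    ("exampleCatalog", Family.SOURCE_CATALOG, "MER"),
    ("exampleSecondCatalog", Family.SOURCE_CATALOG, "MER"),
]

counts = tally(EXAMPLE_PRODUCTS)
print(counts[(Family.SOURCE_CATALOG, "MER")])  # prints 2
```

A `Counter` returns 0 for pairs that never occur, so a family with no product in a given processing function needs no special handling when building a summary table.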
Of course, ensuring consistent naming principles does not guarantee that the inner structure of products will be consistent. Such deep harmonisation would be helpful for users, but it cannot be fully enforced. First, some data products, even if they belong to the same family (e.g. images), hold data that is still very close to the instrument that produced them, and therefore the structure of the product is generally tailored to some organisation principle of the instrument (e.g. detector, quadrant of detectors, readout principle…). Second, and pragmatically, the Euclid Consortium organization does not always provide the resources to implement such deep harmonisation. Therefore the effort has to be focused on the products that are of the highest interest to users. A quick survey shows that these would likely be the coadded images of the different photometric bands of the survey, the Euclid source catalog (containing photometric, spectroscopic, morphological and redshift information) and the high-level science products at the end of the pipeline. For the coadded images, one method to ensure some structural consistency is to have them produced by a single processing function. This is the current plan for the SGS pipeline, where the reference coadded images are produced by the MER processing function using a tiling scheme that is defined once and for all. For the catalog, which a number of actors contribute to, the System Team scrutinizes all developments that affect its internal structure. Finally, the high-level science products are all the result of the LE3 PF, and we can thus rely on that centralization to produce consistency.
Finally, we remind the reader that most users of Euclid data will access it through the Science Archive System (SAS), which is a component of the EAS. Data products will be pushed to the SAS following a decision by the SGS (i.e. which generation/family of products should be pushed to the SAS), and we have the possibility of refactoring the data products when pushing them to the SAS. It is already envisioned that not all the metadata attached to the Euclid products will be pushed to the SAS. This is still under study, but it extends our capacity to provide consistency in the structure of the data products while maintaining the flexibility that developers in the SGS need. For more information on the SAS organisation and principles, refer to RD7 (see Related documents).
Method
As the Data Products can evolve with the software, this document is a living document that follows the issues of the SGS pipeline. The version of the pipeline shall be identified clearly in the data product description pages. The document is built with the Sphinx documentation software, which aggregates elements produced by the different contributors in the SGS and maintained under configuration control at the Euclid GitLab. If you are reading this document on the web, you probably have access to the most up-to-date version. If you are reading a PDF version, it may be outdated or may correspond to a specific release of the SGS codes.
The choice of using Sphinx to build this document, i.e. to depart radically from the docx/pdf-based approach currently used elsewhere in the SGS, is motivated by the desire to make this a truly living, useful document. docx/pdf documents are essentially static: they are cumbersome to keep up to date (especially large ones) and they cannot be automatically processed for information. A document that is supposed to serve as a reference must, on the contrary, be dynamic (i.e. adjust to changes in the system it describes), easy to update whatever its size, and its content must be “processable” by software tools. This is precisely what we can do with Sphinx. The document is in fact a directory structure: each section is a folder, and the contents of the sections are ASCII files in these folders. Adding a section to an existing document is a straightforward modification of the structure, and this folder structure makes it possible for each section to be under the responsibility of a different person while maintaining at all times the capacity to rebuild the entire document. Furthermore, Sphinx uses reStructuredText, which allows for markup. It is therefore extremely easy to write the content of the document in such a way that it can be automatically analyzed.
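To make the "processable by software tools" point concrete, here is a minimal sketch of how such a folder-per-section source tree could be inspected with a few lines of Python. It assumes that section contents are stored as `.rst` files, the usual extension for reStructuredText sources; the function names and any folder layout you pass in are hypothetical, not the actual DPDD repository structure.

```python
from pathlib import Path


def list_sections(root):
    """Map each section folder directly under `root` to its sorted .rst file names."""
    sections = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            sections[folder.name] = sorted(p.name for p in folder.glob("*.rst"))
    return sections


def find_marked_lines(root, marker):
    """Return (relative path, line number, text) for each line containing `marker`.

    `marker` can be any markup string, for example a reStructuredText
    directive such as ".. note::".
    """
    hits = []
    for path in sorted(Path(root).rglob("*.rst")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if marker in line:
                relative = path.relative_to(root).as_posix()
                hits.append((relative, number, line.strip()))
    return hits
```

Calling `list_sections("path/to/source")` gives a quick overview of which folder holds which files, and `find_marked_lines("path/to/source", ".. note::")` locates every occurrence of a given markup element across all sections.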
Table 1. Distribution of data products by category and processing function.
Columns (processing functions): EXT (KIDS, DES, LSST, COMMON), LE1, VIS, NIR, MER, SIR, SIM, SHE, PHZ, SPE.
Rows (data product categories): RawData, CalibratedData, StackedData, SourceCatalog, CalibrationData, PSF, PFConfiguration, Miscellaneous.