******************
Database Concepts
******************

.. _`EbasDatamodel`:

The EBAS data model
===================

The model presented here is a highly simplified version of the technical
design data model of EBAS and shows only the basic concepts in a way users can
understand. Basic knowledge of the data model will help users in their daily
work with metadata and data.

.. _`EBASDatamodelOverviewFigure`:

Overview
--------

.. image:: FIGURES/EBAS-3-user.png

Classes of metadata
-------------------

.. _`EbasDatamodelMasterData`:

Master data
^^^^^^^^^^^

Master data entities are shown in *green* in the :ref:`overview figure
<EBASDatamodelOverviewFigure>` above.

Master data are general metadata that are referenced by other metadata
entities. Master data should not change over time (changes are very rare and
are treated as corrections, not as changes of the metadata).
:ref:`Historic states <EbasHistory>` of master data are *not* preserved, and
historic extracts will always produce the latest state of master data.

A typical example is station metadata, which are referenced by datasets. The
station metadata (station name, position, altitude, ...) are static and need
to be the same for all measurements performed at this station. Changes, e.g.
in the station position, would be corrections (if a station was physically
moved, a new station needs to be created and referenced). The same applies to
organisation metadata.

The vast majority of master data are controlled vocabulary (e.g. Statistics
code, Instrument type, Component name, ...). Those master data entities are
*not* shown in the :ref:`overview figure <EBASDatamodelOverviewFigure>` in
order to keep the figure simple.

.. _`EbasDatamodelStaticMetadata`:

Static metadata
^^^^^^^^^^^^^^^

Static metadata entities are shown in *blue* in the :ref:`overview figure
<EBASDatamodelOverviewFigure>` above.

Some metadata are considered to be immutable.
Rather than changing those metadata, the entities have to be deleted and
recreated.

Examples of static metadata are submissions and the dataset core metadata.

.. _`EbasDatamodelHistoryAwareMetadata`:

History aware metadata
^^^^^^^^^^^^^^^^^^^^^^

History aware metadata entities are shown in *red* in the :ref:`overview
figure <EBASDatamodelOverviewFigure>` above.

History aware metadata keep the full history of changes in the database. See
:ref:`EbasHistory` for more information.

.. _`EbasDatamodelTimeDependentMetadata`:

Time dependent, history aware metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Time dependent, history aware metadata entities are also shown in *red* in the
:ref:`overview figure <EBASDatamodelOverviewFigure>` above.

In addition to being :ref:`history aware <EbasDatamodelHistoryAwareMetadata>`,
those metadata are always valid for a specific data interval of the time
series and can have different values for different data intervals.

An example is the *detection limit*: different detection limits can be
reported in submissions for different years (*time dependent*). Additionally,
the detection limit for one specific year can be changed afterwards, and the
historic value from before the change will still be available in the database
(*history aware*).

Entities
--------

.. _`EbasDatamodelDataset`:

Dataset
^^^^^^^

The central part of the EBAS data model is the *dataset*. A dataset represents
all metadata and data for one specific measurement variable over time.

.. _`EbasDatasetHomogeneity`:

.. topic:: **Homogeneity of datasets:**

   A dataset is homogeneous in the sense that data from different measurement
   intervals within the whole dataset are comparable and free of
   discontinuities caused by changes in instrument configuration or method.

A dataset consists of:

* :term:`Dataset setkey`, a unique identifier for the dataset.
* :ref:`EbasDatamodelDatasetCoreMetadata`, which define the identity of the dataset.
* :ref:`EbasDatasetAdditionalMetadata`, which are mutable.
* :ref:`EbasDatasetTimeDependentDatasetMetadata`, which are mutable and can
  have different values for different data intervals.
* References to station, laboratory, field instrument metadata, laboratory
  instrument metadata and QA metadata.

  Datasets refer to other metadata entities. Those referred entities are
  uniquely defined (e.g. all station metadata will be the same for all
  datasets referring to the same station).
* Measurement data (time series)

.. _`EbasDatamodelDatasetCoreMetadata`:

Dataset core metadata
^^^^^^^^^^^^^^^^^^^^^

Dataset core metadata define the identity of a dataset. Each set of dataset
core metadata is also bound to a :term:`dataset setkey`. Two datasets with
identical core metadata would be indistinguishable and may not exist in
parallel.

As the core metadata identify the dataset, they may never change during the
lifetime of a dataset. Thus the dataset core metadata are implemented as a
:ref:`static metadata entity <EbasDatamodelStaticMetadata>`.

Core metadata are:

* :term:`Station code`, which refers to the *station metadata entity*
* A reference to the :term:`measurement parameter <Parameter>` by the triple:

  * :term:`Regime code`
  * :term:`Matrix name`
  * :term:`Component name`

* :term:`Instrument type`, which is controlled vocabulary
* :term:`Instrument reference`, which refers to *instrument metadata*
* :term:`Method reference`, which refers to *method metadata*
* :term:`Statistics code`, which is controlled vocabulary
* :term:`Resolution code`
* any :ref:`EbasDatamodelDatasetCharacteristics`

.. _`EbasDatamodelDatasetCharacteristics`:

Dataset characteristics
^^^^^^^^^^^^^^^^^^^^^^^

Some :term:`parameters <parameter>` in EBAS need additional metadata to
describe the quality of the variable. These additional metadata are called
characteristics, as they describe special characteristics of a
:term:`parameter`.
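A minimal Python sketch may help to see how core metadata, including
characteristics, act as the identity of a dataset. It is not EBAS code: the
class names, the integer setkey and all placeholder values (station code,
instrument reference, method reference, statistics and resolution codes) are
assumptions made for illustration. Only the regime, matrix and component
values are taken from the nephelometer example in this section.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Tuple


   @dataclass(frozen=True)
   class CoreMetadata:
       """Hypothetical identity key of a dataset (all fields immutable)."""
       station_code: str
       regime_code: str
       matrix_name: str
       component_name: str
       instrument_type: str
       instrument_reference: str
       method_reference: str
       statistics_code: str
       resolution_code: str
       # Characteristics as (name, value) pairs so the key stays hashable.
       characteristics: Tuple[Tuple[str, str], ...] = ()


   class DatasetRegistry:
       """Toy registry mapping core metadata to an invented integer setkey."""

       def __init__(self) -> None:
           self._setkeys: Dict[CoreMetadata, int] = {}

       def create(self, core: CoreMetadata) -> int:
           # Identical core metadata would make two datasets indistinguishable.
           if core in self._setkeys:
               raise ValueError("a dataset with identical core metadata exists")
           setkey = len(self._setkeys) + 1
           self._setkeys[core] = setkey
           return setkey


   # Placeholder values; only regime, matrix and component come from the text.
   base = dict(
       station_code="XX0001R", regime_code="IMG", matrix_name="aerosol",
       component_name="aerosol_light_scattering_coefficient",
       instrument_type="nephelometer", instrument_reference="XX01L_neph_01",
       method_reference="XX01L_method_01", statistics_code="arithmetic mean",
       resolution_code="1h",
   )
   registry = DatasetRegistry()
   key_450 = registry.create(
       CoreMetadata(**base, characteristics=(("Wavelength", "450nm"),)))
   key_525 = registry.create(
       CoreMetadata(**base, characteristics=(("Wavelength", "525nm"),)))

In this sketch the two wavelengths yield two distinct datasets, while a second
attempt to create the ``Wavelength=450nm`` dataset would raise ``ValueError``.
The frozen dataclass mirrors the idea that core metadata cannot be edited in
place, only deleted and recreated.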
Dataset characteristics are part of the
:ref:`EbasDatamodelDatasetCoreMetadata` and may *not* change during the
lifetime of a dataset. Thus they are implemented as :ref:`static metadata
<EbasDatamodelStaticMetadata>`.

Examples of characteristics are:

* Wavelength for nephelometer measurements:

  The parameter:

  * :term:`Regime code`: ``IMG``
  * :term:`Matrix name`: ``aerosol``
  * :term:`Component name`: ``aerosol_light_scattering_coefficient``

  needs one more metadata element to describe the parameter of measurement:

  * Wavelength

  Nephelometers measure light scattering at different wavelengths. Some
  nephelometers measure the scattering at 3 wavelengths. Thus they report 3
  variables with the same parameter, but with different characteristics (e.g.
  ``Wavelength=450nm``, ``Wavelength=525nm`` and ``Wavelength=635nm``).

* Size bin for DMPS measurements:

  The parameter:

  * :term:`Regime code`: ``IMG``
  * :term:`Matrix name`: ``aerosol``
  * :term:`Component name`: ``particle_number_size_distribution``

  needs one more metadata element to describe the parameter of measurement:

  * Median size (``D``) or
  * Minimum (``Dmin``) and maximum (``Dmax``) size of the size bin

  DMPS instruments measure the particle concentration in different size bins.
  The number concentration in each size bin is reported as one variable. Thus
  the size bin needs to be specified by the characteristics mentioned above.

.. _`EbasDatasetAdditionalMetadata`:

Additional dataset metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Additional dataset metadata can change historically during the lifetime of a
dataset. Changes are considered to be corrections or additions (metadata that
were not known before). However, those metadata are not time dependent and
need to be constant over the whole time series (changes in those metadata over
time would break the :ref:`homogeneity criterion <EbasDatasetHomogeneity>` of
a dataset, and a new dataset should be created instead).
Additional dataset metadata are implemented as :ref:`history aware metadata
<EbasDatamodelHistoryAwareMetadata>`.

* External laboratory (performing the analysis)
* :term:`Data level`
* :term:`Standard method`
* Filter medium, coating and/or solution
* Inlet type
* Humidity/temperature control
* The standard conditions the measurements are based on (standard
  temperature, standard pressure)

.. versionadded:: 3.01.00
   The following attributes were added:

   * Absorption cross section
   * Sensor type

.. _`EbasDatasetTimeDependentDatasetMetadata`:

Time dependent dataset metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Time dependent dataset metadata are dataset metadata which can have different
values for different time intervals of the time series. Additionally, they
have full history support. Thus they are implemented as a :ref:`time
dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

* Statement about occurrence of zero or negative values
* Sample preparation
* Blank correction
* Detection limit
* Uncertainty (relative or absolute)
* Calibration standard ID
* Inlet description (free text; *inlet type* is defined in
  :ref:`EbasDatasetAdditionalMetadata`)
* Humidity/temperature control description (free text; *Humidity/temperature
  control* is defined in :ref:`EbasDatasetAdditionalMetadata`)
* Measurement latitude
* Measurement longitude
* Measurement altitude
* Measurement height
* Orig. time res.
* Sample duration
* Comment

.. versionadded:: 3.01.00
   The following attributes were added:

   * Upper range limit
   * Secondary standard ID
   * Inlet tube material
   * Inlet tube outer diameter
   * Inlet tube inner diameter
   * Inlet tube length
   * Maintenance description
   * Zero/span check type
   * Zero/span check interval
   * Flow rate
   * Filter face velocity
   * Exposed filter area
   * Filter description
   * Filter prefiring (prefiring codeword, temperature, time)
   * Filter conditioning (yes/no, temp, RH, time)
   * Artifact correction
   * Artifact correction description
   * Charring correction
   * Water vapor correction
   * Ozone correction

.. _`EbasDatamodelInstrumentMetadata`:

Instrument metadata
^^^^^^^^^^^^^^^^^^^

The instrument metadata are composed of:

* :ref:`EbasDatamodelInstrumentCoreMetadata`
* :ref:`EbasDatamodelTimeDependentInstrumentMetadata`

.. _`EbasDatamodelInstrumentCoreMetadata`:

Instrument core metadata
************************

Instrument core metadata are composed of:

* :term:`Instrument reference`, a unique identifier for individual
  instruments. The instrument reference is itself syntactically composed of

  * :term:`Laboratory code` and
  * :term:`Instrument name`

* :term:`Instrument type`, which is stored to make sure that the same
  instrument (same instrument reference) is always of the same instrument type

.. _`EbasInstrumentNaming`:

.. note:: **Instrument naming**

   Choosing the instrument name is not always straightforward. Especially when
   changing instruments (e.g. using a new instrument model, or the same model
   with a different serial number), it can be difficult to decide on the
   instrument naming.

   Generally, this is a question not only of instrument name and instrument
   identity, but implicitly also of dataset identity and the :ref:`homogeneity
   of datasets <EbasDatasetHomogeneity>`.

   As a general rule, when the measurements are still comparable with the ones
   done with the old instrument setup, and they show no discontinuities due to
   the instrument change, the instrument name *can* be (but does not have to
   be) the same. :term:`Instrument manufacturer`, :term:`instrument model` and
   *instrument serial number* can be specified separately for each reporting
   period regardless of the instrument name being used (see also
   :ref:`EbasDatamodelTimeDependentInstrumentMetadata`). This enables the use
   of the same instrument name with different instrument models or serial
   numbers.

   If a period of co-located measurements is performed (with the old and the
   new instrument operating at the same time), a new instrument name needs to
   be created, otherwise the measurements could not be distinguished. If the
   results are not expected to be comparable, a new instrument name must be
   assigned as well. A new instrument name will always result in the creation
   of new :term:`datasets<Dataset>`.

   Example: If the DMPS at Zeppelin mountain has been exchanged for a similar
   instrument and the measurements are comparable, the lab can still report
   the measurements with instrument name ``dmps_no42``, but report a different
   :term:`Instrument manufacturer`, :term:`Instrument model` and
   :term:`Instrument serial number` for the next reporting period.

.. _`EbasDatamodelTimeDependentInstrumentMetadata`:

Time dependent instrument metadata
**********************************

Some attributes of instrument metadata may change over time even if the
instrument identity (:term:`Instrument reference`) and the core metadata are
the same:

* :term:`Instrument manufacturer`
* :term:`instrument model`
* Instrument serial number

See also the :ref:`note on instrument naming <EbasInstrumentNaming>` for
details.

Time dependent instrument metadata are implemented as a :ref:`time dependent,
history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

.. _`EbasDatamodelAnaInstMetadata`:

Analytical instrument metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 3.01.00

The analytical instrument metadata are composed of:

* :ref:`EbasDatamodelAnaInstCoreMetadata`
* :ref:`EbasDatamodelTimeDependentAnaInstMetadata`
* :ref:`EbasDatamodelTimeDependentAnaInstEmployment`

.. _`EbasDatamodelAnaInstCoreMetadata`:

Analytical instrument core metadata
***********************************

Analytical instrument core metadata are composed of:

* :term:`Analytical instrument reference`, a unique identifier for individual
  instruments. The analytical instrument reference is itself syntactically
  composed of

  * :term:`Laboratory code` and
  * :term:`Analytical instrument name`

* :term:`Analytical measurement technique`, which is stored to make sure that
  the same instrument (same analytical instrument reference) always uses the
  same analytical measurement technique.

.. _`EbasAnaInstNaming`:

.. note:: **Analytical instrument naming**

   The analytical instrument name should be a name used in the lab for
   referring to an instrument. The data model allows the same name to be used
   even if the physical instrument changes over time (e.g. when an instrument
   is replaced). An analytical instrument is assigned a manufacturer,
   instrument model and serial number in the :ref:`time dependent analytical
   instrument metadata <EbasDatamodelTimeDependentAnaInstMetadata>`.

   Unlike field instruments (where a new instrument name requires a new
   :term:`dataset`), analytical instruments can change over time within one
   :term:`dataset`. The reason for this is that laboratories very often use
   several instruments with the same :term:`analytical measurement technique`
   interchangeably (i.e. samples from one site may be analysed on different
   instruments), while the time series is still considered to be consistent.

   The relation between :term:`dataset` and laboratory instruments is defined
   by the :ref:`EbasDatamodelTimeDependentAnaInstEmployment`.

.. _`EbasDatamodelTimeDependentAnaInstMetadata`:

Time dependent analytical instrument metadata
*********************************************

Some attributes of the laboratory instrument metadata may change over time
even if the instrument identity (:term:`analytical instrument reference`) and
the core metadata are the same:

* :term:`Analytical instrument manufacturer`
* :term:`Analytical instrument model`
* Analytical instrument serial number

See also the :ref:`note on analytical instrument naming <EbasAnaInstNaming>`
for details.

Time dependent analytical instrument metadata are implemented as a :ref:`time
dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

.. _`EbasDatamodelTimeDependentAnaInstEmployment`:

Time dependent analytical instrument employment
***********************************************

Which laboratory instrument was used for a given time series may change over
time even if the dataset is considered to be consistent. The laboratory
instrument can be *bound* to a :term:`dataset` for a given valid time
interval.

See also the :ref:`note on analytical instrument naming <EbasAnaInstNaming>`
for details.

Time dependent analytical instrument employment is implemented as a :ref:`time
dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

.. _`EbasDatamodelQAMetadata`:

QA Metadata
^^^^^^^^^^^

.. versionadded:: 3.01.00

The QA metadata are composed of:

* reference to a :term:`dataset`
* reference to a QA measure (which can be an interlaboratory comparison, an
  on-site or off-site intercomparison or an on-site audit)
* data of the QA measure performed
* valid time interval (measurement time interval for which the QA is valid)
* QA specific data:

  - general outcome (pass, no pass, not participated)
  - bias (relative or absolute)
  - variability (relative or absolute)
  - documentation about the QA (document name, date, URL)

QA metadata are implemented as a :ref:`time dependent, history aware metadata
entity <EbasDatamodelTimeDependentMetadata>`.

.. _`EbasDatamodelSubmission`:

Submission
^^^^^^^^^^

The submission entity stores all metadata related to the submitted data file
itself. A submission represents a data file that has been reported to EBAS and
ingested into the database.

One submission (data file) can contain one or more variables. Each variable
relates to one :ref:`dataset <EbasDatamodelDataset>` in EBAS, but one
submission contains data for only one submission interval (usually one year),
whereas the dataset usually contains data from multiple submission intervals.

Submission metadata comprise:

* Origin of data:

  * Organization which produced the data
  * Data originator and submitter :ref:`roles <EbasDatamodelRoles>`

* Revision information (version, description, revision date)
* NILU staff who imported the data

Submissions are stored as :ref:`static metadata
<EbasDatamodelStaticMetadata>`. A submission never ceases to exist; it can
only be superseded by a new submission, and even then the original submission
remains as a historic fact.

.. _`EbasDatamodelRoles`:

Roles
^^^^^

Roles describe the role of persons who contributed to producing the data.
There are two types of roles:

* :term:`Data originator`
* :term:`Data submitter`

Roles are related to :ref:`data submissions <EbasDatamodelSubmission>`. There
must be at least one data originator and one data submitter for each
submission.
Roles are stored as :ref:`static metadata <EbasDatamodelStaticMetadata>`.

.. _`EbasDatamodelProjectAssociations`:

Project associations
^^^^^^^^^^^^^^^^^^^^

Project associations associate a certain time interval of data of a
:term:`dataset` with a :term:`framework`.

Each dataset can be associated with multiple frameworks, even for the same or
overlapping time intervals. However, each dataset must be associated with at
least one framework for every time interval of its data (there may not be any
time interval of data without a framework association).

.. _`EbasHistory`:

Historic states of data
=======================

EBAS keeps the full history of changes in the database. Any historic state of
the database can be reproduced. This enables some additional features, which
are described in the following subsections.

There are, however, some restrictions to the history function:

* History is supported since release 3.0 of EBAS. EBAS 3.0 was rolled out in
  May 2014, so the history is available from this date onwards. Older data
  appear as if inserted on 1st May 2014 (2014-05-01T00:00:00).
* :ref:`NRT data <EbasNearRealtime>` are stored without any historic
  information. All metadata and data are stored in the latest state only.
* Some rare database maintenance requires changes that are not visible in the
  history of the database. This is mainly the case when changing :ref:`master
  data <EbasDatamodelMasterData>`. Those changes are avoided as much as
  possible.

Operation with historic database state (Time travel)
----------------------------------------------------

All EBAS programs that query data (e.g. :ref:`EBASprogram_ebas_list_ds`,
:ref:`EBASprogram_ebas_extract`, all statistics programs and many more) can
query the database as it was at any date in the past by using the
:option:`--state` argument. The result of the operation will be the same as if
the operation had been performed at the specified historic point in time.
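The effect of querying a historic state can be sketched for a single time
dependent, history aware attribute such as the detection limit. The following
Python fragment is only an illustration under stated assumptions: the record
layout (``valid_from``, ``valid_to``, ``created``, ``removed``), the function
``value_at`` and all dates and values are invented here and do not describe
the actual EBAS schema or the implementation behind ``--state``.

.. code-block:: python

   from dataclasses import dataclass
   from datetime import datetime
   from typing import List, Optional


   @dataclass(frozen=True)
   class MetadataVersion:
       valid_from: datetime   # start of the data interval the value applies to
       valid_to: datetime     # end of that data interval (exclusive)
       value: float
       created: datetime      # when this version entered the database
       removed: Optional[datetime] = None  # when it was superseded, if ever


   def value_at(versions: List[MetadataVersion], sample_time: datetime,
                state: Optional[datetime] = None) -> Optional[float]:
       """Return the value valid for sample_time as seen at database state.

       state=None means the latest state of the database.
       """
       for v in versions:
           in_interval = v.valid_from <= sample_time < v.valid_to
           if state is None:
               visible = v.removed is None
           else:
               visible = v.created <= state and (
                   v.removed is None or state < v.removed)
           if in_interval and visible:
               return v.value
       return None


   # Invented history: the 2015 detection limit was corrected in June 2017,
   # the 2016 detection limit was first reported in March 2017.
   versions = [
       MetadataVersion(datetime(2015, 1, 1), datetime(2016, 1, 1), 0.05,
                       created=datetime(2016, 3, 1),
                       removed=datetime(2017, 6, 1)),
       MetadataVersion(datetime(2015, 1, 1), datetime(2016, 1, 1), 0.02,
                       created=datetime(2017, 6, 1)),
       MetadataVersion(datetime(2016, 1, 1), datetime(2017, 1, 1), 0.03,
                       created=datetime(2017, 3, 1)),
   ]

   latest = value_at(versions, datetime(2015, 7, 1))
   historic = value_at(versions, datetime(2015, 7, 1),
                       state=datetime(2017, 1, 1))

With these invented records, ``latest`` is the corrected value and
``historic`` is the value as it was stored before the correction. A 2016
sample queried at the same historic state has no detection limit yet, because
that record was created later.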
Querying a historic state can be thought of as time travel (unfortunately we
can only travel back in time - sorry, no future observations in this version
of EBAS).

Differences between two (historic) database states
--------------------------------------------------

Another use of the EBAS history is the possibility of restricting EBAS
programs to work only on data and metadata that *changed between two historic
database states*. This can be achieved with the :option:`--diff` argument.
Only datasets that changed between the database state and this date will be
processed.

A special case of this feature is the possibility of *differential data
extracts* (see :ref:`ebas_extract - differential extracts
<EBAS_differential_extract>`).

.. _`EbasNearRealtime`:

Near realtime data
==================

Near real time (:term:`NRT`) data in EBAS are usually available within two
hours after the observation. NRT datasets are handled specially in the
database in many respects.

The high frequency of changes to each NRT dataset (usually one change per
hour) makes it impossible to keep the :ref:`history <EbasHistory>` of changes
in the database. For NRT data, only the latest state of the data is stored in
the database. If a historic state of the data is accessed, the time series
appears as it was at the historic timestamp, but measurement samples between
that timestamp and the current state of the database are reported as missing
(rather than as non-existent, which would have been correct at the historic
state). This is a side effect of not storing the historic changes in the
database. Data that would have been future data from the perspective of the
historic state appear as missing.

Furthermore, the :term:`project acronyms <project acronym>` associated with
NRT datasets will always end with ``_NRT``. This is the way NRT data are
marked for data users. Additionally, data policies will generally be different
for NRT data in all frameworks.
Thus a different :term:`project acronym`, implying a different data policy and
different access rights, is needed for all projects.

:term:`Instrument names <Instrument name>` and :term:`instrument references
<Instrument reference>` of NRT data will always end with ``_NRT``. This is
necessary in order to make instrument metadata of NRT data completely
independent of regular (quality assured) data. The submission of quality
assured data should in no way change the instrument metadata of stored NRT
data of the same (physical) instrument and vice versa. A further complication
is that instrument metadata for NRT data should *not* be :ref:`history aware
<EbasHistory>` and need to be handled differently whenever they are inserted,
changed or deleted. Therefore an additional "virtual" instrument is created
for NRT data, even though in reality it is the *same physical instrument*.

All :ref:`time dependent metadata <EbasDatamodelTimeDependentMetadata>` will
only feature *one gapless* interval for NRT data.
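The behaviour of NRT data under a historic database state, as described in
this section, can be illustrated with a short Python sketch. It rests on
simplifying assumptions that are not taken from EBAS: samples are plain
``(start, end, value)`` tuples, a sample counts as known at a historic state
if its end time is not later than that state, and ``None`` stands for a
missing value. The function name and the sample values are made up.

.. code-block:: python

   from datetime import datetime
   from typing import List, Optional, Tuple

   Sample = Tuple[datetime, datetime, Optional[float]]


   def nrt_series_at_state(samples: List[Sample],
                           state: datetime) -> List[Sample]:
       """View the latest stored NRT series as of a historic state.

       No history is stored, so later samples cannot be removed from the
       series; they are kept and reported as missing (None) instead.
       """
       return [
           (start, end, value if end <= state else None)
           for start, end, value in samples
       ]


   # Three invented hourly samples, stored in their latest state.
   stored = [
       (datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 1), 1.0),
       (datetime(2020, 1, 1, 1), datetime(2020, 1, 1, 2), 2.0),
       (datetime(2020, 1, 1, 2), datetime(2020, 1, 1, 3), 3.0),
   ]
   view = nrt_series_at_state(stored, state=datetime(2020, 1, 1, 2))

In ``view`` the series still has three samples, but the last one carries a
missing value because it lies after the requested state. A regular, history
aware dataset would instead show only the two samples that existed at that
state.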