******************
 Database Concepts
******************

.. _`EbasDatamodel`:

The EBAS data model
===================

The model presented here is a very simplified version of the technical design data
model of EBAS and shows only the basic concepts in a way that is easy for users to understand.
Basic knowledge of the data model will help the user in the daily work with
metadata and data.

.. _`EBASDatamodelOverviewFigure`:

Overview
--------

.. image:: FIGURES/EBAS-3-user.png

Classes of metadata
-------------------

.. _`EbasDatamodelMasterData`:

Master data
^^^^^^^^^^^

Master data entities are shown in *green* in the
:ref:`overview figure <EBASDatamodelOverviewFigure>` above.

Master data are general metadata that are referenced by other metadata entities.
Master data should not change over time (if they change at all, this happens very
rarely and is not considered a change of the metadata, but a correction).
:ref:`Historic states <EbasHistory>` of master data are *not* preserved and
historic extracts will always produce the latest state of master data.

Typical examples are station metadata, which are referenced by datasets. The
station metadata (station name, position, altitude, ...) are static and need
to be the same for all measurements performed at this station. Changes, e.g. in
the station position, would be corrections (if a station was physically moved, a
new station needs to be created and referenced). The same applies to
organisation metadata.

The vast majority of master data are controlled vocabulary (e.g. Statistics code,
Instrument type, Component name, ...). Those master data entities are *not*
shown in the :ref:`overview figure <EBASDatamodelOverviewFigure>` in order to keep the
figure simple.

.. _`EbasDatamodelStaticMetadata`:

Static metadata
^^^^^^^^^^^^^^^

Static metadata entities are shown in *blue* in the
:ref:`overview figure <EBASDatamodelOverviewFigure>` above.

Some metadata are considered to be immutable. Rather than changing those
metadata, the entities have to be deleted and recreated. Examples of static
metadata are submissions and the dataset core metadata.

.. _`EbasDatamodelHistoryAwareMetadata`:

History aware metadata
^^^^^^^^^^^^^^^^^^^^^^

History aware metadata entities are shown in *red* in the
:ref:`overview figure <EBASDatamodelOverviewFigure>` above.

History aware metadata keep the full history of changes in the database.
See :ref:`EbasHistory` for more information.
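
The behaviour of a history aware attribute can be illustrated with a minimal
Python sketch. It assumes that each change is stored together with the
timestamp at which it entered the database, so that any earlier database state
can be looked up. The class ``HistoryAwareValue`` and its methods are
hypothetical and not part of any EBAS software.

```python
from bisect import bisect_right
from datetime import datetime


class HistoryAwareValue:
    """Hypothetical sketch: every change is kept instead of overwritten."""

    def __init__(self):
        self._times = []   # database timestamps of the changes, ascending
        self._values = []  # value stored by each change

    def set(self, value, recorded_at):
        # Changes are assumed to be recorded in chronological order.
        if self._times and recorded_at < self._times[-1]:
            raise ValueError("changes must be recorded in chronological order")
        self._times.append(recorded_at)
        self._values.append(value)

    def get(self, state=None):
        """Return the value as seen at database state `state` (default: latest)."""
        if state is None:
            return self._values[-1] if self._values else None
        idx = bisect_right(self._times, state)  # number of changes up to `state`
        return self._values[idx - 1] if idx else None


attr = HistoryAwareValue()
attr.set("initial value", datetime(2015, 3, 1))
attr.set("corrected value", datetime(2018, 6, 1))
print(attr.get())                      # corrected value
print(attr.get(datetime(2016, 1, 1)))  # initial value
print(attr.get(datetime(2014, 1, 1)))  # None
```

Nothing is ever overwritten: a correction only appends a new entry, and the old
value stays reachable through an earlier state.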

.. _`EbasDatamodelTimeDependentMetadata`:

Time dependent, history aware metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Time dependent, history aware metadata entities are also shown in *red* in the
:ref:`overview figure <EBASDatamodelOverviewFigure>` above.

In addition to being :ref:`history aware <EbasDatamodelHistoryAwareMetadata>`, those
metadata are always valid for a specific data interval of the timeseries and can
have different values for different data intervals.

An example could be the *detection limit*: Different detection limits can be
reported in submissions for different years (*time dependent*). Additionally,
the detection limit for one specific year can be changed afterwards and the
historic value before the change will still be available in the database
(*history aware*).
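
Time dependent, history aware metadata therefore have two independent time
axes: the data interval a value is valid for, and the database state at which
the value was stored. The following Python sketch models the detection limit
example above. It is illustrative only; the class names are hypothetical, and as
a simplification a new value supersedes any overlapping value completely (no
splitting of partially overlapping intervals).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    value: float
    start: datetime          # start of the data interval (inclusive)
    end: datetime            # end of the data interval (exclusive)
    recorded_at: datetime    # database state at which this version was stored
    superseded_at: Optional[datetime] = None  # state at which it was replaced


class TimeDependentHistoryAware:
    """Hypothetical sketch of a time dependent, history aware attribute."""

    def __init__(self):
        self.versions = []

    def set(self, value, start, end, recorded_at):
        # Simplification: every current version overlapping [start, end)
        # is superseded completely.
        for v in self.versions:
            if v.superseded_at is None and v.start < end and start < v.end:
                v.superseded_at = recorded_at
        self.versions.append(Version(value, start, end, recorded_at))

    def get(self, data_time, state):
        """Value valid for `data_time`, as stored at database state `state`."""
        for v in self.versions:
            visible = v.recorded_at <= state and (
                v.superseded_at is None or state < v.superseded_at)
            if visible and v.start <= data_time < v.end:
                return v.value
        return None


dl = TimeDependentHistoryAware()
y2019 = (datetime(2019, 1, 1), datetime(2020, 1, 1))
y2020 = (datetime(2020, 1, 1), datetime(2021, 1, 1))
dl.set(0.05, *y2019, recorded_at=datetime(2020, 6, 1))
dl.set(0.03, *y2020, recorded_at=datetime(2021, 6, 1))
dl.set(0.04, *y2019, recorded_at=datetime(2022, 1, 1))  # later correction

print(dl.get(datetime(2019, 7, 1), state=datetime(2021, 1, 1)))  # 0.05
print(dl.get(datetime(2019, 7, 1), state=datetime(2023, 1, 1)))  # 0.04
print(dl.get(datetime(2020, 7, 1), state=datetime(2023, 1, 1)))  # 0.03
```

The values 0.05, 0.03 and 0.04 are made up. Different years carry different
values (*time dependent*), and the superseded 2019 value remains visible for
earlier database states (*history aware*).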

Entities
--------

.. _`EbasDatamodelDataset`:

Dataset
^^^^^^^

The central part of the EBAS data model is the *dataset*. A dataset represents
all metadata and data for one specific measurement variable over time.

.. _`EbasDatasetHomogeneity`:

.. topic:: **Homogeneity of datasets:**

   A dataset is homogeneous in the sense that data from different measurement
   intervals within the whole dataset are comparable and free of
   discontinuities caused by changes in instrument configuration or method.

A dataset consists of:

   * :term:`Dataset setkey`, a unique identifier for the dataset.
   * :ref:`EbasDatamodelDatasetCoreMetadata`, which define the identity of the dataset.
   * :ref:`EbasDatasetAdditionalMetadata`, which are mutable.
   * :ref:`EbasDatasetTimeDependentDatasetMetadata`, which are mutable and can have
     different values for different data intervals.
   * References to station, laboratory, field instrument metadata, laboratory
     instrument metadata and QA metadata

     Datasets refer to other metadata entities. Those referred entities are
     uniquely defined (e.g. all station metadata will be the same for all
     datasets referring to the same station).

   * Measurement data (time series)

.. _`EbasDatamodelDatasetCoreMetadata`:

Dataset core metadata
^^^^^^^^^^^^^^^^^^^^^

Dataset core metadata define the identity of a dataset. The dataset core
metadata are also bound to a :term:`dataset setkey`. Two datasets with identical
core metadata would be indistinguishable and therefore cannot exist in parallel.

As the core metadata identify the dataset, they may never change in the
lifetime of a dataset. Thus the dataset core metadata are implemented as a
:ref:`static metadata entity <EbasDatamodelStaticMetadata>`.

Core metadata are:

   * :term:`Station code`, which refers to the *station metadata entity*
   * A reference to the :term:`measurement parameter <Parameter>` by
     the triple:

      * :term:`Regime code`
      * :term:`Matrix name`
      * :term:`Component name`

   * :term:`Instrument type`, which is controlled vocabulary
   * :term:`Instrument reference`, which refers to *instrument metadata*
   * :term:`Method reference`, which refers to *method metadata*
   * :term:`Statistics code`, which is controlled vocabulary
   * :term:`Resolution code`
   * any :ref:`EbasDatamodelDatasetCharacteristics`
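
Because the core metadata are the identity of a dataset, they behave like an
immutable, hashable key. The Python sketch below shows this idea with a frozen
dataclass and a registry that refuses a second dataset with identical core
metadata. The classes and all field values (station code ``XX0001R``,
references starting with ``XX01L``, etc.) are hypothetical and chosen only for
illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreMetadata:
    """Hypothetical sketch of dataset core metadata (immutable and hashable)."""
    station_code: str
    regime_code: str
    matrix_name: str
    component_name: str
    instrument_type: str
    instrument_reference: str
    method_reference: str
    statistics_code: str
    resolution_code: str
    # Characteristics as a frozenset of (name, value) pairs, so they are hashable.
    characteristics: frozenset = frozenset()


class DatasetRegistry:
    """Binds each distinct set of core metadata to exactly one setkey."""

    def __init__(self):
        self._setkeys = {}
        self._next_setkey = 1

    def create(self, core):
        if core in self._setkeys:
            raise ValueError(
                f"identical core metadata already used by setkey {self._setkeys[core]}")
        setkey = self._next_setkey
        self._next_setkey += 1
        self._setkeys[core] = setkey
        return setkey


# All values are made up for illustration.
core = CoreMetadata("XX0001R", "IMG", "aerosol",
                    "aerosol_light_scattering_coefficient", "nephelometer",
                    "XX01L_neph_1", "XX01L_scat_1", "arithmetic mean", "1h",
                    frozenset({("Wavelength", "450nm")}))
registry = DatasetRegistry()
print(registry.create(core))  # 1
# Same core metadata except for the characteristic: a different dataset.
other = replace(core, characteristics=frozenset({("Wavelength", "525nm")}))
print(registry.create(other))  # 2
```

Calling ``registry.create(core)`` a second time raises ``ValueError``, mirroring
the rule that two datasets with identical core metadata cannot coexist.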

.. _`EbasDatamodelDatasetCharacteristics`:

Dataset characteristics
^^^^^^^^^^^^^^^^^^^^^^^

Some :term:`parameters <parameter>` in EBAS need additional metadata to fully
describe the measured variable. These additional metadata are called
characteristics, as they describe special characteristics of a
:term:`parameter`.

Dataset characteristics are part of the :ref:`EbasDatamodelDatasetCoreMetadata`
and may *not* change in the lifetime of a dataset. Thus they are implemented as
:ref:`static metadata <EbasDatamodelStaticMetadata>`.

Examples for characteristics are:

   * Wavelength for nephelometer measurements: The parameter:

      * :term:`Regime code`: ``IMG``
      * :term:`Matrix name`: ``aerosol``
      * :term:`Component name`: ``aerosol_light_scattering_coefficient``

     needs one more metadata element to describe the parameter of
     measurement:

      * Wavelength:
        Nephelometers measure light scattering at different wavelengths.
        Some nephelometers measure the scattering at 3 wavelengths. Thus
        they report 3 variables with the same parameter, but with
        different characteristics (e.g. ``Wavelength=450nm``,
        ``Wavelength=525nm`` and ``Wavelength=635nm``)

   * Size bin for DMPS measurements: The parameter:

      * :term:`Regime code`: ``IMG``
      * :term:`Matrix name`: ``aerosol``
      * :term:`Component name`: ``particle_number_size_distribution``

     needs one more metadata element to describe the parameter of
     measurement:

      * Median size (``D``) or
      * Minimum (``Dmin``) and maximum (``Dmax``) size of the size bin

     DMPS instruments measure the particle concentration in different size
     bins. The number concentration in each size bin is reported as one
     variable. Thus the size bin needs to be specified by the above
     mentioned characteristics.
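
The role of characteristics in telling variables apart can be sketched in a few
lines of Python: the parameter triple alone is not unique, but the triple
together with its characteristics is. The tuple layout is a hypothetical
illustration, and the DMPS sizes are made up.

```python
# Hypothetical sketch: a variable is identified by its parameter (regime,
# matrix, component) together with its characteristics.
neph = ("IMG", "aerosol", "aerosol_light_scattering_coefficient")
neph_variables = {
    (neph, frozenset({("Wavelength", wl)}))
    for wl in ("450nm", "525nm", "635nm")
}
print(len(neph_variables))  # 3: same parameter, three distinct variables

# A DMPS size bin given by its median size, or by its bin limits
# (the sizes are made up for illustration).
dmps = ("IMG", "aerosol", "particle_number_size_distribution")
bin_median = (dmps, frozenset({("D", "20nm")}))
bin_limits = (dmps, frozenset({("Dmin", "18nm"), ("Dmax", "22nm")}))
print(bin_median == bin_limits)  # False
```

Using a ``frozenset`` makes the order in which characteristics are listed
irrelevant, which matches their role as a set of named properties.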


.. _`EbasDatasetAdditionalMetadata`:

Additional dataset metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Additional dataset metadata can change historically through the lifetime of a
dataset. Such changes are considered corrections or additions (metadata that
were not known before).

However, these metadata are not time dependent and need to be constant over the
whole time series (a change in these metadata over time would break the
:ref:`homogeneity criteria <EbasDatasetHomogeneity>` of the dataset; in that
case, a new dataset should be created).

Additional dataset metadata are implemented as
:ref:`history aware metadata <EbasDatamodelHistoryAwareMetadata>`. They comprise:

* External laboratory (performing the analysis)
* :term:`Data level`
* :term:`Standard method`
* Filter medium, coating and/or solution
* Inlet type
* Humidity/temperature control
* The standard conditions the measurements are based on (standard
  temperature, standard pressure)

.. versionadded:: 3.01.00

   The following attributes were added:

   * Absorption cross section
   * Sensor type

.. _`EbasDatasetTimeDependentDatasetMetadata`:

Time dependent dataset metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Time dependent dataset metadata are dataset metadata which can have different
values for different time intervals of the time series. Additionally they have
full history support. Thus they are implemented as a
:ref:`time dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

* Statement about occurrence of zero or negative values
* Sample preparation
* Blank correction
* Detection limit
* Uncertainty (relative or absolute)
* Calibration standard ID
* Inlet description (free text; *inlet type* is defined in :ref:`EbasDatasetAdditionalMetadata`)
* Humidity/temperature control description (free text; *Humidity/temperature control* is defined in :ref:`EbasDatasetAdditionalMetadata`)
* Measurement latitude
* Measurement longitude
* Measurement altitude
* Measurement height
* Orig. time res.
* Sample duration
* Comment

.. versionadded:: 3.01.00

   The following attributes were added:

   * Upper range limit
   * Secondary standard ID
   * Inlet tube material
   * Inlet tube outer diameter
   * Inlet tube inner diameter
   * Inlet tube length
   * Maintenance description
   * Zero/span check type
   * Zero/span check interval
   * Flow rate
   * Filter face velocity
   * Exposed filter area
   * Filter description
   * Filter prefiring (prefiring codeword, temperature, time)
   * Filter conditioning (yes/no, temp, RH, time)
   * Artifact correction
   * Artifact correction description
   * Charring correction
   * Water vapor correction
   * Ozone correction


.. _`EbasDatamodelInstrumentMetadata`:

Instrument metadata
^^^^^^^^^^^^^^^^^^^

The instrument metadata are composed of:

   * :ref:`EbasDatamodelInstrumentCoreMetadata`
   * :ref:`EbasDatamodelTimeDependentInstrumentMetadata`

.. _`EbasDatamodelInstrumentCoreMetadata`:

Instrument core metadata
************************

Instrument core metadata are composed of:

   * :term:`Instrument reference`, a unique identifier for individual
     instruments.
     The instrument reference is already syntactically composed of

        * :term:`Laboratory code` and
        * :term:`Instrument name`

   * :term:`Instrument type` is stored to make sure that the same instrument (same
     instrument reference) is always of the same instrument type
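
The composition of the instrument reference and the instrument type check can
be sketched as follows. This is illustrative only: it assumes that the
reference joins laboratory code and instrument name with an underscore, and the
laboratory code ``XX01L``, the instrument type value and all class names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrumentCore:
    laboratory_code: str
    instrument_name: str
    instrument_type: str

    @property
    def reference(self):
        # Assumption: laboratory code and instrument name joined by "_".
        return f"{self.laboratory_code}_{self.instrument_name}"


class InstrumentRegistry:
    """Hypothetical sketch: one instrument reference, one instrument type."""

    def __init__(self):
        self._types = {}

    def register(self, inst):
        known = self._types.setdefault(inst.reference, inst.instrument_type)
        if known != inst.instrument_type:
            raise ValueError(
                f"{inst.reference} is already registered as type {known!r}")
        return inst.reference


reg = InstrumentRegistry()
print(reg.register(InstrumentCore("XX01L", "dmps_no42", "dmps")))  # XX01L_dmps_no42
```

Registering ``XX01L_dmps_no42`` again with a different instrument type raises
``ValueError``, which is the consistency the stored instrument type is meant to
guarantee.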


.. _`EbasInstrumentNaming`:

.. note:: **Instrument naming**

   Choosing the instrument name is not always straightforward. Especially when
   changing instruments (e.g. using a new instrument model, or the same model
   with a different serial number), it can be difficult to decide on the
   instrument name.

   Generally, this is a question not only of instrument name and instrument
   identity, but implicitly also of dataset identity and the
   :ref:`homogeneity of datasets <EbasDatasetHomogeneity>`.

   As a general rule, when the measurements are still comparable with the ones
   done with the old instrument setup, and they show no discontinuities due to
   the instrument change, the instrument name *can* be (but does not have to be)
   the same.

   :term:`Instrument manufacturer`, :term:`instrument model` and
   *instrument serial number* can be specified separately for each
   reporting period regardless of the instrument name being used (see also
   :ref:`EbasDatamodelTimeDependentInstrumentMetadata`). This enables
   the use of the same instrument name with different instrument models or
   serial numbers.

   If a period of co-located measurements is performed (with the old and the
   new instrument operating at the same time), a new instrument name needs to be
   created; otherwise the measurements could not be distinguished.

   If the results are not expected to be comparable, a new instrument name must
   be assigned as well.

   A new instrument name will always result in the creation of new
   :term:`datasets<Dataset>`.

   Example: If the DMPS at Zeppelin mountain has been replaced by a
   similar instrument and the measurements are comparable, the lab can
   still report the measurements with instrument name ``dmps_no42``, but report
   a different :term:`Instrument manufacturer`, :term:`Instrument model` and
   *instrument serial number* for the next reporting period.
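
The rules of the note on instrument naming can be condensed into a small
decision function. This is an illustrative sketch with a hypothetical function
name; a result of ``False`` means that the old name *may* be kept, not that it
must be kept.

```python
def needs_new_instrument_name(co_located: bool, comparable: bool) -> bool:
    """Hypothetical sketch of the instrument naming rules.

    co_located: old and new instrument operate in parallel for some period.
    comparable: results are expected to be comparable (no discontinuity).
    """
    if co_located:
        # Both instruments report at the same time and must be distinguishable.
        return True
    if not comparable:
        # A discontinuity would break the homogeneity of the dataset.
        return True
    # Comparable replacement: the name may be kept (but does not have to be);
    # manufacturer, model and serial number are updated per reporting period.
    return False


print(needs_new_instrument_name(co_located=False, comparable=True))  # False
```

Keep in mind that whenever the function returns ``True``, new datasets will be
created as a consequence.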

.. _`EbasDatamodelTimeDependentInstrumentMetadata`:

Time dependent instrument metadata
**********************************

Some attributes of instrument metadata may change over time even if the
instrument identity (:term:`Instrument reference`) and the core metadata are
the same:

   * :term:`Instrument manufacturer`
   * :term:`instrument model` 
   * Instrument serial number

See also the :ref:`note on instrument naming <EbasInstrumentNaming>` for details.

Time dependent instrument metadata are implemented as
:ref:`time dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.



.. _`EbasDatamodelAnaInstMetadata`:

Analytical instrument metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. versionadded:: 3.01.00

The analytical instrument metadata are composed of:

   * :ref:`EbasDatamodelAnaInstCoreMetadata`
   * :ref:`EbasDatamodelTimeDependentAnaInstMetadata`
   * :ref:`EbasDatamodelTimeDependentAnaInstEmployment`

.. _`EbasDatamodelAnaInstCoreMetadata`:

Analytical instrument core metadata
***********************************

Analytical instrument core metadata are composed of:

   * :term:`Analytical instrument reference`, a unique identifier for individual
     instruments.
     The analytical instrument reference is already syntactically composed of

        * :term:`Laboratory code` and
        * :term:`Analytical instrument name`

   * :term:`Analytical measurement technique` is stored to make sure that the same
     instrument (same analytical instrument reference) always uses the same
     analytical measurement technique.


.. _`EbasAnaInstNaming`:

.. note:: **Analytical instrument naming**

   The analytical instrument name should be the name used in the lab to refer
   to an instrument. The data model allows the same name to be used even if the
   physical instrument changes over time (e.g. when the instrument is replaced).
   The manufacturer, instrument model and serial number of an analytical
   instrument are specified in the
   :ref:`time dependent analytical instrument metadata <EbasDatamodelTimeDependentAnaInstMetadata>`.

   Unlike field instruments (where a new instrument name requires a new
   :term:`dataset`), analytical instruments can change over time within one
   :term:`dataset`. The reason is that laboratories very often use
   several instruments with the same :term:`analytical measurement technique`
   interchangeably (i.e. samples from one site may be analysed on different
   instruments), while the time series is still considered to be consistent.
   The relation between :term:`dataset` and laboratory instruments is defined by the
   :ref:`EbasDatamodelTimeDependentAnaInstEmployment`.

.. _`EbasDatamodelTimeDependentAnaInstMetadata`:

Time dependent analytical instrument metadata
*********************************************

Some attributes of the laboratory instrument metadata may change over time even
if the instrument identity (:term:`analytical instrument reference`) and the
core metadata are the same:

   * :term:`Analytical instrument manufacturer`
   * :term:`Analytical instrument model` 
   * Analytical instrument serial number

See also the :ref:`note on analytical instrument naming <EbasAnaInstNaming>` for
details.

Time dependent analytical instrument metadata are implemented as
:ref:`time dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.

.. _`EbasDatamodelTimeDependentAnaInstEmployment`:

Time dependent analytical instrument employment
***********************************************

Which laboratory instrument was used for a given time series may change over
time, even if the dataset is considered to be consistent.

A laboratory instrument can be *bound* to a :ref:`dataset <EbasDatamodelDataset>`
for a given valid time interval.

See also the :ref:`note on analytical instrument naming <EbasAnaInstNaming>` for
details.

Time dependent analytical instrument employment is implemented as
:ref:`time dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.
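
Looking up which analytical instruments were bound to a dataset at a given time
can be sketched as an interval query. The sketch assumes that several
instruments may be bound for overlapping intervals when they are used
interchangeably; history awareness is left out for brevity, and the class,
function and instrument reference names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Employment:
    instrument_reference: str  # analytical instrument reference
    start: datetime            # valid time interval, start inclusive
    end: datetime              # valid time interval, end exclusive


def instruments_used(employments, sample_time):
    """Return all analytical instruments bound to the dataset at sample_time."""
    return [e.instrument_reference for e in employments
            if e.start <= sample_time < e.end]


bindings = [
    Employment("XX01L_ic_1", datetime(2018, 1, 1), datetime(2021, 1, 1)),
    Employment("XX01L_ic_2", datetime(2020, 1, 1), datetime(2023, 1, 1)),
]
print(instruments_used(bindings, datetime(2019, 6, 1)))  # ['XX01L_ic_1']
print(instruments_used(bindings, datetime(2020, 6, 1)))  # ['XX01L_ic_1', 'XX01L_ic_2']
```

The dataset itself stays the same throughout; only the bindings change.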


.. _`EbasDatamodelQAMetadata`:

QA Metadata
^^^^^^^^^^^

.. versionadded:: 3.01.00

The QA metadata are composed of:

   * reference to a :term:`dataset`
   * reference to a QA measure (which can be an interlaboratory comparison,
     on-site or off-site intercomparison or an on-site audit)
   * date of the QA measure performed
   * valid time interval (measurement time interval for which the QA is valid)
   * QA specific data:

     - general outcome (pass, no pass, not participated)
     - bias (relative or absolute)
     - variability (relative or absolute)
     - documentation about the QA (document name, date, URL)

QA metadata are implemented as
:ref:`time dependent, history aware metadata entity <EbasDatamodelTimeDependentMetadata>`.
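
The structure of a QA metadata record can be sketched as a Python dataclass.
This is a hypothetical illustration of the fields listed above, not the EBAS
schema: field names, the half-open valid interval and the example values are
assumptions, and history awareness is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    NO_PASS = "no pass"
    NOT_PARTICIPATED = "not participated"


@dataclass
class QARecord:
    """Hypothetical sketch of one QA metadata record."""
    dataset_setkey: int
    qa_measure: str              # e.g. an intercomparison or an on-site audit
    qa_date: date                # date the QA measure was performed
    valid_start: date            # start of the covered measurement interval
    valid_end: date              # end of the covered interval (exclusive)
    outcome: Outcome
    bias: Optional[float] = None
    bias_is_relative: bool = True
    variability: Optional[float] = None
    variability_is_relative: bool = True
    document_url: Optional[str] = None

    def covers(self, day: date) -> bool:
        """True if the QA result applies to measurements taken on `day`."""
        return self.valid_start <= day < self.valid_end


rec = QARecord(12345, "hypothetical intercomparison", date(2019, 5, 15),
               date(2019, 1, 1), date(2020, 1, 1), Outcome.PASS, bias=0.03)
print(rec.covers(date(2019, 7, 1)))  # True
```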

.. _`EbasDatamodelSubmission`:

Submission
^^^^^^^^^^

The submission entity stores all metadata related to the submitted data file
itself.

A submission represents a data file that has been reported to EBAS and ingested
into the database.
One submission (data file) can contain one or more variables. Each variable
relates to one :ref:`dataset <EbasDatamodelDataset>` in EBAS, but one submission
contains only data for one submission interval (usually one year), whereas a
dataset usually contains data from multiple submission intervals.

A submission records:

   * Origin of data:

      * Organization which produced the data
      * Data originator and submitter :ref:`roles <EbasDatamodelRoles>`

   * Revision information (version, description, revision date)
   * NILU staff who imported the data

Submissions are stored as :ref:`static metadata <EbasDatamodelStaticMetadata>`.
A submission will never cease to exist; it can only be superseded by a new
submission, and even then the original submission remains as a historic fact.

.. _`EbasDatamodelRoles`:  

Roles
^^^^^

Roles describe the role of persons who contributed to producing the data.
There are two types of roles:

   * :term:`Data originator`
   * :term:`Data submitter`

Roles are related to :ref:`data submissions <EbasDatamodelSubmission>`.
There must be at least one data originator and one data submitter for each
submission.
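
The rule that each submission needs at least one data originator and one data
submitter can be expressed as a simple validation. The Python sketch below
models only the roles of a submission; class names and person names are
hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    DATA_ORIGINATOR = "data originator"
    DATA_SUBMITTER = "data submitter"


@dataclass
class Submission:
    """Hypothetical sketch: only the roles of a submission are modelled."""
    roles: list = field(default_factory=list)  # (person, Role) pairs

    def validate(self):
        present = {role for _, role in self.roles}
        missing = sorted(r.value for r in Role if r not in present)
        if missing:
            raise ValueError("missing required role(s): " + ", ".join(missing))


complete = Submission([("Person A", Role.DATA_ORIGINATOR),
                       ("Person B", Role.DATA_SUBMITTER)])
complete.validate()  # passes silently

incomplete = Submission([("Person A", Role.DATA_ORIGINATOR)])
try:
    incomplete.validate()
except ValueError as err:
    print(err)  # missing required role(s): data submitter
```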

Roles are stored as :ref:`static metadata <EbasDatamodelStaticMetadata>`.

.. _`EbasDatamodelProjectAssociations`:  

Project associations
^^^^^^^^^^^^^^^^^^^^

Project associations associate a certain time interval of data of a
:term:`dataset` with a :term:`framework`.

Each dataset can be associated with multiple frameworks, even for the same or
overlapping time intervals. However, each dataset must be associated with at
least one framework for any time interval of its data (no time interval of data
may exist without a framework association).
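
The coverage rule for project associations amounts to an interval-union check.
The following sketch, with hypothetical function and framework names, reports
every part of a dataset's data interval that is not covered by any association;
an empty result means the rule is satisfied. Intervals are assumed to be
half-open.

```python
from datetime import datetime


def uncovered_intervals(data_start, data_end, associations):
    """Return sub-intervals of [data_start, data_end) not covered by any
    framework association. `associations` is a list of
    (framework, start, end) tuples with half-open intervals [start, end)."""
    gaps = []
    cursor = data_start  # everything before `cursor` is known to be covered
    for _, start, end in sorted(associations, key=lambda a: a[1]):
        if end <= cursor:
            continue  # lies entirely within the already covered part
        if start > cursor:
            gaps.append((cursor, min(start, data_end)))
        cursor = max(cursor, end)
        if cursor >= data_end:
            break
    if cursor < data_end:
        gaps.append((cursor, data_end))
    return [(s, e) for s, e in gaps if s < e]


assocs = [
    ("FW_A", datetime(2015, 1, 1), datetime(2018, 1, 1)),
    ("FW_B", datetime(2017, 1, 1), datetime(2019, 1, 1)),
]
print(uncovered_intervals(datetime(2015, 1, 1), datetime(2020, 1, 1), assocs))
# [(datetime.datetime(2019, 1, 1, 0, 0), datetime.datetime(2020, 1, 1, 0, 0))]
```

Overlapping associations (as between ``FW_A`` and ``FW_B`` here) are allowed;
only the remaining gap in 2019 violates the rule.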


.. _`EbasHistory`:

Historic states of data
=======================

EBAS keeps the full history of changes in the database. Any historic state of
the database can be reproduced. This enables some additional features which will
be described in the following sub-chapters.

There are however some restrictions to the history function:

* History has been supported since release 3.0 of EBAS, which was rolled out
  in May 2014. Thus the history is available from this date on. Older data
  appear as if inserted on 1 May 2014 (2014-05-01T00:00:00).

* :ref:`NRT data <EbasNearRealtime>` are stored without any historic
  information. All metadata and data are only stored in their latest state.

* Some rare database maintenance operations require changes that are not
  visible in the history of the database. This is mainly the case when changing
  :ref:`master data <EbasDatamodelMasterData>`.
  Those changes are avoided as much as possible.

Operation with historic database state (Time travel)
----------------------------------------------------

All EBAS programs that query data (e.g.
:ref:`EBASprogram_ebas_list_ds`, :ref:`EBASprogram_ebas_extract`, all
statistics programs and many more) can query the database as it was at any
historic date in the past, using the :option:`--state` argument.
The result of the operation will be the same as if the operation had been
performed at the historic point in time specified.
This can be thought of as a time travel option (unfortunately we can only travel
back in time; sorry, no future observations in this version of EBAS).

Differences between two (historic) database states
--------------------------------------------------

Another use of the EBAS history is the possibility of restricting EBAS
programs to work only on data and metadata that *changed between two historic
database states*. This can be achieved with the :option:`--diff` argument.
Only datasets that changed between the database state being queried and the
date given with :option:`--diff` will be processed.

A special case of this feature is the possibility of
*differential data extracts* (see
:ref:`ebas_extract - differential extracts <EBAS_differential_extract>`).
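
The selection made with :option:`--diff` can be pictured as a filter over a
change log. The sketch below is not how EBAS implements the option: the change
log structure, the function name and the boundary conventions (changes after
the ``--diff`` date, up to and including the queried state) are assumptions
made for illustration.

```python
from datetime import datetime


def changed_datasets(change_log, diff, state):
    """Setkeys of datasets with at least one change after `diff` and up to
    (and including) `state`. `change_log` holds (setkey, changed_at) pairs."""
    if diff >= state:
        raise ValueError("the diff date must be earlier than the state")
    return sorted({setkey for setkey, changed_at in change_log
                   if diff < changed_at <= state})


log = [
    (101, datetime(2021, 3, 1)),
    (102, datetime(2022, 5, 10)),
    (101, datetime(2022, 8, 1)),
    (103, datetime(2023, 2, 1)),
]
print(changed_datasets(log, datetime(2022, 1, 1), datetime(2022, 12, 31)))  # [101, 102]
```

Dataset 103 is skipped because its only change lies after the queried state,
and the 2021 change of dataset 101 does not count, but its 2022 change does.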

.. _`EbasNearRealtime`:

Near realtime data
==================

Near real time (:term:`NRT`) data in EBAS are usually available within two hours
after the observation.

NRT datasets are handled specially in the database in many respects.

The high frequency of changes to each NRT dataset (usually one change per hour)
makes it impossible to keep the :ref:`history <EbasHistory>` of changes in the
database. For NRT data, only the latest state of the data is stored in the
database. If a historic state of the data is accessed, the time series appears
as it was at the historic timestamp, but measurement samples between that
timestamp and the current state of the database are reported as missing (rather
than as non-existent, which would be correct for the historic state). This is a
side effect of not storing historic changes in the database: data that would
have been future data from the perspective of the historic state appear as
missing.
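
This effect can be illustrated with a small Python sketch. It is a hypothetical
model, not EBAS code: it assumes that samples are identified by their start
time and that a sample starting at or after the requested state counts as
"future" data.

```python
from datetime import datetime, timedelta


def nrt_series_at_state(samples, state):
    """Hypothetical sketch: an NRT time series viewed at historic `state`.

    `samples` are (start_time, value) pairs from the latest (and only stored)
    state. Samples starting at or after `state` did not exist back then, but
    since no history is kept they are reported as missing (None) rather than
    being left out.
    """
    return [(t, v if t < state else None) for t, v in samples]


start = datetime(2024, 1, 1)
latest = [(start + timedelta(hours=h), float(h + 1)) for h in range(4)]
for t, v in nrt_series_at_state(latest, state=start + timedelta(hours=2)):
    print(t.strftime("%H:%M"), v)
# 00:00 1.0
# 01:00 2.0
# 02:00 None
# 03:00 None
```

The number of samples is the same for every state; only their values change
from valid to missing.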

Furthermore, the :term:`project acronyms <project acronym>` associated with NRT
datasets always end with ``_NRT``. This is how NRT data are marked
for data users. Additionally, data policies for NRT data generally differ from
those for regular data in all frameworks. Thus a separate :term:`project acronym`,
implying a different data policy and different access rights, is needed for
every project.

:term:`Instrument names <Instrument name>` and
:term:`instrument references <Instrument reference>` of NRT data always end
with ``_NRT``. This is necessary in order to make the instrument metadata of NRT
data completely independent from those of regular (quality assured) data.
The submission of quality assured data should in no way change the instrument
metadata of stored NRT data from the same (physical) instrument, and vice versa.
A further complication is that instrument metadata for NRT data
should *not* be :ref:`history aware <EbasHistory>`, and need to be handled
differently whenever they are inserted, changed or deleted. Therefore we create an
additional "virtual" instrument for NRT data, even though in reality it is the
*same physical instrument*.
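
The ``_NRT`` naming convention can be captured in two small helpers. The
document only states that NRT names, references and project acronyms end with
``_NRT``; deriving the virtual instrument's name by appending the suffix to the
regular name is an assumption, and both helper names are hypothetical.

```python
NRT_SUFFIX = "_NRT"


def is_nrt(identifier: str) -> bool:
    """True for NRT project acronyms, instrument names and references."""
    return identifier.endswith(NRT_SUFFIX)


def nrt_instrument_name(name: str) -> str:
    """Name of the 'virtual' NRT instrument (assumed: regular name + suffix)."""
    return name if is_nrt(name) else name + NRT_SUFFIX


print(nrt_instrument_name("dmps_no42"))  # dmps_no42_NRT
print(is_nrt("XX01L_dmps_no42"))         # False
```

Keeping the virtual NRT instrument under its own name means that its metadata
can be replaced in place without touching the history aware metadata of the
regular instrument.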

All :ref:`time dependent metadata <EbasDatamodelTimeDependentMetadata>` will
only feature *one gapless* interval for NRT data.