4. Database Concepts¶
The EBAS data model¶
The model presented here is a very simplified version of the technical design data model of EBAS and shows only the basic concepts in a way users can understand. Basic knowledge of the data model will help the user in the daily work with metadata and data.
Overview¶
![../../_images/EBAS-3-user.png](../../_images/EBAS-3-user.png)
Classes of metadata¶
Master data¶
Master data entities are shown in green in the overview figure above.
Master data are general metadata that are referenced by other metadata entities. Master data should not change over time (changes are at least very seldom and are not considered a change of the metadata, but a correction). Historic states of master data are not preserved, and historic extracts will always produce the latest state of master data.
A typical example is station metadata, which are referenced by datasets. The station metadata (station name, position, altitude, …) are static and need to be the same for all measurements performed at this station. Changes, e.g. in the station position, would be corrections (if a station was physically moved, a new station needs to be created and referenced). The same applies to organisation metadata.
The vast majority of master data are controlled vocabulary (e.g. Statistics code, Instrument type, Component name, …). Those master data entities are not shown in the overview figure in order to keep the figure simple.
Static metadata¶
Static metadata entities are shown in blue in the overview figure above.
Some metadata are considered to be immutable. Rather than changing those metadata, the entities have to be deleted and recreated. Examples of static metadata are submissions and the dataset core metadata.
History aware metadata¶
History aware metadata entities are shown in red in the overview figure above.
History aware metadata keep the full history of changes in the database. See Historic states of data for more information.
Time dependent, history aware metadata¶
Time dependent, history aware metadata entities are also shown in red in the overview figure above.
In addition to being history aware, those metadata are always valid for a specific data interval of the timeseries and can have different values for different data intervals.
An example could be the detection limit: Different detection limits can be reported in submissions for different years (time dependent). Additionally, the detection limit for one specific year can be changed afterwards and the historic value before the change will still be available in the database (history aware).
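The two dimensions in the detection limit example can be illustrated with a minimal sketch. It assumes a hypothetical in-memory layout (the class `DetectionLimitVersion`, its field names and all values are invented for illustration and are not the EBAS schema): each stored version carries the data interval it is valid for and the time it was recorded or superseded in the database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DetectionLimitVersion:
    """One stored version of a detection limit (hypothetical layout)."""
    valid_from: datetime      # start of the data interval it applies to
    valid_to: datetime        # end of the data interval (exclusive)
    value: float
    recorded_at: datetime     # when this version entered the database
    superseded_at: Optional[datetime] = None  # None: still the current version


def detection_limit(versions, sample_time, state):
    """Return the value valid for sample_time as the database saw it at state."""
    for v in versions:
        in_interval = v.valid_from <= sample_time < v.valid_to
        visible = v.recorded_at <= state and (
            v.superseded_at is None or state < v.superseded_at)
        if in_interval and visible:
            return v.value
    return None


# Illustrative values only.
versions = [
    # 2019 value as first reported, corrected later
    DetectionLimitVersion(datetime(2019, 1, 1), datetime(2020, 1, 1), 0.05,
                          recorded_at=datetime(2020, 3, 1),
                          superseded_at=datetime(2021, 6, 1)),
    # corrected 2019 value
    DetectionLimitVersion(datetime(2019, 1, 1), datetime(2020, 1, 1), 0.03,
                          recorded_at=datetime(2021, 6, 1)),
    # 2020 value (time dependent: a different data interval)
    DetectionLimitVersion(datetime(2020, 1, 1), datetime(2021, 1, 1), 0.02,
                          recorded_at=datetime(2021, 3, 1)),
]
```

Looking up a sample from July 2019 returns 0.05 for a database state at the end of 2020 and 0.03 for a state in 2022, while a sample from July 2020 returns 0.02.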
Entities¶
Dataset¶
The central part of the EBAS data model is the dataset. A dataset represents all metadata and data for one specific measurement variable over time.
Homogeneity of datasets:
A dataset is homogeneous in the sense that data from different measurement intervals within the whole dataset are comparable and without discontinuities caused by changes in instrument configuration or method.
A dataset consists of:
Dataset setkey, a unique identifier for the dataset.
Dataset core metadata, which define the identity of the dataset.
Additional dataset metadata, which are mutable.
Time dependent dataset metadata, which are mutable and can have different values for different data intervals.
References to station, laboratory, field instrument metadata, laboratory instrument metadata and QA metadata
Datasets refer to other metadata entities. Those referred entities are uniquely defined (e.g. all station metadata will be the same for all datasets referring to the same station).
Measurement data (time series)
Dataset core metadata¶
Dataset core metadata define the identity of a dataset. The dataset core metadata are also bound to a dataset setkey. Two datasets with identical core metadata would be indistinguishable and may not exist in parallel.
As the core metadata identify the dataset, they may never change in the lifetime of a dataset. Thus the dataset core metadata are implemented as a static metadata entity.
Core metadata are:
Station code, which refers to the station metadata entity
A reference to the measurement parameter by the triple Regime code, Matrix name and Component name
Instrument type, which is controlled vocabulary
Instrument reference, which refers to instrument metadata
Method reference, which refers to method metadata
Statistics code, which is controlled vocabulary
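As a rough illustration of how core metadata act as the identity of a dataset, the sketch below models them as an immutable key. The names `CoreMetadata` and `DatasetRegistry`, the integer setkeys and all field values are assumptions made for this example, not the EBAS implementation. The parameter is expressed through regime code, matrix name and component name, as in the characteristics examples of this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: core metadata may never change
class CoreMetadata:
    station_code: str
    regime_code: str
    matrix_name: str
    component_name: str
    instrument_type: str
    instrument_ref: str
    method_ref: str
    statistics_code: str


class DatasetRegistry:
    """Hypothetical in-memory stand-in for the stored datasets."""

    def __init__(self):
        self._setkeys = {}      # core metadata -> setkey
        self._next_setkey = 1

    def create(self, core):
        # Two datasets with identical core metadata may not exist in parallel.
        if core in self._setkeys:
            raise ValueError("a dataset with identical core metadata exists")
        setkey = self._next_setkey
        self._setkeys[core] = setkey
        self._next_setkey += 1
        return setkey


# Illustrative values only; none of them are taken from the database.
core = CoreMetadata("XX0001R", "IMG", "aerosol",
                    "aerosol_light_scattering_coefficient",
                    "nephelometer", "example_instrument",
                    "example_method", "arithmetic mean")
registry = DatasetRegistry()
setkey = registry.create(core)
```

A second `registry.create(core)` call raises `ValueError`, and assigning to any field of `core` fails because the dataclass is frozen.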
Dataset characteristics¶
Some parameters in EBAS need additional metadata to describe the quality of the variable. These additional metadata are called characteristics, as they describe special characteristics of a parameter.
Dataset characteristics are part of the Dataset core metadata and may not change in the lifetime of a dataset. Thus they are implemented as static metadata.
Examples for characteristics are:
Wavelength for nephelometer measurements: The parameter:
- Regime code: IMG
- Matrix name: aerosol
- Component name: aerosol_light_scattering_coefficient
needs one more metadata element to describe the parameter of measurement:
- Wavelength: Nephelometers measure light scattering at different wavelengths. Some nephelometers measure the scattering at 3 wavelengths. Thus they report 3 variables with the same parameter, but with different characteristics (e.g. Wavelength=450nm, Wavelength=525nm and Wavelength=635nm).
Size bin for dmps measurements: The parameter:
- Regime code: IMG
- Matrix name: aerosol
- Component name: particle_number_size_distribution
needs one more metadata element to describe the parameter of measurement:
- Median size (D) or
- Minimum (Dmin) and maximum (Dmax) size of the size bin: DMPS instruments measure the particle concentration in different size bins. The number concentration in each size bin is reported as one variable. Thus the size bin needs to be specified by the above mentioned characteristics.
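The nephelometer case can be sketched as follows: three variables share the same parameter and differ only in their characteristics. The `Variable` type and the `make_variable` helper are hypothetical names used only for this illustration.

```python
from typing import NamedTuple, Tuple


class Variable(NamedTuple):
    regime_code: str
    matrix_name: str
    component_name: str
    characteristics: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs


def make_variable(regime, matrix, component, **characteristics):
    """Build a hashable variable key from a parameter and its characteristics."""
    return Variable(regime, matrix, component,
                    tuple(sorted(characteristics.items())))


# One variable per wavelength, all with the same parameter.
scattering = [
    make_variable("IMG", "aerosol", "aerosol_light_scattering_coefficient",
                  Wavelength=wavelength)
    for wavelength in ("450nm", "525nm", "635nm")
]

distinct_variables = len(set(scattering))                       # 3
distinct_parameters = len({v[:3] for v in scattering})          # 1
```

A size bin would be expressed the same way, with `D` or with `Dmin` and `Dmax` as the characteristics.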
Additional dataset metadata¶
Additional dataset metadata can change historically through the lifetime of a dataset. Changes are considered as corrections or additions (metadata were not known before).
However, those metadata are not time dependent and need to be constant over the whole time series (changes in those metadata over time would break the continuity of a dataset, in which case a new dataset should be created).
Additional dataset metadata are implemented as history aware metadata.
- External laboratory (performing the analysis)
- Data level
- Standard method
- Filter medium, coating and/or solution
- Inlet type
- Humidity/temperature control
- The standard conditions the measurements are based on (standard temperature, standard pressure)
New in version 3.01.00: the following attributes were added:
- Absorption cross section
- Sensor type
Time dependent dataset metadata¶
Time dependent dataset metadata are dataset metadata which can have different values for different time intervals of the time series. Additionally they have full history support. Thus they are implemented as a time dependent, history aware metadata entity.
- Statement about occurrence of zero or negative values
- Sample preparation
- Blank correction
- Detection limit
- Uncertainty (relative or absolute)
- Calibration standard ID
- Inlet description (free text; inlet type is defined in Additional dataset metadata)
- Humidity/temperature control description (free text; Humidity/temperature control is defined in Additional dataset metadata)
- Measurement latitude
- Measurement longitude
- Measurement altitude
- Measurement height
- Orig. time res.
- Sample duration
- Comment
New in version 3.01.00: the following attributes were added:
- Upper range limit
- Secondary standard ID
- Inlet tube material
- Inlet tube outer diameter
- Inlet tube inner diameter
- Inlet tube length
- Maintenance description
- Zero/span check type
- Zero/span check interval
- Flow rate
- Filter face velocity
- Exposed filter area
- Filter description
- Filter prefiring (prefiring codeword, temperature, time)
- Filter conditioning (yes/no, temp, RH, time)
- Artifact correction
- Artifact correction description
- Charring correction
- Water vapor correction
- Ozone correction
Instrument metadata¶
The instrument metadata are composed of
Instrument core metadata¶
Instrument core metadata are composed of:
Instrument reference, a unique identifier for individual instruments. The instrument reference is already syntactically composed of
Instrument type is stored to make sure that the same instrument (same instrument reference) is always of the same instrument type
Note
Instrument naming
Choosing the instrument name is not always straightforward. Especially when changing instruments (e.g. using a new instrument model, or the same model with a different serial number), it can be difficult to decide on the instrument naming.
Generally, this is a question not only of instrument name and instrument identity, but implicitly also of dataset identity and the homogeneity of datasets.
As a general rule, when the measurements are still comparable with the ones done with the old instrument setup, and they show no discontinuities due to the instrument change, the instrument name can be (but does not have to be) the same.
Instrument manufacturer, instrument model and instrument serial number can be specified separately for each reporting period regardless of the instrument name being used (see also Time dependent instrument metadata). This enables the use of the same instrument name with different instrument models or serial numbers.
If a period of co-located measurements is performed (with the old and the new instrument operating at the same time), a new instrument name needs to be created, otherwise the measurements could not be distinguished.
If the results are expected not to be comparable, a new instrument name must be assigned as well.
A new instrument name will always result in the creation of new datasets.
Example: If the dmps at Zeppelin mountain has been exchanged with a similar instrument and the measurements are comparable, the lab can still report the measurements with instrument name dmps_no42, but report a different Instrument manufacturer, Instrument model and Instrument serial number for the next reporting period.
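The naming rules in this note can be condensed into a small decision helper. This is only a sketch of the rules as described here; the function name and its two boolean inputs are assumptions for illustration, and judging whether measurements are comparable remains a matter for the data originator.

```python
def needs_new_instrument_name(comparable: bool, colocated: bool) -> bool:
    """Return True when the rules in this note require a new instrument name.

    comparable: measurements are comparable with the old setup and show
                no discontinuities due to the instrument change
    colocated:  old and new instrument operate at the same time
    """
    if colocated:
        # Otherwise the parallel measurements could not be distinguished.
        return True
    if not comparable:
        # Results are not comparable, so new datasets are needed.
        return True
    # The name can be kept (but does not have to be).
    return False
```

For the Zeppelin example, `needs_new_instrument_name(comparable=True, colocated=False)` returns `False`, so the existing name may be kept while manufacturer, model and serial number change for the next reporting period.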
Time dependent instrument metadata¶
Some attributes of instrument metadata may change over time even if the instrument identity (Instrument reference) and the core metadata are the same:
- Instrument manufacturer
- Instrument model
- Instrument serial number
See also the note on instrument naming for details.
Time dependent instrument metadata are implemented as a time dependent, history aware metadata entity.
Analytical instrument metadata¶
New in version 3.01.00.
The analytical instrument metadata are composed of
Analytical instrument core metadata¶
Analytical instrument core metadata are composed of:
Analytical instrument reference, a unique identifier for individual instruments. The analytical instrument reference is already syntactically composed of
Analytical measurement technique is stored to make sure that the same instrument (same analytical instrument reference) always uses the same analytical measurement technique.
Note
Analytical instrument naming
The analytical instrument name should be a name used in the lab for referring to an instrument. The data model allows for using the same name even if the physical instrument changes over time (e.g. change of instrument). One analytical instrument is assigned a manufacturer, instrument model and serial number in the time dependent analytical instrument metadata.
Unlike the field instruments (where a new instrument name requires a new dataset), analytical instruments can change over time within one dataset. The reason for this is that very often laboratories use several instruments with the same analytical measurement technique interchangeably (i.e. samples from one site may be analysed on different instruments), but still the timeseries is considered to be consistent. The relation of dataset and laboratory instruments is defined by the Time dependent analytical instrument employment.
Time dependent analytical instrument metadata¶
Some attributes of the laboratory instrument metadata may change over time even if the instrument identity (analytical instrument reference) and the core metadata are the same:
- Analytical instrument manufacturer
- Analytical instrument model
- Analytical instrument serial number
See also the note on analytical instrument naming for details.
Time dependent analytical instrument metadata are implemented as a time dependent, history aware metadata entity.
Time dependent analytical instrument employment¶
Which laboratory instrument was used for a given time series may change over time even if the dataset is considered to be consistent.
The laboratory instrument can be bound to a dataset for a given valid time interval.
See also the note on analytical instrument naming for details.
Time dependent analytical instrument employment is implemented as a time dependent, history aware metadata entity.
QA Metadata¶
New in version 3.01.00.
The QA metadata are composed of
- reference to a dataset
- reference to a QA measure (which can be an interlaboratory comparison, on-site or off-site intercomparison or an on-site audit)
- date of the QA measure performed
- valid time interval (measurement time interval for which the QA is valid)
- QA specific data:
- general outcome (pass, no pass, not participated)
- bias (relative or absolute)
- variability (relative or absolute)
- documentation about the QA (document name, date, URL)
QA metadata are implemented as a time dependent, history aware metadata entity.
Submission¶
The submission entity stores all metadata related to the submitted data file itself.
A submission represents a datafile that has been reported to EBAS and ingested into the database. One submission (datafile) can contain one or more variables. Each variable relates to one dataset in EBAS. However, one submission contains data for only one submission interval (usually one year), whereas a dataset usually contains data from multiple submission intervals.
- Origin of data:
- Organization which produced the data
- Data originator and submitter roles
- Revision information (version, description, revision date)
- NILU staff who imported the data
Submissions are stored as static metadata. A submission will never cease to exist; it can only be superseded by a new submission, and even this leaves the original submission as a historic fact.
Roles¶
Roles describe the role of persons who contributed to producing the data. There are two types of roles: data originator and data submitter.
Roles are related to data submissions. There must be at least one data originator and one data submitter for each submission.
Roles are stored as static metadata.
Project associations¶
Project associations associate a certain time interval of data of a dataset to a framework.
Each dataset can be associated to multiple frameworks, even at the same or overlapping time intervals. However, each dataset must be associated to at least one framework for any time interval of its data (there may not exist any time interval of data without framework association).
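The coverage rule for project associations can be checked with a simple sweep over the association intervals. The sketch below assumes half-open intervals and uses placeholder framework names; the function `uncovered_gaps` is a hypothetical helper, not part of EBAS.

```python
from datetime import date


def uncovered_gaps(data_start, data_end, associations):
    """Return the parts of [data_start, data_end) without any framework.

    associations is a list of (framework, start, end) tuples; the intervals
    are half-open and may overlap.
    """
    gaps = []
    cursor = data_start  # everything before the cursor is known to be covered
    for _, start, end in sorted(associations, key=lambda a: a[1]):
        if start > cursor:
            gaps.append((cursor, min(start, data_end)))
        cursor = max(cursor, end)
        if cursor >= data_end:
            break
    if cursor < data_end:
        gaps.append((cursor, data_end))
    return gaps


# Placeholder framework names and dates, for illustration only.
overlapping = [
    ("FRAMEWORK_A", date(2015, 1, 1), date(2017, 1, 1)),
    ("FRAMEWORK_B", date(2016, 1, 1), date(2018, 1, 1)),
]
with_hole = [
    ("FRAMEWORK_A", date(2015, 1, 1), date(2016, 1, 1)),
    ("FRAMEWORK_B", date(2017, 1, 1), date(2018, 1, 1)),
]
```

With data from 2015 through 2017, `overlapping` leaves no gap, while `with_hole` leaves the year 2016 without a framework association, which the rule above does not allow.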
Historic states of data¶
EBAS keeps the full history of changes in the database. Any historic state of the database can be reproduced. This enables some additional features which will be described in the following sub-chapters.
There are however some restrictions to the history function:
- History is supported since the release 3.0 of EBAS. EBAS 3.0 was rolled out in May 2014. Thus the history is available since this date. Older data appear as if inserted 1st May 2014 (2014-05-01T00:00:00).
- NRT data are stored without any historic information. All metadata and data are just stored in the latest state.
- Some rare database maintenance requires changes that are not visible in the history of the database. This is mainly the case when changing master data. Those changes are avoided as much as possible.
Operation with historic database state (Time travel)¶
All EBAS programs that query data (e.g. ebas_list_ds, ebas_extract, all statistics programs and many more) can query the database as if it was any historic date in the past using the --state argument.
The result of the operation will be the same as if the operation had been
performed at the historic point in time specified.
This can be thought of as a time travel option (unfortunately we can only travel
back in time - sorry, no future observations in this version of EBAS).
Differences between two (historic) database states¶
Another use of the EBAS history is the possibility of restricting EBAS programs to work only on data and metadata that changed between two historic database states. This can be achieved with the --diff argument.
Only datasets changed between the database state and this date will be processed.
A special case of this feature is the possibility of differential data extracts (see ebas_extract - differential extracts).
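The selection between two database states can be pictured as a filter over a change log. The sketch below is an illustration only: the change log structure, the integer setkeys and the choice of which boundary is inclusive are assumptions, and it does not show how the EBAS programs implement the feature.

```python
from datetime import datetime


def changed_between(change_log, diff_time, state_time):
    """Return the setkeys with at least one change in (diff_time, state_time]."""
    return sorted({setkey for setkey, changed_at in change_log
                   if diff_time < changed_at <= state_time})


# Hypothetical change log: (dataset setkey, time of change).
change_log = [
    (101, datetime(2020, 1, 15)),
    (102, datetime(2020, 6, 1)),
    (101, datetime(2021, 2, 1)),
    (103, datetime(2019, 5, 1)),
]

changed_2020 = changed_between(change_log,
                               datetime(2020, 1, 1), datetime(2020, 12, 31))
```

Here `changed_2020` contains the setkeys 101 and 102; dataset 103 is skipped because its only change predates the earlier state.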
Near realtime data¶
Near real time (NRT) data in EBAS are usually available within two hours after the observation.
NRT datasets are handled specially in the database in many respects.
The high frequency of changes to each NRT dataset (usually one change per hour) makes it impossible to keep the history of changes in the database. With NRT data, only the latest state of the data is stored. If a historic state of the data is accessed, the time series appears as it was at the historic timestamp, except that measurement samples added between the historic timestamp and the current state of the database are reported as missing (rather than as not existing, which would have been correct at the historic state). This is a side effect of not storing the historic changes: data that would have been future data from the perspective of the historic state appear as missing.
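The effect on historic access to NRT data can be sketched as a simple masking step. The representation of a sample as an (end time, value) pair, the use of `None` for a missing value and the function name are assumptions made for this illustration.

```python
from datetime import datetime


def nrt_series_at_state(samples, state):
    """Show an NRT time series as it appears for a historic state.

    Samples measured after the historic state stay in the series but are
    reported as missing (None), because no history is stored for NRT data.
    """
    return [(end_time, value if end_time <= state else None)
            for end_time, value in samples]


# Illustrative hourly samples.
samples = [
    (datetime(2024, 1, 1, 10), 1.2),
    (datetime(2024, 1, 1, 11), 1.4),
    (datetime(2024, 1, 1, 12), 1.1),
]
historic_view = nrt_series_at_state(samples, datetime(2024, 1, 1, 11, 30))
```

In `historic_view` the 12:00 sample is still listed, but its value is `None`.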
Furthermore, the project acronyms associated to NRT datasets will always end with _NRT. This is the way NRT data are marked for data users. Additionally, data policies will generally be different for NRT data in all frameworks. Thus a different project acronym, implying a different data policy and different access rights, is needed for all projects.
Instrument names and instrument references of NRT data will always end with _NRT. This is necessary in order to make instrument metadata of NRT data completely independent from regular (quality assured) data. The submission of quality assured data should in no way change the instrument metadata of stored NRT data of the same (physical) instrument and vice versa. A further problem is that instrument metadata for NRT data should not be history aware and need to be handled differently whenever inserted, changed or deleted. Therefore we create an additional “virtual” instrument for NRT data, even though in reality it’s the same physical instrument.
All time dependent metadata will only feature one gapless interval for NRT data.