Data checking or validation is based upon:
Records of old data can be used to create simple statistics including percentiles, mean values and standard deviations. Log-transformed data are often preferred. These statistics can be used in control charts or in other comparisons of new data with aggregates of the old data.
Relations between various chemical components should be utilized. These include ion balances, relations between sea salt components, and relations between constituents in minerals and dust from other sources. Comparisons with measurements from neighbouring stations can be useful, and plots of time series, e.g. 4-5 year long series of monthly averages of each component, can give indications of measurement problems. Estimated conductivities should be compared with the measured ones. When pH is higher than 5-6, weak acids, which normally are not measured, will be present in the sample. This is a frequent problem with precipitation samples at many EMEP sites.
In this case the ion balance test and comparisons with conductivity will fail unless the missing anions are measured, i.e. through titrations. It should additionally be noted that the equivalent conductivity of the hydronium ion is much higher than those of the other ions, and that a conductivity test of an acidic sample therefore tends to be a test of the pH determination.
The statistical tests compare new measurements with data already stored in the data base. The tests are carried out to identify possible outliers and results which may be wrong. They can be based upon assumptions about the data distributions, e.g. a lognormal distribution, or they can be based on comparisons with cumulative frequency distributions.
Gaseous, aerosol or precipitation components may be compared with all earlier data for each component, making use of lognormal distributions. The data should then be split into data from different seasons or into winter and summer data. Data more than three or four standard deviations from the mean should be inspected manually by comparison with other components, with concentrations on the preceding and following days, and with concentrations at neighbouring stations.
The distributions of the different types of data may deviate from a theoretical lognormal distribution. The deviation may be particularly notable in the low concentration part of the distribution where all concentrations less than the detection limit will have to be set equal to a small value. Since the tests are used only to identify measurements which should be inspected more closely, minor deviations from a theoretical distribution function can be accepted.
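The screening described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used at the CCC: the function name, the default of four standard deviations and the `floor` value substituted for concentrations at or below the detection limit are assumptions made for the example. The historical data passed in are assumed to come from the same season as the new data.

```python
import math
import statistics

def lognormal_outliers(history, new_values, k=4.0, floor=1e-3):
    """Return the new values lying more than k standard deviations from the
    mean of the log-transformed historical data (hypothetical helper)."""
    # Values at or below the detection limit are set to a small positive
    # value so that the logarithm is defined (assumed substitution).
    logs = [math.log(max(v, floor)) for v in history]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    flagged = []
    for v in new_values:
        z = (math.log(max(v, floor)) - mu) / sigma
        if abs(z) > k:
            flagged.append(v)  # candidate for manual inspection only
    return flagged

history = [1.0, 2.0, 4.0, 0.5, 1.0, 2.0, 1.0, 0.5, 2.0, 4.0]
suspect = lognormal_outliers(history, [1.5, 500.0])
```

The function only selects values for closer inspection; it does not reject anything.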
One way to test if a set of data in fact follows a theoretical distribution function is to make use of the Kolmogorov-Smirnov one-sample, two-tailed test (Siegel, 1956).
Other useful statistical textbooks are Gilbert (1987) and Conover (1980).
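As an illustration, the Kolmogorov-Smirnov statistic for a lognormal hypothesis can be computed with the Python standard library alone. The sketch below is an assumption-laden example: the function names are hypothetical, the distribution parameters are estimated from the same sample, and the critical value 1.36/sqrt(n) is the usual large-sample approximation for a two-tailed test at the 5% level. With estimated parameters this critical value is conservative, so it should be read as a rough guide only.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic_lognormal(values):
    """Largest distance between the empirical distribution of log(values)
    and a normal distribution fitted to the same log-transformed data."""
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    d = 0.0
    for i, x in enumerate(logs):
        f = normal_cdf(x, mu, sigma)
        # Compare with the empirical step function just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_critical_5pct(n):
    """Large-sample approximation of the 5% two-tailed critical value."""
    return 1.36 / math.sqrt(n)
```

A data set would be regarded as deviating from the lognormal form when `ks_statistic_lognormal(values)` exceeds `ks_critical_5pct(len(values))`.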
The EMEP precipitation programme includes all main components in precipitation, and the difference of positive and negative ion concentrations expressed in microequivalents per litre should therefore be zero. Alternatively, the ratio between the anion and cation concentrations expressed in microequivalents per litre should be close to one.
The effect of minor components e.g. phosphates and organic acids, which are not included in the analysis, is usually negligible in acid precipitation.
Assuming equilibrium between carbon dioxide in air and carbonic acid in precipitation, the bicarbonate concentration is negligible when the pH is below 5 and will only contribute 5 µe/l at pH=6. Bicarbonate ions dissociate into carbonate ions, but this is negligible below pH=8.
When pH is above 6 in a precipitation sample, experience shows that there is apparently a large excess of anions in the sample which cannot be accounted for. This may be the case even if the bicarbonate concentration, calculated from simple equilibrium conditions, is added.
The weak or strong acids were determined by titration at the start of EMEP, and for some sites these results revealed large differences between the measured weak acid concentration and the bicarbonate concentration calculated from pH assuming equilibrium. It is possible that precipitation samples sometimes are supersaturated with carbon dioxide and therefore may contain more bicarbonate than expected. Clearly, if the pH in precipitation samples at a site frequently is above 6, a titration of the acid concentration should be performed on a routine basis in order to be able to control the precipitation data quality.
During storage, soil dust, organic material etc. may be dissolved or biological processes may occur under unfavourable conditions. Deviations in the ionic sum from zero may indicate this.
The ionic balance check should be carried out as soon as possible, while the chemical analysis can still be repeated. The DQO in Section 5.2 has 10–15% laboratory accuracy as target for the main components in precipitation. As a general guideline, based upon the difference and the sum of cation and anion concentrations, the ion concentration difference in per cent of the ion concentration sum should be lower than 10–15% (except for samples with ion sums below 50 µe/l). If a complete chemical analysis is performed, the ionic balance test is equally useful for aerosol samples.
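The guideline above can be expressed as a short Python sketch. The function names are hypothetical, the 15% default is the upper end of the range quoted above, and "ion sum" is assumed here to mean the sum of cations and anions, all in µe/l.

```python
def ion_balance_percent(cation_sum, anion_sum):
    """Ion concentration difference in per cent of the ion concentration sum."""
    total = cation_sum + anion_sum
    if total <= 0:
        raise ValueError("ion sum must be positive")
    return 100.0 * (cation_sum - anion_sum) / total

def passes_ion_balance(cation_sum, anion_sum, limit_percent=15.0, min_ion_sum=50.0):
    """Return True or False, or None when the ion sum is below min_ion_sum
    and the guideline is therefore not applied (assumed interpretation)."""
    if cation_sum + anion_sum < min_ion_sum:
        return None
    return abs(ion_balance_percent(cation_sum, anion_sum)) <= limit_percent

# 110 µe/l of cations against 90 µe/l of anions gives a difference of 10%.
balance = ion_balance_percent(110.0, 90.0)
```

A failed test indicates that the analysis should be repeated or inspected; it does not by itself identify which component is wrong.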
The conductivity of the precipitation samples should be measured, and compared with values calculated from the measured concentrations by adding the equivalent ionic conductivities. The conductivity measurements should be carried out at 25 °C. A correct determination of conductivity will reveal whether the ion concentration sum is too low or too high. When combined with ion balance calculation and other information, e.g. relations between sea salt components at marine influenced sites, it will identify a smaller group of components which are wrong.
It should be noted, however, that at low pH values (pH < 4.0) the conductivity of the solution will be dominated by the hydrogen ions. Errors in the concentrations of other ionic species will then not be easily detected.
Since this test is based on the ion concentrations as is the ionic balance test, the same limitations as above occur for pH > 6.
Explanation of symbols
(A) Concentration of element A in mg/l. Primary precipitation parameter as reported in data base.
[A] Concentration of element A in µe/l (micro-equivalents per litre). Used in computation of ionic sums and conductivity.
EA Equivalent weight for ion species A in g per equivalent.
FA Equivalent ionic conductivity for ion species A in mho/cm (mho = S = Siemens).
The equivalent conductivity FA expresses the conductivity due to one equivalent of A per litre.
Conversion of concentration
To convert from (A) to [A] the following formula is used:
[A] = 1000 · (A) / EA (1)
The equivalent weights EA for the different ion species are given in Table 6.1.1 below.
It is seen from this table that parameter 4, H+, is an exception. It is reported in the unit µe/l in the data base. Formula (1) is never applied to this species.
Table 6.1.1: Equivalent weights (EA) and equivalent ionic conductivities (FA) at infinite dilution and 25°C for different species (WMO-GAW Report 85, CRC, 1985–1986).
Species | EA | FA |
SO42--S | 16.0 | 80.0 |
SO42--S (corr) | – | – |
H+ | – | 349.7 |
NH4+-N | 14.0 | 73.5 |
NO3--N | 14.0 | 71.4 |
Na+ | 23.0 | 50.1 |
Mg2+ | 12.2 | 53.0 |
Cl- | 35.5 | 76.3 |
Ca2+ | 20.0 | 59.5 |
pH | – | 349.7 |
K+ | 39.1 | 73.5 |
HCO3- | – | 44.5 |
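The conversion from mg/l to µe/l can be illustrated with the equivalent weights in Table 6.1.1. The sketch below assumes that formula (1) is the ordinary unit conversion, i.e. the concentration in mg/l divided by the equivalent weight and multiplied by 1000. The dictionary keys and the function name are hypothetical and chosen for the example only.

```python
# Equivalent weights EA from Table 6.1.1 (H+ is reported directly in µe/l
# and is therefore not included).
EQUIVALENT_WEIGHT = {
    "SO4-S": 16.0,
    "NH4-N": 14.0,
    "NO3-N": 14.0,
    "Na": 23.0,
    "Mg": 12.2,
    "Cl": 35.5,
    "Ca": 20.0,
    "K": 39.1,
}

def to_microequivalents(species, mg_per_litre):
    """Convert a concentration (A) in mg/l to [A] in µe/l (assumed form of
    formula (1))."""
    return 1000.0 * mg_per_litre / EQUIVALENT_WEIGHT[species]

# 1.6 mg S/l of sulphate corresponds to 100 µe/l.
sulphate_ueq = to_microequivalents("SO4-S", 1.6)
```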
Sum of positive ions
The formula is:
ISP = [H+] + [NH4+-N] + [Na+] + [Mg2+] + [Ca2+] + [K+] (2)
If H+ is measured by titration and is negative, it is set to zero in this computation (refer to the section on weak acids below).
If H+ is not measured, but pH is determined with a valid value (pH > 0), [H+] is substituted by:
[H+comp.] = 10^(6.0-pH) (3)
The remaining elements in formula (2) are computed by formula (1) if the species are reported, and otherwise set to zero.
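The rules for the sum of positive ions can be sketched as follows. All concentrations are assumed to be in µe/l already, and the function and argument names are hypothetical. Species that are not reported are simply left at their default of zero.

```python
def h_from_ph(ph):
    """Hydrogen ion concentration in µe/l computed from pH, formula (3)."""
    return 10.0 ** (6.0 - ph)

def sum_positive_ions(h_titrated=None, ph=None, nh4=0.0, na=0.0, mg=0.0, ca=0.0, k=0.0):
    """Sum of positive ions, formula (2), with all inputs in µe/l."""
    if h_titrated is not None:
        # A negative titrated H+ is set to zero in this sum.
        h = max(h_titrated, 0.0)
    elif ph is not None and ph > 0:
        h = h_from_ph(ph)
    else:
        h = 0.0
    return h + nh4 + na + mg + ca + k

# pH 5.0 corresponds to 10 µe/l of H+, so this sum is 35 µe/l.
isp = sum_positive_ions(ph=5.0, nh4=20.0, na=5.0)
```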
Weak acids
If [H+] determined through a titration is negative, it no longer reflects the concentration of strong acids in the precipitation. Instead it now reflects the sum of concentrations of various weak acids, including the bicarbonate ion, HCO3-. When this condition is found, the following two steps are taken before ionic sums are computed:
[Weak acids] = -[H+] (4)
[H+] = 0 (5)
Sum of negative ions
The basic formula is:
ISN = [Weak acids] or [HCO3-] + [SO42--S] + [NO3--N] + [Cl-] (6)
In this expression [Weak acids] is defined by formula (4) above.
If [Weak acids] is not measured, [HCO3-] is included in the calculation when pH > 5.0:
[H+comp.] = 10^(6.0-pH) µe/l (3)
[HCO3-] = µe/l (Topol et al., 1985) (7)
The remaining elements in (6) are computed by formula (1) if the corresponding species are reported.
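The treatment of a negative titrated [H+] and the sum of negative ions can be sketched in the same way. The names are hypothetical and all concentrations are in µe/l. The bicarbonate concentration is taken as an input supplied by the caller, since the expression in formula (7) is not reproduced in this example.

```python
def split_titrated_h(h_titrated):
    """Apply formulas (4) and (5): return (h, weak_acids) in µe/l."""
    if h_titrated < 0:
        return 0.0, -h_titrated
    return h_titrated, 0.0

def sum_negative_ions(so4, no3, cl, weak_acids=None, hco3=None, ph=None):
    """Sum of negative ions, formula (6), with all inputs in µe/l."""
    if weak_acids is not None:
        first_term = weak_acids
    elif hco3 is not None and ph is not None and ph > 5.0:
        # Bicarbonate is only taken into account when pH > 5.0.
        first_term = hco3
    else:
        first_term = 0.0
    return first_term + so4 + no3 + cl

h, weak = split_titrated_h(-12.0)
isn = sum_negative_ions(40.0, 20.0, 10.0, weak_acids=weak)
```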
Conductivity
The basic formula for the conductivity is:
(10)
The expression [A] is as before computed by formula (1).
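A calculated conductivity can be sketched by adding the contributions of the individual ions. The example assumes that the FA values in Table 6.1.1 are the standard limiting equivalent conductivities in S·cm²/equivalent, so that concentrations in µe/l give a result in µS/cm after division by 1000. This unit convention, the dictionary keys and the function name are assumptions of the example and may differ from the exact form of formula (10).

```python
# Equivalent ionic conductivities FA from Table 6.1.1.
EQUIVALENT_CONDUCTIVITY = {
    "H": 349.7,
    "SO4-S": 80.0,
    "NH4-N": 73.5,
    "NO3-N": 71.4,
    "Na": 50.1,
    "Mg": 53.0,
    "Cl": 76.3,
    "Ca": 59.5,
    "K": 73.5,
    "HCO3": 44.5,
}

def estimated_conductivity(conc_ueq):
    """Estimated conductivity in µS/cm from ion concentrations in µe/l
    (assumed units, see the text above)."""
    total = sum(EQUIVALENT_CONDUCTIVITY[s] * c for s, c in conc_ueq.items())
    return total / 1000.0

# At pH 4 (100 µe/l of H+) the hydrogen ion alone contributes about 35 µS/cm.
acid_only = estimated_conductivity({"H": 100.0})
```

The example also shows why the test is insensitive to other ions in acidic samples: 100 µe/l of H+ contributes more than four times as much as the same amount of chloride or sulphate.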
PCs and Unix systems have made graphical tools easily accessible, and these should be utilized in the data control. Although errors should be detected at a much earlier stage, plots of monthly average concentrations in three or four year long series have revealed errors in EMEP data. Such plots are strongly recommended as an additional test. Plots of daily concentrations or precipitation amounts should likewise be a part of the routine. The plots should be compared with historical data divided into half-yearly, seasonal or even monthly aggregates. From the historical data good sets of 5- and 95-percentiles can be calculated since EMEP now possesses a vast amount of data. Data outside these limits should be inspected more closely as a routine.
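The percentile limits can be computed with the Python standard library, as in the following sketch. The function names are hypothetical, and the "inclusive" interpolation method is one of several reasonable choices; the historical data are assumed to have been divided into the relevant seasonal or monthly aggregate beforehand.

```python
import statistics

def percentile_limits(history, low=5, high=95):
    """Return the low and high percentiles of the historical data."""
    # quantiles(n=100) returns the 99 cut points between the percentiles.
    cuts = statistics.quantiles(history, n=100, method="inclusive")
    return cuts[low - 1], cuts[high - 1]

def outside_limits(values, limits):
    """Return the values falling outside the (low, high) limits."""
    lo, hi = limits
    return [v for v in values if v < lo or v > hi]

limits = percentile_limits(list(range(101)))
to_inspect = outside_limits([3, 50, 99], limits)
```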
Relations between components which are connected, e.g. sea salt components, should be utilized.
No data should be rejected automatically by a computer programme alone; manual inspection should always be carried out before this step is taken.
The purpose of EMEP is to provide information about air pollution from distant anthropogenic sources, natural pollution and sources within the region (as far as this is consistent with the criteria given for site location).
Data carrying other types of information, e.g. contaminated samples or careless handling of samples etc., should only be accepted in the data base when the effect of the contamination is considered to be negligible. These data need to be flagged.
The QA plan for EMEP (EMEP/CCC Report 1/88) and the draft version of this Manual contained a classification of precipitation sample results based on ion balance tests and comparisons between measured and estimated conductivities. Having introduced Data Quality Objectives in EMEP, it seems reasonable to base a classification on the criteria given in Section 5.2. The classification given in the two previous reports should therefore not be used, and a new classification will be worked out for the next revision of this Manual.
Several flags have been used in the past to give information about the quality of the data stored in the data base. These flags have been revised and are currently under evaluation. The new data flag system contains the old flags, and it will be extended as needed.
Some EMEP sites are located at the coast and are from time to time highly exposed to sea salt particles. This will of course affect several components in precipitation which should be flagged in the data base. In particular the “excess sulphate” in precipitation, which will be the difference between two large numbers, may have a high uncertainty and should be flagged.
The person responsible for the data reporting in each participating country is the data originator (DO). The DO will have access to NILU’s external computer and will take care of the future data transfer to the central data base at the CCC.
All flags are grouped in two categories: V (valid measurement) or I (invalid measurement).
Flags above 250 indicate an exception that has invalidated or reduced the quality of the data element. Flags below 250 indicate that the element is valid, even if it may fail simple validation tests. The value may for example be extreme, but has been tested and found correct.
The flag 100 has in the past been used to indicate that a value is valid even if an exception in the 250-999 range has also been flagged. In this case the 100 flag appears before the other flags. In all other cases, the most severe flag should appear first if more than one flag is needed.
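The ordering rule can be sketched as below. The sketch assumes that a higher flag number means a more severe condition, which is not stated explicitly in this Manual, and the function name is hypothetical.

```python
def order_flags(flags):
    """Order flags for reporting: flag 100 first when present, otherwise
    the most severe flag first (severity assumed to follow the flag number)."""
    unique = set(flags)
    rest = sorted((f for f in unique if f != 100), reverse=True)
    if 100 in unique:
        return [100] + rest
    return rest

ordered = order_flags([478, 100, 659])
```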
When a measurement is missing and no particular information is available, we cannot assign any numerical value to the measurement (no substitution value is applicable). The measurement value must have been replaced with the transfer file missing flag. For all flags in this group, the measurement is irrecoverably lost, and no substitution value may be computed or estimated. The DO assigns one of the following flags in the flag variable (in addition to setting the transfer file missing flag):
Flag | Mnemonic | V/I | Description |
999 | MMU | I | Missing measurement, unspecified reason |
990 | MSN | I | Precipitation not measured due to snow-fall. Needed for historic data, should not be needed for new data |
980 | MZS | I | Missing due to calibration or zero/span check |
In some cases a measurement may not be performed because the parameter to be measured is not defined. As mentioned above, the concentration of pollutants in precipitation is undefined when there is zero precipitation. In this situation the measurement is not missing, and the data availability is not reduced. It is not possible to compute or estimate a substitution value for a measurement that is undefined. The DO assigns one of the following flags:
Flag | Mnemonic | V/I | Description |
899 | UUS | I | Measurement undefined, unspecified reason |
890 | UNP | I | Concentration in precipitation undefined, no precipitation |
This group of flags is assigned by the DO when the exact numerical value is unknown, but significant additional information is available. This situation exists when a measurement is below the detection limit of the instrument or method, or is considered to be less accurate than normal.
For many data users it is important to know that the value is low, even if a numerical value is not available. Some users may also need to use or create a substitution value. The substitution value may be based on the detection limit (if reported), or on some other estimate. Statisticians have described methods for using the distribution function of all reported values to estimate the average of the values that fall below the detection limit.
Flag | Mnemonic | V/I | Description |
799 | MUE | I | Measurement missing (unspecified reason), data element contains estimated value |
784 | LPE | I | Low precipitation, concentration estimated |
783 | LPU | I | Low precipitation, concentration unknown |
781 | BDL | V | Value below detection limit, data element contains detection limit |
780 | BDE | V | Value below detection limit, data element contains estimated value |
771 | ARL | V | Value above range, data element contains upper range limit |
770 | ARE | V | Value above range, data element contains estimated value |
750 | ALK | I | H+ not measured in alkaline sample |
701 | LAU | I | Less accurate than usual, unspecified reason. (Used only with old data, for new data see groups 6 and 5) |
This group of flags is assigned by the DO when a measurement value is less accurate than normal due to severe weather or instrument malfunction. The measured value is reported, but should be excluded from use when strict quality control is required.
Flag | Mnemonic | V/I | Description |
699 | LMU | I | Mechanical problem, unspecified reason |
679 | LUM | V | Unspecified meteorological condition |
678 | LHU | V | Hurricane |
677 | LAI | I | Icing or hoar frost in the intake |
659 | LSA | I | Unspecified sampling anomaly |
658 | LSV | I | Too small air volume |
657 | LPO | V | Precipitation collector overflow. Heavy rain shower (squall) |
656 | LWB | V | Wet-only collector failure, operated as bulk collector |
655 | LMI | V | Two samples mixed due to late servicing of sampler. Estimated value created by averaging |
654 | LLS | V | Sampling period longer than normal, observed values reported |
653 | LSH | V | Sampling period shorter than normal, observed values reported |
649 | LTP | V | Temporary power fail has affected sampler operation |
This group of flags is assigned by the DO when a measurement value is less accurate than normal due to some kind of chemical contamination of the sample. The measured value is reported, but should be excluded from use when strict quality control is required.
Flag | Mnemonic | V/I | Description |
599 | LUC | I | Unspecified contamination or local influence |
593 | LNC | I | Industrial contamination |
591 | LAC | I | Agricultural contamination |
578 | LSS | I | Large sea salt contribution (ratio between marine and excess sulphate is larger than 2.0). Used for old data only. For newer data use 451/450. |
568 | LSC | I | Calcium invalid due to sand contamination |
567 | LIC | I | pH, NH4 and K invalid due to insect contamination |
566 | LBC | I | pH, NH4 and K invalid due to bird droppings |
565 | LPC | I | K invalid due to pollen and/or leaf contamination |
558 | SCV | V | Sand contamination, but considered valid |
557 | LIV | V | Insect contamination, but considered valid |
556 | LBV | V | Bird droppings, but considered valid |
555 | LPV | V | Pollen and/or leaf contamination, but considered valid |
549 | LCH | I | Impure chemicals |
540 | LSI | I | Spectral interference in laboratory analysis |
532 | LHB | V | Data less accurate than normal due to high field blank value |
531 | LLR | V | Low recovery, analysis inaccurate |
521 | LBA | V | Bactericide was added to sample for storage under warm climate. Considered valid |
This group of flags is assigned by the DO after evaluation of the credibility of the measured values. If a measured value is extremely high or low, it may in many cases be suspected to be wrong based on statistics alone. In a conservative presentation of the data set such elements should be excluded.
Some measurements are found to be inconsistent with other measurements or with computed parameters (ion balance, conductivity, etc.). As above, such measurements may be used with caution, but should be excluded from use when strict quality control is required.
Flag | Mnemonic | V/I | Description |
499 | INU | V | Inconsistent with another unspecified measurement |
478 | IBA | I | Invalid due to inconsistency discovered through ion balance calculations |
477 | ICO | I | Invalid due to inconsistency between measured and estimated conductivity |
476 | IBV | V | Inconsistency discovered through ion balance calculations, but considered valid |
475 | COV | V | Inconsistency between measured and estimated conductivity, but considered valid |
460 | ISC | I | Contamination suspected |
459 | EUE | I | Extreme value, unspecified error |
458 | EXH | V | Extremely high value, outside four times standard deviation in a lognormal distribution |
457 | EXL | V | Extremely low value, outside four times standard deviation in a lognormal distribution |
456 | IDO | I | Invalidated by data originator |
451 | SSI | I | Invalid due to large sea salt contribution |
450 | SSV | V | Considerable sea salt contribution, but considered valid |
This group of flags (flags 301-399) is presently not defined.
This group of flags is reserved for use by the database co-ordinator. The flags in this group are identical to group 4 above. They are only assigned by the database co-ordinator if an inconsistency is found, and the data originator has not previously flagged the condition.
Flag | Mnemonic | V/I | Description |
299 | CNU | V | Inconsistent with another unspecified measurement |
278 | CBA | I | Invalid due to inconsistency discovered through ion balance calculations |
277 | CCO | I | Invalid due to inconsistency between measured and estimated conductivity |
276 | CIV | V | Inconsistency discovered through ion balance calculations, but considered valid |
275 | CCV | V | Inconsistency between measured and estimated conductivity, but considered valid |
260 | CSC | I | Contamination suspected |
259 | CUE | I | Unspecified error expected |
258 | CXH | V | Extremely high value, outside four times standard deviation in a lognormal distribution |
257 | CXL | V | Extremely low value, outside four times standard deviation in a lognormal distribution |
251 | CSI | I | Invalid due to large sea salt contribution |
250 | CSV | V | Considerable sea salt contribution, but considered valid |
249 | QDT | V | Apparent typing error corrected. Valid measurement |
211 | QDI | V | Irregular data checked and accepted by database co-ordinator. Valid measurement |
210 | QDE | V | Episode data checked and accepted by database co-ordinator. Valid measurement |
Flag | Mnemonic | V/I | Description |
147 | QOD | V | Below theoretical detection limit or formal Q/A limit, but a value has been measured and reported and is considered valid |
120 | QOR | V | Sample reanalysed with similar results. Valid measurement |
111 | QOI | V | Irregular data checked and accepted by data originator. Valid measurement |
110 | QOE | V | Episode data checked and accepted by data originator. Valid measurement |
100 | QOU | V | Checked by data originator. Valid measurement |
This group of flags (flags 001-099) is presently not defined. The “flag” value 0 is not an error condition flag. It must be assigned to the flag variable for all measurements that are of normal quality. In this manner the DO confirms that the data element is valid (with no known exception that should have been flagged).
A new relational data base contains the concentrations or measurements with remarks/flags to the data, and the information about sites, instruments etc.
Data reporting forms have been worked out by the CCC in the past. Three forms may be used for the reporting of concentrations: one for air, one for precipitation, and one for air and precipitation components. Forms containing information about the sites were also worked out in the past, and new, less comprehensive forms will be worked out and distributed in 1996 together with information about the data reporting formats above.
The data reporting to be introduced in 1995 follows the NASA/AMES type 1001 format. Besides this format, ISO 7168 is still valid. Magnetic tapes should not be used. For users of NASA/AMES and ISO 7168 a data base will be created on NILU’s external computer, and the users may transfer data directly into this data base over the Internet.
Data should be submitted to the CCC once a year; the deadline is October 1st. Data which are not received by the deadline may be excluded from the annual data reports from the CCC, due to the time-consuming calculations and long production time.
The procedure for submission of data is described in more detail on CCC’s homepage: http://www.nilu.no/projects/ccc/submission.html
Data are available from the CCC homepage http://www.nilu.no/projects/ccc/emepdata.html. In addition, annual and seasonal summaries are worked out and printed in reports.
Experience shows that errors are discovered even in the final data. When errors are discovered they are corrected as far as possible. The most correct data will therefore at any time be the data in the data base at the CCC. New copies of this data should always be requested from the CCC for scientific use.
Chemical Rubber Co. (1985) Handbook of chemistry and physics. 66th Edition, 1985–1986. Boca Raton, CRC Press.
Conover, W.J. (1980) Practical nonparametric statistics. New York, Wiley.
Gilbert, R.O. (1987) Statistical methods for environmental pollution monitoring. New York, Van Nostrand Reinhold.
Krognes, T., Gunstrøm, T.Ø. and Schaug, J. (1995) Air quality databases at NILU. EBAS version 1.01. Kjeller (NILU TR 3/95).
Schaug, J. (1988) Quality assurance plan for EMEP. Lillestrøm, Norwegian Institute for Air Research (EMEP/CCC-Report 1/88).
Siegel, S. (1956) Nonparametric statistics for the behavioral sciences. New York, McGraw-Hill.
Sverdrup, H.U., Johnson, M.W. and Fleming, R.H. (1942) The oceans, their physics, chemistry, and general biology. New York, Prentice-Hall.
Topol, L.E., Lev-On, M., Flanagan, J., Schwall, R.J., Jackson, A.E. and Mitchell, W.J. (1985) Quality assurance manual for precipitation measurement systems. Research Triangle Park, NC., U.S. Environmental Protection Agency.
WMO (undated) Chemical analysis of precipitation for GAW: Laboratory analytical methods and sample collection standards. Geneva (WMO/GAW No. 85). (WMO/TD-550).