ebas_extract

The program ebas_extract extracts data from the EBAS database. Currently, these file formats are supported:

  • EBAS NASA Ames
  • a simple CSV format (mostly for testing)
  • XML (used for machine-to-machine communication with the ACTRIS web service)

Synopsis

ebas_extract.py [-h] [--version] [--cfgfile CFGFILE]
                [--loglevconsole LOGLEVCONSOLE]
                [--loglevfile LOGLEVFILE] [--logfile LOGFILE]
                [--profile] [--nodb] [--dbHost DBHOST] [--db DB]
                [--dbUser DBUSER] [--dbPasswd DBPASSWD] [--do_id DO_ID]
                [--setkey SETKEY] [--station STATION]
                [--project PROJECT] [--instrument INSTRUMENT]
                [--component COMPONENT] [--matrix MATRIX]
                [--group GROUP] [--fi_ref FI_REF] [--me_ref ME_REF]
                [--resolution RESOLUTION] [--statistics STATISTICS]
                [--time TIME] [--state STATE] [--us_id US_ID]
                [--format FORMAT] [--multicolumn] [--expand]
                [--precip_amount] [--xmlwrap | --createfiles]
                [--destdir DESTDIR]
                [--flags {all,compress,none,one-or-all}]
                [--metadata_options METADATA_OPTIONS]
                [--fileindex FILEINDEX]

Commandline arguments

For information about the general concepts for commandline arguments, please refer to Commandline arguments.

General arguments

-h, --help

Show help text and exit.

-v, --version

Display version information and exit.

Configuration arguments

--cfgfile=CFGFILE

Set the EBAS configuration file to be used. See Configuration file for detailed information.

This argument requires an argument value CFGFILE (the configuration file to be used). CFGFILE is the full path and file name; the path may be absolute or relative. The current user’s home directory may be specified with a tilde character (~), another user’s home directory as ~other_user.

Note

The CFGFILE argument value must not contain unquoted blank characters! If the file name contains blank characters, wrap it in double quotes ("). Example:

--cfgfile="my ebas config.cfg"

Examples:

Specify a config file named ebas.cfg in the current working directory (relative path):

--cfgfile=ebas.cfg

Specify a config file named ebas.cfg in the directory config under the current working directory (relative path):

--cfgfile=config/ebas.cfg

Specify a config file named ebas.cfg in the directory /home/me (absolute path):

--cfgfile=/home/me/ebas.cfg

Specify a config file named ebas.cfg in the current user’s home directory:

--cfgfile=~/ebas.cfg

Specify a config file named ebas.cfg in the home directory of the user colleague:

--cfgfile=~colleague/ebas.cfg

Default: ~/ebas.cfg
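The path rules above (relative, absolute, ~ and ~other_user) correspond to what Python's standard library offers. The following is a minimal sketch of how such a value could be resolved; the function name resolve_cfgfile is hypothetical and this is not necessarily how ebas_extract does it internally.

```python
import os.path


def resolve_cfgfile(value="~/ebas.cfg"):
    """Resolve a CFGFILE value to an absolute path (illustrative only).

    expanduser() handles "~" and "~other_user"; abspath() resolves
    relative paths against the current working directory.
    """
    return os.path.abspath(os.path.expanduser(value))


if __name__ == "__main__":
    for example in ("ebas.cfg", "config/ebas.cfg", "/home/me/ebas.cfg", "~/ebas.cfg"):
        print(example, "->", resolve_cfgfile(example))
```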

Logging arguments

--loglevconsole=LOGLEVEL

Set log level for console output.

This argument requires an argument value LOGLEVEL. All messages from EBAS are categorized with logging severities (e.g. error messages are written with a different severity than information messages). Severities are: CRITICAL, ERROR, WARNING, INFO, DEBUG (descending severity).

Setting the LOGLEVEL controls which category of messages will be displayed to the user. Possible values for LOGLEVEL are:

  • silent - no messages will be displayed in the console output
  • critical - only CRITICAL messages will be displayed in the console output
  • errors - only messages with severity CRITICAL or ERROR will be displayed in the console output
  • warnings - only messages with severity CRITICAL, ERROR or WARNING will be displayed in the console output
  • info - messages with severity CRITICAL, ERROR, WARNING or INFO will be displayed in the console output
  • debug - all messages will be displayed in the console output

Default: info
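The relation between LOGLEVEL values and message severities can be pictured as a threshold. The sketch below maps the documented LOGLEVEL names onto the severity constants of Python's logging module; the mapping table and the helper is_displayed are assumptions for illustration, not the actual EBAS code.

```python
import logging

# LOGLEVEL value -> lowest severity that is still shown (None: show nothing).
LOGLEVELS = {
    "silent": None,
    "critical": logging.CRITICAL,
    "errors": logging.ERROR,
    "warnings": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def is_displayed(loglevel, severity):
    """Return True if a message of the given severity passes the LOGLEVEL."""
    threshold = LOGLEVELS[loglevel]
    return threshold is not None and severity >= threshold


if __name__ == "__main__":
    print(is_displayed("errors", logging.CRITICAL))  # CRITICAL passes "errors"
    print(is_displayed("errors", logging.WARNING))   # WARNING does not
```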

--loglevfile=LOGLEVEL

Set log level for logfile output.

This argument requires an argument value LOGLEVEL. All messages from EBAS are categorized with logging severities (e.g. error messages are written with a different severity than information messages). Severities are: CRITICAL, ERROR, WARNING, INFO, DEBUG (descending severity).

Setting the LOGLEVEL controls which category of messages will be included in the logfile. Possible values for LOGLEVEL are:

  • silent - no messages will be written to the logfile and no logfile will be created
  • critical - only CRITICAL messages will be written to the logfile
  • errors - only messages with severity CRITICAL or ERROR will be written to the logfile
  • warnings - only messages with severity CRITICAL, ERROR or WARNING will be written to the logfile
  • info - messages with severity CRITICAL, ERROR, WARNING or INFO will be written to the logfile
  • debug - all messages will be written to the logfile

Default: silent

--logfile=LOGFILE

Set file name for logfile output (including path).

This argument requires an argument value LOGFILE. LOGFILE is the full path and file name; the path may be absolute or relative. The current user’s home directory may be specified with a tilde character (~), another user’s home directory as ~other_user.

Note

The LOGFILE argument value must not contain unquoted blank characters! If the path or file name contains blank characters, wrap it in double quotes ("). Example:

--logfile="ebas logfiles/logfile.log"

Default: filename constructed of program name and start time in the current working directory (e.g. ./ebas_list_ds_2015-12-27_123147.log)

Changed in version V.3.02.00: the previous default file name included colons (:) as time separator (e.g. ./ebas_list_ds_2015-12-27_12:31:47.log)
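As an illustration of the default naming scheme described above (program name plus start time, without colons), a file name of this shape can be built as follows. The helper default_logfile_name is hypothetical; only the resulting pattern is taken from the example above.

```python
from datetime import datetime


def default_logfile_name(program, start):
    """Build "./<program>_<YYYY-MM-DD>_<HHMMSS>.log" in the working directory."""
    return "./{}_{}.log".format(program, start.strftime("%Y-%m-%d_%H%M%S"))


if __name__ == "__main__":
    print(default_logfile_name("ebas_list_ds", datetime(2015, 12, 27, 12, 31, 47)))
```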

--profile

Activate program profiling.

Note

This option is for EBAS developers only. Code profiling is an analytical technique used in runtime optimization. Using this option has no benefit for a user (it actually makes the program slower).

Default: False

Database arguments

Database user and password can also be fetched from your .netrc file (recommended authentication method).

See Database Authentication for more information on this. If you want to use netrc authentication, you must omit the --dbUser and --dbPasswd arguments!
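For orientation, this is roughly how credentials can be read from a .netrc file with Python's standard netrc module. It is a sketch only: the helper db_credentials is hypothetical, and using the --dbHost value as the netrc machine name is an assumption. See Database Authentication for the entries EBAS actually expects.

```python
import netrc


def db_credentials(host, netrc_path=None):
    """Return (user, password) for host from a .netrc file, or None.

    netrc_path=None makes the netrc module read ~/.netrc.
    The machine name equal to the database host is an assumption.
    """
    auth = netrc.netrc(netrc_path).authenticators(host)
    if auth is None:
        return None
    login, _account, password = auth
    return login, password
```

A matching .netrc line would have the standard form "machine <host> login <user> password <password>".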

--nodb

Do not connect to the database. Most programs will have limited functionality without database connection (e.g. ebas_insert can still perform file syntax checks and limited semantic checks). Some programs will lose all their functionality, but even in those cases, the --nodb option might be useful for tests or for validating other arguments, or simply to get --version and --help output when the database is unreachable.

New in version 3.00.07.

--dbHost=DBHOST

Specify the database host name according to the database client configuration file (sql.ini for Sybase drivers, freetds.conf for FreeTDS drivers). If this sounds unfamiliar, ask a system administrator. At NILU, we use two database servers for EBAS, ODIN (database server for the test system) and SLEIPNER (operational database server). The respective names to be used for the argument --dbHost on ratatoskr are:

  • ODIN_ASE
  • SLEIPNER_ASE

--db=DBNAME

The name of the database to be used. If you don’t know for sure that you want to change this, leave it at the default (ebas_new)

--dbUser=DBUSER

The user name to be used for connecting to the database. See Database Authentication.

--dbPasswd=DBPASSWD

The password to be used for connecting to the database. See Database Authentication.

Warning

Please do NOT use the --dbPasswd argument!

On multi-user UNIX systems, each user is able to see the command used to start any program on the system.

Therefore, including sensitive information in commands is generally strongly discouraged on UNIX systems.

See Database Authentication for alternatives to using the --dbPasswd argument.

Note

The DBPASSWD argument value must not contain unquoted blank characters! If the password contains blank characters, wrap it in double quotes ("). Example:

--dbPasswd="with blank"

Commandline arguments for dataset selection criteria

Dataset selection criteria are used to define a set of datasets to be exported by ebas_extract.

--do_id=DO_ID

Selection by download id (the set of datasets to be downloaded). Normally specified by EBAS Web when using ebas_extract as a back-end for the web download.

Optional, Default: None

-k, --setkey=SETKEY

Selection by dataset setkey.

This argument requires an argument value SETKEY. SETKEY can be a single setkey, a range of setkeys (SETKEY1-SETKEY2), or a list of setkeys (SETKEY1,SETKEY2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas or around the hyphens)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None
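The three accepted forms of SETKEY (single key, range, list) can be illustrated with a small parser. This is a sketch under several assumptions that the text above does not confirm: setkeys are integers, ranges include both ends, and a list item may itself be a range. The function parse_setkeys is hypothetical.

```python
def parse_setkeys(value):
    """Expand "42", "10-13" or "1,5,9" into a list of integer setkeys."""
    if " " in value:
        raise ValueError("SETKEY must not contain blank characters")
    keys = []
    for item in value.split(","):
        first, sep, last = item.partition("-")
        if sep:
            # Range SETKEY1-SETKEY2, assumed to include both ends.
            keys.extend(range(int(first), int(last) + 1))
        else:
            keys.append(int(first))
    return keys


if __name__ == "__main__":
    for example in ("42", "10-13", "1,5,9"):
        print(example, "->", parse_setkeys(example))
```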

-s, --station=STATION_CODE

Selection by station code.

This argument requires an argument value STATION_CODE. STATION_CODE can be a single station code or a list of station codes (STATION_CODE1,STATION_CODE2,…). In addition, each station code can be specified exactly (full station code, e.g. NO0002R), partly without station type (e.g. NO0002) or partly with only the country code part (e.g. NO, which matches all Norwegian stations).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Examples:

Select all datasets from the station whose station code is NO0001R:

--station NO0001R

Same, but you can’t remember the station type of the station:

--station NO0001

Select all datasets from stations whose station code is NO0001R, NO0002R or NO0042G:

--station NO0001R,NO0002R,NO0042G

Same, but you can’t remember the station types of those 3 stations:

--station NO0001,NO0002,NO0042

Select all datasets from Norwegian and Austrian stations:

--station NO,AT

Optional, Default: None
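The matching rules for full and partial station codes can be sketched as a prefix test. The helper station_matches is hypothetical, and the restriction to prefixes of length 2 (country code), 6 (without station type) or 7 (full code) is an assumption derived from the examples above.

```python
def station_matches(criterion, station_code):
    """Check a station code against a comma separated STATION_CODE value."""
    patterns = criterion.split(",")
    for pattern in patterns:
        # Country code only, code without station type, or full code.
        if len(pattern) not in (2, 6, 7):
            raise ValueError("unsupported station code: " + pattern)
    return any(station_code.startswith(pattern) for pattern in patterns)


if __name__ == "__main__":
    print(station_matches("NO0001", "NO0001R"))
    print(station_matches("NO,AT", "AT0002R"))
    print(station_matches("NO0001", "NO0002R"))
```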

-p, --project=PROJECT

Selection by project acronym.

This argument requires an argument value PROJECT. PROJECT can be a single project acronym or a list of project acronyms (PROJECT1,PROJECT2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

-i, --instrument=INSTRUMENT

Selection by instrument type.

This argument requires an argument value INSTRUMENT. INSTRUMENT can be a single instrument type or a list of instrument types (INSTRUMENT1,INSTRUMENT2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

-c, --component=COMPONENT

Selection by component name.

New in version 3.01.00: Strict synonyms can be used instead of component names. Lookup names, as a special case of strict synonyms, may be used case insensitively, but only if there are no conflicts with other component names and synonyms. E.g.:

  • --component mg will find the component name magnesium
  • --component co will throw an error message (may be the synonym Co for component cobalt or synonym CO for component carbon_monoxide)
  • --component Co and --component CO will still work as expected.
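The lookup behaviour in the three examples above can be sketched as "exact match first, then case-insensitive match only if it is unambiguous". The table LOOKUP and the function resolve_component are hypothetical and contain only the names mentioned above; the synonym Mg for magnesium is an assumption.

```python
# Hypothetical lookup table: component names and lookup names -> component.
LOOKUP = {
    "magnesium": "magnesium",
    "Mg": "magnesium",
    "cobalt": "cobalt",
    "Co": "cobalt",
    "carbon_monoxide": "carbon_monoxide",
    "CO": "carbon_monoxide",
}


def resolve_component(name, lookup=LOOKUP):
    """Resolve a --component value to a component name."""
    if name in lookup:  # exact (case sensitive) match always wins
        return lookup[name]
    hits = {comp for key, comp in lookup.items() if key.lower() == name.lower()}
    if len(hits) == 1:
        return hits.pop()
    if not hits:
        raise KeyError("unknown component: " + name)
    raise ValueError("ambiguous component: " + name)
```

With this table, "mg" resolves to magnesium, "Co" and "CO" resolve exactly, and "co" is rejected as ambiguous.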

This argument requires an argument value COMPONENT. COMPONENT can be a single component name or a list of component names (COMPONENT1,COMPONENT2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

-m, --matrix=MATRIX

Selection by matrix name.

This argument requires an argument value MATRIX. MATRIX can be a single matrix name or a list of matrix names (MATRIX1,MATRIX2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

-g, --group=GROUP

Selection by parameter group.

This argument requires an argument value GROUP. GROUP can be a single parameter group or a list of parameter groups (GROUP1,GROUP2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

--fi_ref=FI_REF

Selection by instrument reference.

This argument requires an argument value FI_REF. FI_REF can be a single instrument reference or a list of instrument references (FI_REF1,FI_REF2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

--me_ref=ME_REF

Selection by method reference.

This argument requires an argument value ME_REF. ME_REF can be a single method reference or a list of method references (ME_REF1,ME_REF2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

--resolution=RESOLUTION

Selection by resolution code.

This argument requires an argument value RESOLUTION. RESOLUTION can be a single resolution code or a list of resolution codes (RESOLUTION1,RESOLUTION2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

--statistics=STATISTICS

Selection by statistics code.

This argument requires an argument value STATISTICS. STATISTICS can be a single statistics code or a list of statistics codes (STATISTICS1,STATISTICS2,…).

Note

Make sure there are no blank characters in the argument value (not even after the commas)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Optional, Default: None

--datalevel=DATALEVEL

Selection by data level.

This argument requires an argument value DATALEVEL. DATALEVEL can only be a single data level.

Optional, Default: None

New in version 3.00.08.

Time interval criteria

-t, --time=TIME

Specifies the time interval the program should operate on.

This argument requires an argument value TIME. TIME has the format FROM[-TO]. FROM and TO each has the format YYYY[-MM[-DD[THH[:MM[:SS]]]]].

If only one time is specified, the period which is defined by the precision of the time format will be chosen:

2011-01 --> [2011-01-01T00:00:00, 2011-02-01T00:00:00[ (i.e. the period January 2011).

2011-11-23T11 --> [2011-11-23T11:00:00, 2011-11-23T12:00:00[ (i.e. 11:00 to 12:00 on 23 November 2011).

If both FROM and TO are given, the interval is defined by the start of the FROM period and the end of the TO period:

2011-01-2012-01-22 --> [2011-01-01T00:00:00, 2012-01-23T00:00:00[

The time criterion works by checking the given interval for overlap with the measurement sample sequence.

When TIME is specified as YYYY or YYYY-MM, a slightly different approach is used: a sample sequence is not included if it overlaps the interval only with a part of its first or last sample. This prevents non-intuitive matches, e.g. when a single 07:00-07:00 sample from the previous month or year extends into the requested period.

In other words: when only YYYY or YYYY-MM is specified, a dataset is only considered a hit if at least one full sample lies within the respective year or month.

Note

Make sure there are no blank characters in the argument value (not even around the hyphens when specifying FROM and TO times)!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.
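The rules for turning a TIME value into a half-open interval can be sketched in a few lines of Python. This is an illustration of the format described above, not the EBAS parser itself; the names parse_time and _period are hypothetical.

```python
import re
from datetime import datetime, timedelta

# One timestamp: YYYY[-MM[-DD[THH[:MM[:SS]]]]]
_TS = r"\d{4}(?:-\d{2}(?:-\d{2}(?:T\d{2}(?::\d{2}(?::\d{2})?)?)?)?)?"
_TIME_RE = re.compile(rf"({_TS})(?:-({_TS}))?")


def _period(stamp):
    """Return (start, end) of the period implied by the stamp's precision."""
    parts = [int(p) for p in re.split(r"[-T:]", stamp)]
    n = len(parts)
    # Fill unspecified fields: month/day default to 1, time fields to 0.
    fields = parts + [None, 1, 1, 0, 0, 0][n:]
    start = datetime(*fields)
    year, month = fields[0], fields[1]
    if n == 1:
        end = datetime(year + 1, 1, 1)
    elif n == 2:
        end = datetime(year + month // 12, month % 12 + 1, 1)
    else:
        steps = [timedelta(days=1), timedelta(hours=1),
                 timedelta(minutes=1), timedelta(seconds=1)]
        end = start + steps[n - 3]
    return start, end


def parse_time(value):
    """Parse FROM[-TO] into the half-open interval [start, end[."""
    match = _TIME_RE.fullmatch(value)
    if match is None:
        raise ValueError("invalid TIME value: " + repr(value))
    start, end = _period(match.group(1))
    if match.group(2):
        end = _period(match.group(2))[1]
    return start, end
```

The regular expression relies on backtracking to split FROM and TO: in 2011-01-2012-01-22 the FROM part first tries to consume "2011-01-20", fails on the remainder, and then falls back to "2011-01".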

History arguments

--state=STATE

Used to query the database in a historic state.

This argument requires an argument value STATE. STATE is the historic point in time at which you want to look at the data. The result of the operation will be the same as if the operation had been performed at the historic point in time specified. This can be thought of as a time travel option (unfortunately we can only travel back in time - sorry, no future observations in this version of EBAS). See Historic states of data for more information.

Format: YYYY[-MM[-DD[THH[:MM[:SS]]]]].

Note

Make sure there are no blank characters in the argument value!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Default: If the --state option is not given, the current point in time is used (current state of the database)

Examples:

--state 2015-03-27T23:21:12 queries the database as it was 27th March 2015, 23:21:12

--state 2015-01-01T00:00:00 queries the database as it was 1st January 2015 midnight

--state 2015-01-01 same state, but less typing

--state 2015 same state, minimal typing

--diff=DIFF

The diff criterion is used to restrict the set of data and metadata to only those that have been changed between two historic states of the database.

This argument requires an argument value DIFF.

Format: YYYY[-MM[-DD[THH[:MM[:SS]]]]].

Note

Make sure there are no blank characters in the argument value!

Otherwise, the argument parser assumes that the next argument starts after a blank character and an error will occur.

Only data and metadata that changed between the database state specified by DIFF and the database state specified by the argument --state will be processed. If the --state argument is not given, the current database state is used and only data changed since DIFF will be processed (see the default value for the argument --state).

The usual case will be that DIFF is earlier in time than STATE; however, the opposite case is technically possible: working on data in an old database state which will have been changed at a certain point of time in the future.

Examples:

--diff 2015-03-27T23:21:12 --state 2015-05-01T00:00:00 Only processes data and metadata changed between 27th March 2015, 23:21:12 and 1st of May 2015 midnight. The results will be in the database state of 1st of May 2015.

--diff 2015-01-01T00:00:00 Only processes data and metadata changed since 1st January 2015 midnight. The results will be in the current database state.

--diff 2015-01-01 same, but less typing

--diff 2015 same, minimal typing

New in version 3.00.08.

On behalf of…

--us_id=US_ID

Used to query the database on behalf of somebody else. This option is only available for EBAS administrators and is used to generate data extractions on request by data users. In the access statistics, the respective user who ordered the data is mentioned instead of the actual user technically extracting the data.

Arguments specific to ebas_extract

--format=FORMAT

file format for output:
  • NasaAmes
  • CSV
  • XML
  • NetCDF

New in version 3.00.08: NetCDF

(default: NasaAmes)

--multicolumn

multicolumn output, multiple variables per file (default: False)

--expand

expand multicolumn output, add associated datasets (default: False)

--precip_amount, --amount

add associated precipitation amount datasets for each precipitation concentration dataset (default: False)

--xmlwrap

wrap output in xml containers (default: False)

--createfiles

create files instead of output to stdout (default: False)

--destdir=DESTDIR

set output directory for files (only allowed after --createfiles) (default: None)

--flags=FLAG_OUTPUT_STYLE

flag columns style:

  • one-or-all (default):

    If all variables share the same sequence of flags throughout the whole file, use one flag column as last column. Else, one flag column per variable is used. This is the default behavior starting from EBAS 3.0.

  • compress:

    If multiple variables share the same sequence of flags throughout the whole file, one flag column after this group of variables is used. This produces files as narrow as possible without losing any flag information. This used to be the default behavior up to EBAS 2.2.

  • all

    All variables get a dedicated flag column.

  • none

    No flag columns are exported. Invalid or missing data are both reported as MISSING value. This should be used very carefully, as information is LOST on export! Intended for non-expert users, as the easiest approach to process only valid data, without bothering about the EBAS flag system. Note: Detection limit values (flag 781) are exported as value/2.0 (only in this case, when no flag information is extracted).

As a general rule, a flag column ALWAYS applies to all preceding variables after the previous flag column. (default: one-or-all)
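The difference between the four flag styles can be made concrete with a small sketch that decides after which variable columns a flag column is placed. The function flag_column_positions is hypothetical. For compress it assumes that only adjacent variables with identical flag sequences are grouped, which follows from the general rule above but is not confirmed as the exact EBAS behaviour.

```python
def flag_column_positions(flag_seqs, style="one-or-all"):
    """Return the variable indexes that are followed by a flag column.

    flag_seqs holds one flag sequence (list of flags per sample)
    for each variable, in file column order.
    """
    n = len(flag_seqs)
    if style == "none" or n == 0:
        return []
    if style == "all":
        return list(range(n))
    if style == "one-or-all":
        if all(seq == flag_seqs[0] for seq in flag_seqs):
            return [n - 1]  # one shared flag column as last column
        return list(range(n))
    if style == "compress":
        # Close a group whenever the next variable has different flags.
        return [i for i in range(n)
                if i == n - 1 or flag_seqs[i] != flag_seqs[i + 1]]
    raise ValueError("unknown flag style: " + style)


if __name__ == "__main__":
    seqs = [[0, 0], [0, 0], [0, 781]]  # three variables, two samples each
    for style in ("one-or-all", "compress", "all", "none"):
        print(style, flag_column_positions(seqs, style))
```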

--metadata_options=METADATA_OPTIONS

metadata options for output (currently only one available option):

  • setkey
    Include the dataset key in the metadata output (not included by default)

(default: None)

New in version 3.00.08.

--fileindex=FILEINDEX

file path and name for ebas file index database (sqlite3). Prepend a plus sign (+) to the filename in order to add to an existing database instead of creating a new one. (default: None)
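The plus sign convention can be read as "strip the prefix and remember whether to append". A minimal sketch, with the hypothetical helper split_fileindex:

```python
def split_fileindex(value):
    """Split a FILEINDEX value into (path, append_to_existing)."""
    append = value.startswith("+")
    return (value[1:] if append else value), append


if __name__ == "__main__":
    print(split_fileindex("index.sqlite3"))
    print(split_fileindex("+index.sqlite3"))
```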

--diffxml=DIFFXML

file path and name for ebas diff xml output. Only useful in conjunction with the --diff argument. Creates an output xml file with deleted and added data intervals for each dataset. This xml file can be used to synchronize a local repository at the recipient side if diff extracts are used. (default: None)

See Differential extracts for more information on differential extracts. Please refer to the EBAS diffxml - File Format Specification for details on this file format.

New in version 3.00.08.

Differential extracts

New in version 3.00.08.

A special case for utilizing historic states of the EBAS database is the possibility for generating differential data extracts.

Using the arguments --diff and --diffxml (and maybe --state in addition), ebas_extract can produce differential extracts.

This is most useful for “updating” a data user about changes in the database since the last extract she received.

  • The --diff argument ensures that only data that exist in the database and have been changed since a specific date are extracted
  • The argument --diffxml generates an xml file with all changes relative to the old database state (also includes information on data intervals that have been deleted). Please refer to the EBAS diffxml - File Format Specification for details on this file format.

Example:

ebas_extract --...  --state 2015-12-01 --diff 2015-11-01 --diffxml diffexport.xml

Produces an extract with various filter conditions (--...) at the database state of 1st December 2015 midnight, but extracts only data changed between 1st November and 1st December. In addition to the data files, there will be a file named diffexport.xml containing additional information about the changes.
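When differential extracts are scripted, e.g. for a periodic update, the command line can be assembled as an argument list so that no manual quoting is needed. The sketch below only builds and prints the command; the helper diff_extract_command is hypothetical, and the --station filter is merely an example of a filter condition.

```python
import shlex


def diff_extract_command(state, diff, diffxml, filters=()):
    """Build the argument list for a differential extract."""
    return ["ebas_extract", *filters,
            "--state", state, "--diff", diff, "--diffxml", diffxml]


if __name__ == "__main__":
    argv = diff_extract_command("2015-12-01", "2015-11-01", "diffexport.xml",
                                filters=["--station", "NO0002R"])
    # shlex.join (Python 3.8+) quotes values containing blanks if necessary.
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run(argv) unchanged; passing a list avoids the blank character pitfalls described in the notes above.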

Forward and reversed differential extracts

Although the usual case would be a forward differential extract (the timestamp specified by --diff is earlier in time than the selected database state), backward differential extracts are technically possible.

A forward differential extract would generate the recipe for a data user to update a data extract in an old state to a newer state. This is what data users usually need.