Synopsis

ebasflow is a script to support the workflow for files submitted to EBAS. It’s a first step to systematise the work in the directories underarbeid, original, ibasen, etc.

Overview / Usage

ebasflow is installed on prod-ebas01. For an overview of all command-line parameters, run:

ebasflow --help

The most important (and mandatory) arguments are:

Actions

queue

This action puts a file in the EBAS dataflow queue.

Usually a file within the source directories is handled here. But a file can also enter from other directories:

Queuing files from the original or ibasen directories is prohibited.

The file is queued (i.e. copied to the underarbeid directory) and at the same time archived (i.e. moved to the original directory).
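The combined copy-and-move behind the queue action can be pictured with a minimal Python sketch. This is not ebasflow's actual code: the function name queue_file and its parameters are hypothetical, and the hierarchical target location is passed in rather than derived from the file.

```python
import shutil
from pathlib import Path


def queue_file(src, underarbeid_root, original_root, subdir):
    """Copy src into the work queue, then move it into the original archive.

    Hypothetical helper for illustration only. subdir is the hierarchical
    location below each root, e.g. 'no/no0002/level2'.
    """
    src = Path(src)
    queued = Path(underarbeid_root) / subdir / src.name
    archived = Path(original_root) / subdir / src.name
    # Check both targets before touching anything, so a clash leaves src alone.
    for target in (queued, archived):
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
    for target in (queued, archived):
        target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, queued)              # queue: copy to underarbeid
    shutil.move(str(src), str(archived))   # archive: move to original
    return queued, archived
```

Copying first and moving last means the submitted file only leaves its source directory once the queued copy exists.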

archive

This action archives the file as originally submitted (i.e. moved to the original directory).

This action is mainly used to archive files with data level 0 and 1.

For level 2 files, this action should only be used in exceptional cases to clean up previous errors in the workflow. The standard workflow automatically archives files when they are queued, so archiving a file separately should normally not be necessary.

A file from any directory (except original) can be archived. However, a warning must be confirmed before a file from underarbeid, waiting, rejected or ibasen is archived.

ibasen

This action moves a file to the archive of imported files (the ibasen directory).

Only files in underarbeid and waiting can be handled here. Files from all other directories are prohibited.

wait

This action moves a file to the waiting area (the waiting directory).

Only files in underarbeid and rejected can be handled here. Files from all other directories are prohibited.

reject

This action moves a file to the archive of rejected files (the rejected directory).

Only files in underarbeid and waiting can be handled here. Files from all other directories are prohibited.

update

This action does not change the file's state in the workflow. It only updates the file metadata in the workflow:

Only files in underarbeid, waiting, rejected and ibasen can be handled here. Files from all other directories are prohibited.
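The per-action directory rules above can be summarised as a small lookup table. The following Python sketch is illustrative only and does not reflect how ebasflow stores these rules. The names ALLOWED, PROHIBITED, ARCHIVE_WARN and check_action are hypothetical. For queue, only the two explicitly prohibited directories are encoded; treating every other directory as allowed is an assumption.

```python
# Directories from which each action accepts a file, as listed above.
ALLOWED = {
    "ibasen": {"underarbeid", "waiting"},
    "wait": {"underarbeid", "rejected"},
    "reject": {"underarbeid", "waiting"},
    "update": {"underarbeid", "waiting", "rejected", "ibasen"},
}
# queue and archive are described by exclusion instead.
PROHIBITED = {
    "queue": {"original", "ibasen"},
    "archive": {"original"},
}
# archive asks for confirmation for files from these directories.
ARCHIVE_WARN = {"underarbeid", "waiting", "rejected", "ibasen"}


def check_action(action, state):
    """Return 'ok', 'confirm' or 'prohibited' for an action on a file in state."""
    if action in ALLOWED:
        return "ok" if state in ALLOWED[action] else "prohibited"
    if action in PROHIBITED:
        if state in PROHIBITED[action]:
            return "prohibited"
        if action == "archive" and state in ARCHIVE_WARN:
            return "confirm"
        return "ok"
    raise ValueError(f"unknown action: {action}")
```

For example, check_action("wait", "ibasen") yields "prohibited", while check_action("archive", "waiting") yields "confirm".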

EBAS dataflow directory structure

General file organisation

With the exception of the source directories, the files are generally organised hierarchically in the form:

- country (2 char, lower case)
  |
  - station (6 char, first two equal country, then 4 numeric)
     |
     - data level ('level' + data level number, e.g. 'level0')

For cases where data are submitted country-wise (multiple stations are submitted at the same time), the station level can alternatively be omitted:

- country (2 char, lower case)
  |
  - data level ('level' + data level number, e.g. 'level0')

Usually, one submission consists of one NASA Ames file. The exception is when data are submitted country-wise (multiple stations are submitted at the same time). In this case, data are usually submitted as a zip archive (or another type of archive). Such archives should be extracted to a subdirectory, and the subdirectory should be placed in the hierarchy. The ebasflow script can handle directories for this purpose.
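As an illustration of the two layouts described above, the following Python sketch builds the relative location from a country code, an optional station code and a data level. The function hierarchy_subdir is hypothetical and is not part of ebasflow. It only encodes the naming rules stated in this section.

```python
import re
from pathlib import PurePosixPath


def hierarchy_subdir(country, level, station=None):
    """Build 'country/station/levelN', or 'country/levelN' without a station."""
    country = country.lower()
    if not re.fullmatch(r"[a-z]{2}", country):
        raise ValueError(f"country must be 2 letters: {country!r}")
    parts = [country]
    if station is not None:
        station = station.lower()
        # 6 characters: the country code followed by 4 digits.
        if not re.fullmatch(r"[a-z]{2}\d{4}", station) or station[:2] != country:
            raise ValueError(f"bad station code for {country!r}: {station!r}")
        parts.append(station)
    parts.append(f"level{level}")
    return PurePosixPath(*parts)
```

With these rules, hierarchy_subdir("no", 2, "no0002") gives no/no0002/level2, and hierarchy_subdir("nl", 0) gives the country-wise form nl/level0.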

Data flow directories

The data flow directories are categorised so that each file's location reflects its current status in the workflow. The different states are listed below with their (default) file paths. The file paths can be customised in the configuration file or with command-line arguments.

source

The source directories are where incoming files are usually first stored when submitted to EBAS.

Default location: There are currently two directories: /viper/wdca/gooddata/ and /viper/wdca/evilddata/

File organisation: Usually the files are just stored flat in the two source directories.

original

The original directory contains an archived version of all processed files in history. The archiving should be done during the first action on a file (i.e. queue).

Default location: /viper/ebas/original/

File organisation: Hierarchically

underarbeid

The underarbeid directory is the queue of all submissions ready for inspection, check and import into EBAS.

Default location: /viper/ebas/underarbeid/

File organisation: Hierarchically

ibasen

The ibasen directory contains all files which have been imported into the database.

Default location: /viper/ebas/ibasen/

File organisation: Hierarchically

waiting

The waiting directory contains all files which failed the check routines with problems that seem to be minor, i.e. problems that could be fixed after getting additional information from the data submitters. This also covers cases where a re-confirmation is needed because of a possible misunderstanding. All files in the waiting directory should contain a reference to a mantis issue!

Default location: /viper/ebas/waiting/

File organisation: Hierarchically

rejected

The rejected directory contains all files which failed the check routines with problems too severe to fix on our side, so a new version had to be requested from the data submitter. The old version is stored in rejected for possible future reference. All files in the rejected directory should contain a reference to a mantis issue!

Default location: /viper/ebas/rejected/

File organisation: Hierarchically
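Since a file's workflow state is given by the directory it lives in, the state can be derived from its absolute path. The Python sketch below shows the idea using the default locations listed above. DEFAULT_DIRS and state_of are hypothetical names, and this is not how ebasflow reads its configuration.

```python
from pathlib import PurePosixPath

# Default locations as listed above; ebasflow allows these to be overridden.
DEFAULT_DIRS = {
    "source": ["/viper/wdca/gooddata/", "/viper/wdca/evilddata/"],
    "original": ["/viper/ebas/original/"],
    "underarbeid": ["/viper/ebas/underarbeid/"],
    "ibasen": ["/viper/ebas/ibasen/"],
    "waiting": ["/viper/ebas/waiting/"],
    "rejected": ["/viper/ebas/rejected/"],
}


def state_of(path, dirs=DEFAULT_DIRS):
    """Return the workflow state whose directory contains path, else None."""
    path = PurePosixPath(path)
    for state, roots in dirs.items():
        for root in roots:
            try:
                path.relative_to(root)  # raises ValueError if not below root
            except ValueError:
                continue
            return state
    return None
```

The sketch expects absolute paths. A relative file name, as used in some of the examples, would first have to be resolved against the current directory.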

Recognition of station code and data level

In order to organise the files in the correct hierarchical location, ebasflow needs to know the country, station code and data level of a submission. This information is obtained in two ways:

Mantis issues

Mantis issues can be assigned to any file in the workflow at any time (use argument --mantis (-m)). One or more issues can be assigned to a file. Technically, the issue reference is appended to the file name in the format __mantis_<#>, e.g. the file name orig_filename__mantis_12__mantis_122 means that mantis issues 12 and 122 have been assigned to the file orig_filename.

Assigning a mantis issue is mandatory when performing the actions wait or reject.
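The file name convention for mantis references can be illustrated with a short Python sketch. The helpers split_mantis and add_mantis are hypothetical and only mirror the __mantis_<#> format described above. Skipping an issue number that is already present is an assumption, not documented ebasflow behaviour.

```python
import re

# One or more '__mantis_<number>' references at the end of a file name.
_SUFFIX_RE = re.compile(r"(?:__mantis_\d+)+$")
_ISSUE_RE = re.compile(r"__mantis_(\d+)")


def split_mantis(name):
    """Return (original file name, list of issue numbers)."""
    match = _SUFFIX_RE.search(name)
    if not match:
        return name, []
    issues = [int(num) for num in _ISSUE_RE.findall(match.group(0))]
    return name[:match.start()], issues


def add_mantis(name, issue):
    """Append an issue reference unless the file already carries it."""
    _, issues = split_mantis(name)
    if issue in issues:
        return name
    return f"{name}__mantis_{issue}"
```

Applied to the example above, split_mantis("orig_filename__mantis_12__mantis_122") returns the original name orig_filename together with the issue numbers 12 and 122.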

Examples

Add an incoming file to the queue

A new file was submitted and needs to be queued for QA and ingestion.

Command:

paul@prod-ebas01:~ $ ebasflow queue /viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas

Output:

INFO    : Queuing file '/viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'
INFO    : Copy to '/viper/ebas/underarbeid/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'
INFO    : Move to '/viper/ebas/original/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'

Archive an incoming level 0 file

Usually level 0 (and level 1) data files should not be ingested into EBAS, but only stored in the original archive.

Command:

paul@prod-ebas01:gooddata $ ebasflow archive NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas

Output:

INFO    : Archiving file '/viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas'
INFO    : Move to '/viper/ebas/original/no/no0002/level0/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas'

Processing the file shows problems

The responsible data manager inspected the file, discovered some problems in it, and opened a mantis issue (issue #701).

Command:

# change dir, to show with a relative file name
paul@prod-ebas01:~ $ cd /viper/ebas/underarbeid/no/no0002/level2

paul@prod-ebas01:level2 $ ebasflow wait NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas 

Output:

Mantis issue number (mandatory): 701
INFO    : Set file '/viper/ebas/underarbeid/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas' to waiting
INFO    : Move to '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701'

Add another mantis issue

Soon after, the data manager discovers another mantis issue which is relevant for the file.

Command:

paul@prod-ebas01:level2 $ ebasflow -m 623 update NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701

Output:

INFO    : Update file '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701'
INFO    : Rename to '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623'

Issues resolved, file imported

Command:

paul@prod-ebas01:~ $ ebasflow ibasen /viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623 

Output:

INFO    : Set file '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623' to ibasen
INFO    : Move to '/viper/ebas/ibasen/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623'

Correct a file’s location

The file /viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas is obviously in the wrong location according to the standard file hierarchy.

Command:

paul@prod-ebas01:~ $ ebasflow update /viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas 

Output:

INFO    : Update file '/viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas'
INFO    : Rename to '/viper/ebas/underarbeid/nl/nl0011/level2/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas'