ebasflow is a script that supports the workflow for files submitted to EBAS. It is a first step towards systematising the work in the directories underarbeid, original, ibasen, etc.
ebasflow is installed on prod-ebas01. To get an overview of all command-line parameters, use:
ebasflow --help
The most important (and mandatory) arguments are:
queue: This action puts a file into the EBAS data flow queue.
Usually a file from one of the source directories is handled here, but a file can also enter from other directories. Queuing files from the original or ibasen directories is prohibited.
The file is queued (i.e. copied to the underarbeid directory) and at the same time archived (i.e. moved to the original directory).
archive: This action archives the file as originally submitted (i.e. moves it to the original directory).
This action is mainly used to archive files with data level 0 and 1.
For level 2 files, this action should only be used in exceptional cases to clean up previous errors in the workflow. The standard workflow archives files automatically when they are queued, so archiving a file on its own should normally not be necessary.
A file from any directory (except original) can be archived; however, a warning must be confirmed when a file from underarbeid, waiting, rejected or ibasen is to be archived.
ibasen: This action moves a file to the archive of imported files (the ibasen directory).
Only files in underarbeid and waiting can be handled here. Files from all other directories are prohibited.
wait: This action moves a file to the waiting area (the waiting directory).
Only files in underarbeid and rejected can be handled here. Files from all other directories are prohibited.
reject: This action moves a file to the archive of rejected files (the rejected directory).
Only files in underarbeid and waiting can be handled here. Files from all other directories are prohibited.
update: This action does not change the file's state in the workflow. It only updates the file's metadata in the workflow, e.g. the assigned mantis issues (--mantis (-m)) or the file's location within the directory hierarchy.
Only files in underarbeid, waiting, rejected and ibasen can be handled here. Files from all other directories are prohibited.
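The directory rules for the six actions can be summarised as a lookup table. The sketch below is only an illustration written from the descriptions above, not ebasflow's own code; the state names, the extra "other" state for directories outside the data flow, and the function check_action are hypothetical.

```python
# Hypothetical summary of the per-action rules described above;
# ebasflow's real implementation may differ.
WORKFLOW = {"original", "underarbeid", "ibasen", "waiting", "rejected"}
ALL_STATES = WORKFLOW | {"source", "other"}  # "other": any directory outside the data flow

# States from which each action may take a file.
ALLOWED_FROM = {
    "queue": ALL_STATES - {"original", "ibasen"},
    "archive": ALL_STATES - {"original"},
    "ibasen": {"underarbeid", "waiting"},
    "wait": {"underarbeid", "rejected"},
    "reject": {"underarbeid", "waiting"},
    "update": {"underarbeid", "waiting", "rejected", "ibasen"},
}

# Archiving from these states is allowed only after confirming a warning.
ARCHIVE_WARNING = {"underarbeid", "waiting", "rejected", "ibasen"}


def check_action(action: str, state: str) -> str:
    """Return 'ok', 'confirm' or 'prohibited' for `action` on a file in `state`."""
    if state not in ALLOWED_FROM[action]:
        return "prohibited"
    if action == "archive" and state in ARCHIVE_WARNING:
        return "confirm"
    return "ok"


print(check_action("wait", "rejected"))    # ok
print(check_action("archive", "waiting"))  # confirm
print(check_action("ibasen", "source"))    # prohibited
```

Keeping the rules in one table makes it easy to compare them against the output of ebasflow --help if the behaviour changes.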
With the exception of the source directories, the files are generally organised hierarchically in the form:
- country (2 char, lower case)
|
- station (6 char, first two equal country, then 4 numeric)
|
- data level ('level' + data level number, e.g. 'level0')
For cases where data are submitted country-wise (multiple stations are submitted at the same time), the station level can alternatively be omitted:
- country (2 char, lower case)
|
- data level ('level' + data level number, e.g. 'level0')
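As a sketch, assuming the layout above, the target directory for a submission could be built as follows. hierarchy_dir is a hypothetical helper, not part of ebasflow; the station directory name is taken as the lower-cased six-character station code, as in the example paths further down (e.g. no/no0002/level2).

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def hierarchy_dir(base: str, country: str, level: int,
                  station: Optional[str] = None) -> PurePosixPath:
    """Return <base>/<country>/[<station>/]level<n> (hypothetical helper).

    Pass station=None for country-wise submissions, where the station
    level is omitted.
    """
    country = country.lower()
    if not re.fullmatch(r"[a-z]{2}", country):
        raise ValueError(f"country must be two letters, got {country!r}")
    path = PurePosixPath(base) / country
    if station is not None:
        station = station.lower()
        # 6 characters: the country code followed by 4 digits
        if not re.fullmatch(r"[a-z]{2}[0-9]{4}", station) or station[:2] != country:
            raise ValueError(f"invalid station {station!r} for country {country!r}")
        path = path / station
    return path / f"level{level}"


print(hierarchy_dir("/viper/ebas/underarbeid", "NO", 2, "NO0002"))
# /viper/ebas/underarbeid/no/no0002/level2
print(hierarchy_dir("/viper/ebas/underarbeid", "no", 2))
# /viper/ebas/underarbeid/no/level2
```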
Usually, one submission consists of one NASA Ames file. The exception is when data are submitted country-wise (multiple stations at the same time): such data usually arrive as a zip archive (or another type of archive). These archives should be extracted to a subdirectory, and the subdirectory should be placed in the hierarchy; ebasflow can handle directories for this purpose.
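Extracting such an archive into its own subdirectory can be done with Python's standard zipfile module. This is a minimal sketch assuming a zip archive; extract_submission and naming the subdirectory after the archive are assumptions, and other archive types would need e.g. shutil.unpack_archive instead.

```python
import zipfile
from pathlib import Path


def extract_submission(archive: Path, target_parent: Path) -> Path:
    """Extract a zipped country-wise submission into its own subdirectory.

    Hypothetical helper: the subdirectory is named after the archive
    (without .zip) and created below target_parent.
    """
    target = target_parent / archive.stem
    # exist_ok=False: refuse to overwrite an existing submission directory
    target.mkdir(parents=True, exist_ok=False)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

The returned directory is what would then be placed in the hierarchy and handed to ebasflow.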
The data flow directories are categorised to give the files within them a status in the workflow. The different states are listed below with their (default) file paths. The file paths can be customised in the configuration file or with command-line arguments.
The source directories are where incoming files are usually stored first when they are submitted to EBAS.
Default location: there are currently two directories, /viper/wdca/gooddata/ and /viper/wdca/evilddata/
File organisation: Usually the files are just stored flat in the two source directories.
The original directory contains an archived copy of every file ever processed. The archiving should be done during the first action on a file (normally queue, or archive for level 0 and 1 files).
Default location: /viper/ebas/original/
File organisation: Hierarchically
The underarbeid directory is the queue of all submissions ready for inspection, checking and import into EBAS.
Default location: /viper/ebas/underarbeid/
File organisation: Hierarchically
The ibasen directory contains all files which have been imported into the database.
Default location: /viper/ebas/ibasen/
File organisation: Hierarchically
The waiting directory contains all files that failed the check routines where the problems seem to be minor, i.e. they could be fixed after getting additional information from the data submitters. This also covers cases where a re-confirmation is needed because of possible misunderstandings. All files in the waiting directory should contain a reference to a mantis issue!
Default location: /viper/ebas/waiting/
File organisation: Hierarchically
The rejected directory contains all files that failed the check routines where the problems are too severe to fix on our side, so a new version has to be requested from the data submitter. The old version is stored in rejected for possible future reference. All files in the rejected directory should contain a reference to a mantis issue!
Default location: /viper/ebas/rejected/
File organisation: Hierarchically
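Because each state corresponds to a directory, the state of a file can be read off its path. The sketch below hard-codes the default locations listed above (the real ones may be customised) and uses a hypothetical function workflow_state; it is an illustration, not how ebasflow itself is implemented.

```python
from pathlib import PurePosixPath
from typing import Optional

# Default locations from the list above; the real ones can be customised.
DEFAULT_DIRS = {
    "source": ["/viper/wdca/gooddata", "/viper/wdca/evilddata"],
    "original": ["/viper/ebas/original"],
    "underarbeid": ["/viper/ebas/underarbeid"],
    "ibasen": ["/viper/ebas/ibasen"],
    "waiting": ["/viper/ebas/waiting"],
    "rejected": ["/viper/ebas/rejected"],
}


def workflow_state(path: str) -> Optional[str]:
    """Return the state of an absolute path, or None outside the data flow directories."""
    parents = PurePosixPath(path).parents
    for state, roots in DEFAULT_DIRS.items():
        # Compare whole path components, so /viper/ebas/originalx is not 'original'.
        if any(PurePosixPath(root) in parents for root in roots):
            return state
    return None
```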
In order to organise the files in the correct hierarchical location, ebasflow needs to know the country, station code and data level of a submission. This information is obtained in two ways. The first is the file name: if it contains the station code and the data level (as .lev<n>.), those elements are used automatically.
Mantis issues can be assigned to any file in the workflow at any time (use the argument --mantis (-m)). One or more issues can be assigned to a file. Technically, the issue reference is appended to the file name in the format __mantis_<#>, e.g. the file name orig_filename__mantis_12__mantis_122 means that mantis issues 12 and 122 have been assigned to the file orig_filename.
Assigning a mantis issue is mandatory when performing the actions wait or reject.
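The __mantis_<#> naming convention is simple enough to handle with a regular expression. The helpers split_mantis, add_mantis and check_mantis below are hypothetical; skipping an issue that is already assigned is a choice made in this sketch, not documented ebasflow behaviour.

```python
import re
from typing import List, Tuple

# One or more __mantis_<#> suffixes at the very end of the file name.
SUFFIX_RE = re.compile(r"(?:__mantis_\d+)+$")


def split_mantis(filename: str) -> Tuple[str, List[int]]:
    """Split a file name into its original name and the assigned issue numbers."""
    m = SUFFIX_RE.search(filename)
    if m is None:
        return filename, []
    issues = [int(n) for n in re.findall(r"\d+", m.group(0))]
    return filename[:m.start()], issues


def add_mantis(filename: str, issue: int) -> str:
    """Append __mantis_<issue>, unless that issue is already assigned."""
    _, issues = split_mantis(filename)
    return filename if issue in issues else f"{filename}__mantis_{issue}"


# Actions that require at least one assigned issue.
REQUIRES_MANTIS = {"wait", "reject"}


def check_mantis(action: str, issues: List[int]) -> None:
    """Raise ValueError if `action` needs a mantis issue but none is given."""
    if action in REQUIRES_MANTIS and not issues:
        raise ValueError(f"a mantis issue is mandatory for '{action}'")


print(split_mantis("orig_filename__mantis_12__mantis_122"))
# ('orig_filename', [12, 122])
```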
A new file was submitted and needs to be queued for QA and ingestion.
Command:
paul@prod-ebas01:~ $ ebasflow queue /viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas
Output:
INFO : Queuing file '/viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'
INFO : Copy to '/viper/ebas/underarbeid/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'
INFO : Move to '/viper/ebas/original/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas'
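The paths in this output follow from the file name: the station code NO0002 gives no/no0002 and .lev2. gives level2. A minimal sketch of that derivation, assuming names follow the pattern of this example (station code, an optional letter, then .lev<n>. somewhere later); target_path is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: station code (2 letters + 4 digits, optionally one more
# letter) at the start, data level as '.lev<n>.' later in the name.
NAME_RE = re.compile(r"(?P<station>[A-Z]{2}[0-9]{4})[A-Z]?\..*\.lev(?P<level>\d+)\.")


def target_path(root: str, filename: str) -> PurePosixPath:
    """Return root/<country>/<station>/level<n>/<filename> (hypothetical helper)."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"cannot derive station and level from {filename!r}")
    station = m.group("station").lower()
    level_dir = f"level{m.group('level')}"
    return PurePosixPath(root) / station[:2] / station / level_dir / filename


name = ("NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d."
        "NO01L_CFADS19.NO01L_picarro.lev2.nas")
print(target_path("/viper/ebas/underarbeid", name))
print(target_path("/viper/ebas/original", name))
```

These two calls reproduce the Copy and Move targets shown in the output above.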
Usually level 0 (and level 1) data files should not be ingested into EBAS, but only stored in the original archive.
Command:
paul@prod-ebas01:gooddata $ ebasflow archive NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas
Output:
INFO : Archiving file '/viper/wdca/gooddata/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas'
INFO : Move to '/viper/ebas/original/no/no0002/level0/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev0.nas'
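When a source directory holds many files, the choice between archive (level 0 and 1) and queue (level 2) can be scripted. The sketch below only builds ebasflow command lines of the two forms shown above and leaves execution to a separate step; plan_commands and execute are hypothetical, files without .lev<n>. in their name are skipped, and it assumes ebasflow reports failures through a non-zero exit status.

```python
import re
import subprocess
from pathlib import Path
from typing import List

LEVEL_RE = re.compile(r"\.lev(\d+)\.")


def plan_commands(source_dir: Path) -> List[List[str]]:
    """Build (but do not run) one ebasflow command per .nas file.

    Level 0 and 1 files are archived, level 2 files are queued; files
    without a recognisable level are skipped for manual handling.
    """
    commands = []
    for f in sorted(source_dir.glob("*.nas")):
        m = LEVEL_RE.search(f.name)
        if m is None:
            continue
        level = int(m.group(1))
        if level in (0, 1):
            commands.append(["ebasflow", "archive", str(f)])
        elif level == 2:
            commands.append(["ebasflow", "queue", str(f)])
    return commands


def execute(commands: List[List[str]]) -> None:
    """Run the planned commands, stopping at the first non-zero exit status."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Printing plan_commands(Path("/viper/wdca/gooddata")) first gives a chance to review the plan before calling execute.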
The responsible data manager inspects the file, discovers some problems and opens a mantis issue (#701).
Command:
# change directory to demonstrate using a relative file name
paul@prod-ebas01:~ $ cd /viper/ebas/underarbeid/no/no0002/level2
paul@prod-ebas01:level2 $ ebasflow wait NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas
Output:
Mantis issue number (mandatory): 701
INFO : Set file '/viper/ebas/underarbeid/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas' to waiting
INFO : Move to '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701'
Soon after, the data manager discovers another mantis issue which is relevant for the file.
Command:
paul@prod-ebas01:level2 $ ebasflow -m 623 update NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701
Output:
INFO : Update file '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701'
INFO : Rename to '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623'
Command:
paul@prod-ebas01:~ $ ebasflow ibasen /viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623
Output:
INFO : Set file '/viper/ebas/waiting/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623' to ibasen
INFO : Move to '/viper/ebas/ibasen/no/no0002/level2/NO0002R.20090101000000.20180620000000.online_crds.GHG.air.1y.1d.NO01L_CFADS19.NO01L_picarro.lev2.nas__mantis_701__mantis_623'
The file /viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas is in the wrong location according to the standard file hierarchy: the station directory (nl0011) and the data level directory (level2) are missing.
Command:
paul@prod-ebas01:~ $ ebasflow update /viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas
Output:
INFO : Update file '/viper/ebas/underarbeid/nl/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas'
INFO : Rename to '/viper/ebas/underarbeid/nl/nl0011/level2/NL0011R.20170919161453.20180308080000.online_ptr.OVOC.air.9d.15mn.FR01L_lsce_ptr_sri.FR01L_ptr_sri_cabauw.lev2.nas'
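Misplaced files like this one can be found by comparing each file's location with the location derived from its name. A minimal audit sketch, assuming the same naming pattern as the examples; misplaced_files is hypothetical and only reports files, leaving the actual move to ebasflow update.

```python
import os
import re
from pathlib import Path
from typing import Iterator, Tuple

# Assumed naming pattern as in the examples: station code first, '.lev<n>.' later.
NAME_RE = re.compile(r"(?P<station>[A-Z]{2}[0-9]{4})[A-Z]?\..*\.lev(?P<level>\d+)\.")


def misplaced_files(root: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (current, expected) paths for files outside root/<country>/<station>/level<n>/.

    Files whose name does not reveal station and level are ignored. Files from
    country-wise submissions (stored without a station level) are reported too,
    so review the output before acting on it.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            m = NAME_RE.match(name)
            if m is None:
                continue
            station = m.group("station").lower()
            expected = root / station[:2] / station / f"level{m.group('level')}" / name
            current = Path(dirpath) / name
            if current != expected:
                yield current, expected
```

Each reported file could then be passed to ebasflow update, as in the command above.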