Using CleF - Climate Finder to discover ESGF data at NCI#

Paola Petrelli, CLEX CMS

This blog is the first of three showing examples of how to use the CleF (Climate Finder) Python module to search for ESGF data on the NCI server.
Currently the tool is set up for CMIP5, CMIP6 and CORDEX data published by the ESGF.

CleF is currently installed in the CMS conda module analysis3, and analysis3-unstable for the latest version. This is managed by the CMS and is available simply by running

module use /g/data3/hh5/public/modules
module load conda/analysis3

You need to be a member of hh5 to use the modules, and of one of the CMIP projects (oi10, rr3, fs38, al33) to access the data and the clef database.
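
The membership requirement above can be checked from Python before running clef. This is only a minimal sketch: it assumes that NCI project membership shows up as Unix group membership, and the helper names `missing_access` and `current_user_groups` are hypothetical, not part of clef.

```python
# Hypothetical helpers, not part of clef: they encode the membership rule stated above.
REQUIRED_PROJECT = "hh5"
CMIP_PROJECTS = ("oi10", "rr3", "fs38", "al33")


def missing_access(user_groups):
    """Return a list of problems; an empty list means the requirements are met."""
    groups = set(user_groups)
    problems = []
    if REQUIRED_PROJECT not in groups:
        problems.append(f"join {REQUIRED_PROJECT} to load the conda modules")
    if not groups.intersection(CMIP_PROJECTS):
        problems.append("join at least one of: " + ", ".join(CMIP_PROJECTS))
    return problems


def current_user_groups():
    """Names of the current user's Unix groups (assumes projects map to groups)."""
    import grp
    import os

    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A group id without a name entry; skip it.
            continue
    return names
```

On a login node, `missing_access(current_user_groups())` should then return an empty list if everything is in place.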

This blog covers the basic usage of the command line options.

In the next blog we’ll cover how to:

  • save the query results as a csv file

  • get a summary of available data

  • run more complex queries

  • get extra information from the ESDOC documentation, such as errata and citations

In the third blog we will cover how to import and use clef in your own python code.
Let’s start!

Command syntax#

# run this if you haven't done so already in the terminal
#!module use /g/data3/hh5/public/modules
#!module load conda/analysis3-unstable
# using unstable guarantees that the latest features are all available, but clef is also available in stable
!clef
Usage: clef [OPTIONS] COMMAND [ARGS]...

Options:
  --remote   returns only ESGF search results
  --local    returns only local files matching arguments in local database
  --missing  returns only missing files matching ESGF search
  --request  send NCI request to download missing files matching ESGF search
  --debug    Show debug info
  --help     Show this message and exit.

Commands:
  cmip5   Search ESGF and local database for CMIP5 files Constraints can be...
  cmip6   Search ESGF and local database for CMIP6 files Constraints can be...
  cordex  Search ESGF and local database for CORDEX files.
  ds      Search local database for non-ESGF datasets

By simply running the command clef with no arguments, the tool shows the help message and then exits; it is equivalent to

clef --help

We can see there are currently four sub-commands: ds to query non-ESGF collections, and one for each ESGF dataset: cmip5, cmip6 and cordex.
There are also six different options that can be passed before the sub-commands; one we have already seen is --help. The others are used to modify how the tool will deal with the main query output. We will have a look at them and at ds later.
Let’s start by querying some CMIP5 data. To see what we can pass to the cmip5 sub-command, we can simply run it with its --help option.

CMIP5#

!clef cmip5 --help
Usage: clef cmip5 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP5 files

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour

Options:
  -e, --experiment x              CMIP5 experiment: piControl, rcp85, amip ...
  --experiment_family [Atmos-only|Control|Decadal|ESM|Historical|Idealized|Paleo|RCP]
                                  CMIP5 experiment family: Decadal, RCP ...
  -m, --model x                   CMIP5 model acronym: ACCESS1.3, MIROC5 ...
  -t, --table, --mip [Amon|Omon|OImon|LImon|Lmon|6hrPlev|6hrLev|3hr|Oclim|Oyr|aero|cfOff|cfSites|cfMon|cfDay|cf3hr|day|fx|grids]
  -v, --variable x                Variable name as shown in filanames: tas,
                                  pr, sic ...

  -en, --ensemble, --member TEXT  CMIP5 ensemble member: r#i#p#
  --frequency [mon|day|3hr|6hr|fx|yr|monClim|subhr]
  --realm [atmos|ocean|land|landIce|seaIce|aerosol|atmosChem|ocnBgchem]
  --and [variable|experiment|cmor_table|realm|time_frequency|model|ensemble]
                                  Attributes for which we want to add AND
                                  filter, i.e. `--and variable` to apply to
                                  variable values

  --institution TEXT              Modelling group institution id: MIROC, IPSL,
                                  MRI ...

  --cf_standard_name TEXT         CF variable standard_name, use instead of
                                  variable constraint

  --latest / --all-versions       Return only the latest version or all of
                                  them. Default: --latest

  --replica / --no-replica        Return both original files and replicas.
                                  Default: --no-replica

  --distrib / --no-distrib        Distribute search across all ESGF nodes.
                                  Default: --distrib

  --csv / --no-csv                Send output to csv file including extra
                                  information. Default: --no-csv

  --stats / --no-stats            Write summary of query results, works only
                                  with --local option. Default: --no-stats

  --debug / --no-debug            Show debug output. Default: --no-debug
  --help                          Show this message and exit.

Passing arguments and options#

The --help output shows all the constraints we can pass to the tool. There are also some additional options which change the way the query runs; for the moment we can ignore these and use their default values.
Some of the constraints can be passed using an abbreviation, like -v instead of --variable. This is handy once you are more familiar with the tool.
The same option can have more than one name: for example, --ensemble can also be passed as --member, because the terminology has changed between CMIP5 and CMIP6.
You can pass as many constraints as you want, and pass the same constraint more than once. Let’s see what happens, though, if we do not pass any constraints.

!clef cmip5
ERROR: Too many results (3781387), try limiting your search https://esgf.nci.org.au/search/esgf-nci?query=&type=File&distrib=True&replica=False&latest=True&project=CMIP5
!clef cmip5 --variable tasmin --experiment historical --table day --ensemble r2i1p1s
ERROR: No matches found on ESGF, check at https://esgf.nci.org.au/search/esgf-nci?query=&type=File&distrib=True&replica=False&latest=True&project=CMIP5&ensemble=r2i1p1s&experiment=historical&cmor_table=day&variable=tasmin

Oops, that wasn’t reasonable! Without any constraints the query matches far too many results. In the second attempt I misspelled the ensemble: “r2i1p1s” does not exist, and the tool is telling me it cannot find any matches.

!clef cmip5 --variable tasmin --experiment historical --table days --ensemble r2i1p1
Usage: clef cmip5 [OPTIONS] [QUERY]...
Try 'clef cmip5 --help' for help.

Error: Invalid value for '--table' / '--mip' / '-t': invalid choice: days. (choose from Amon, Omon, OImon, LImon, Lmon, 6hrPlev, 6hrLev, 3hr, Oclim, Oyr, aero, cfOff, cfSites, cfMon, cfDay, cf3hr, day, fx, grids)

I made another spelling mistake. In this case the tool knows that I passed a wrong value and lists all the available options for the CMOR table. Eventually we aim to validate all the arguments we can, although for some (ensemble, for example) it is not possible to list all the possible values.

!clef cmip5 --variable tasmin --experiment historical --table day --ensemble r2i1p1
/g/data/al33/replicas/CMIP5/combined/CCCma/CanCM4/historical/day/atmos/day/r2i1p1/v20120207/tasmin/
/g/data/al33/replicas/CMIP5/combined/CCCma/CanCM4/historical/day/atmos/day/r2i1p1/v20120612/tasmin/
/g/data/al33/replicas/CMIP5/combined/CCCma/CanESM2/historical/day/atmos/day/r2i1p1/v20120410/tasmin/
/g/data/al33/replicas/CMIP5/combined/CNRM-CERFACS/CNRM-CM5/historical/day/atmos/day/r2i1p1/v20120703/tasmin/
/g/data/al33/replicas/CMIP5/combined/IPSL/IPSL-CM5A-LR/historical/day/atmos/day/r2i1p1/v20130506/tasmin/
/g/data/al33/replicas/CMIP5/combined/IPSL/IPSL-CM5A-MR/historical/day/atmos/day/r2i1p1/v20130506/tasmin/
/g/data/al33/replicas/CMIP5/combined/LASG-IAP/FGOALS-s2/historical/day/atmos/day/r2i1p1/v20161204/tasmin/
/g/data/al33/replicas/CMIP5/combined/MIROC/MIROC-ESM/historical/day/atmos/day/r2i1p1/v20120710/tasmin/
/g/data/al33/replicas/CMIP5/combined/MIROC/MIROC4h/historical/day/atmos/day/r2i1p1/v20120628/tasmin/
/g/data/al33/replicas/CMIP5/combined/MIROC/MIROC5/historical/day/atmos/day/r2i1p1/v20120710/tasmin/
/g/data/al33/replicas/CMIP5/combined/MOHC/HadCM3/historical/day/atmos/day/r2i1p1/v20140110/tasmin/
/g/data/al33/replicas/CMIP5/combined/MOHC/HadGEM2-CC/historical/day/atmos/day/r2i1p1/v20111129/tasmin/
/g/data/al33/replicas/CMIP5/combined/MOHC/HadGEM2-ES/historical/day/atmos/day/r2i1p1/v20110418/tasmin/
/g/data/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-LR/historical/day/atmos/day/r2i1p1/v20111006/tasmin/
/g/data/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-MR/historical/day/atmos/day/r2i1p1/v20120503/tasmin/
/g/data/al33/replicas/CMIP5/combined/MPI-M/MPI-ESM-P/historical/day/atmos/day/r2i1p1/v20120315/tasmin/
/g/data/al33/replicas/CMIP5/combined/MRI/MRI-CGCM3/historical/day/atmos/day/r2i1p1/v20120701/tasmin/
/g/data/al33/replicas/CMIP5/combined/NCC/NorESM1-M/historical/day/atmos/day/r2i1p1/v20110901/tasmin/
/g/data/al33/replicas/CMIP5/combined/NOAA-GFDL/GFDL-CM3/historical/day/atmos/day/r2i1p1/v20120227/tasmin/
/g/data/rr3/publications/CMIP5/output1/CSIRO-QCCCE/CSIRO-Mk3-6-0/historical/day/atmos/day/r2i1p1/files/tasmin_20110518/

Everything available on ESGF is also available locally

The tool first searches the ESGF for all the files that match the constraints we passed. It then looks for these files locally and, if it finds them, returns their path on raijin. For all the files it can’t find locally, the tool checks an NCI table listing the downloads NCI is working on. Finally it lists the missing datasets which are in the download queue, followed by the datasets that are not available locally and that no one has requested yet.

The tool lists dataset paths and dataset_ids. We used to have a --format file option, but this has been removed in the most recent versions.
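
Because the output mixes local paths, dataset_ids and plain messages, it can help to separate them when post-processing a saved listing. The sketch below is only a heuristic based on the output shown in this post: lines starting with `/` are treated as paths, and space-free lines with many dots as dataset_ids. The function name `split_clef_output` is hypothetical and not part of clef.

```python
def split_clef_output(text):
    """Split captured clef output into (paths, dataset_ids, messages)."""
    paths, dataset_ids, messages = [], [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # Local results are absolute paths under /g/data
            paths.append(line)
        elif " " not in line and line.count(".") >= 5:
            # Heuristic: dataset_ids are dot-separated and contain no spaces
            dataset_ids.append(line)
        else:
            messages.append(line)
    return paths, dataset_ids, messages


sample = """\
/g/data/rr3/publications/CMIP5/output1/CSIRO-BOM/ACCESS1-0/historical/mon/atmos/Amon/r1i1p1/files/clivi_20120115/

Available on ESGF but not locally:
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200417
"""
paths, dataset_ids, messages = split_clef_output(sample)
print(len(paths), len(dataset_ids), messages)
```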

The query by default returns the latest available version. What if we want to have a look at all the available versions?

!clef cmip5 --variable clivi --experiment historical --table Amon -m ACCESS1.0 --all-versions
/g/data/rr3/publications/CMIP5/output1/CSIRO-BOM/ACCESS1-0/historical/mon/atmos/Amon/r1i1p1/files/clivi_20120115/
/g/data/rr3/publications/CMIP5/output1/CSIRO-BOM/ACCESS1-0/historical/mon/atmos/Amon/r1i1p1/files/clivi_20120727/
/g/data/rr3/publications/CMIP5/output1/CSIRO-BOM/ACCESS1-0/historical/mon/atmos/Amon/r3i1p1/files/clivi_20140402/

Everything available on ESGF is also available locally

The option --all-versions is the reverse of --latest, which is the default, so we get a list of all available versions.
Since all the ACCESS1.0 data is available on NCI (which is the authoritative source for the ACCESS models), the tool shouldn’t find any missing datasets. If it does, please let us know.

CMIP6#

!clef cmip6 --help
Usage: clef cmip6 [OPTIONS] [QUERY]...

  Search ESGF and local database for CMIP6 files Constraints can be
  specified multiple times, in which case they are combined using OR:  -v
  tas -v tasmin will return anything matching variable = 'tas' or variable =
  'tasmin'. The --latest flag will check ESGF for the latest version
  available, this is the default behaviour

Options:
  -mip, --activity [AerChemMIP|C4MIP|CDRMIP|CFMIP|CMIP|CORDEX|DAMIP|DCPP|DynVarMIP|FAFMIP|GMMIP|GeoMIP|HighResMIP|ISMIP6|LS3MIP|LUMIP|OMIP|PAMIP|PMIP|RFMIP|SIMIP|ScenarioMIP|VIACSAB|VolMIP]
  -e, --experiment x              CMIP6 experiment, list of available depends
                                  on activity

  --source_type [AER|AGCM|AOGCM|BGC|CHEM|ISM|LAND|OGCM|RAD|SLAB]
  -t, --table x                   CMIP6 CMOR table: Amon, SIday, Oday ...
  -m, --model, --source_id x      CMIP6 model id: GFDL-AM4, CNRM-CM6-1 ...
  -v, --variable x                CMIP6 variable name as in filenames
  -mi, --member TEXT              CMIP6 member id: <sub-exp-id>-r#i#p#f#
  -g, --grid, --grid_label TEXT   CMIP6 grid label: i.e. gn for the model
                                  native grid

  -nr, --resolution, --nominal_resolution TEXT
                                  Approximate resolution: '250 km', pass in
                                  quotes

  --frequency [1hr|1hrCM|1hrPt|3hr|3hrPt|6hr|6hrPt|day|dec|fx|mon|monC|monPt|subhrPt|yr|yrPt]
  --realm [aerosol|atmos|atmosChem|land|landIce|ocean|ocnBgchem|seaIce]
  -se, --sub_experiment_id TEXT   Only available for hindcast and forecast
                                  experiments: sYYYY

  -vl, --variant_label TEXT       Indicates a model variant: r#i#p#f#
  --and [variable_id|experiment_id|table_id|realm|frequency|member_id|source_id|source_type|activity_id|grid|grid_label|nominal_resolution|sub_experiment_id]
                                  Attributes for which we want to add AND
                                  filter, i.e. `--and variable_id` to apply to
                                  variable values

  --institution TEXT              Modelling group institution id: IPSL, NOAA-
                                  GFDL ...

  --cf_standard_name TEXT         CF variable standard_name, use instead of
                                  variable constraint

  --latest / --all-versions       Return only the latest version or all of
                                  them. Default: --latest

  --replica / --no-replica        Return both original files and replicas.
                                  Default: --no-replica

  --distrib / --no-distrib        Distribute search across all ESGF nodes.
                                  Default: --distrib

  --csv / --no-csv                Send output to csv file including extra
                                  information. Default: --no-csv

  --stats / --no-stats            Write summary of query results, works only
                                  with --local option. Default: --no-stats

  --debug / --no-debug            Show debug output. Default: --no-debug
  --help                          Show this message and exit.

The cmip6 sub-command works in the same way, but some constraints are different. As well as changes in terminology, CMIP6 has more attributes (facets) that can be used to select the data.
Examples of these are the activity, which groups experiments; the resolution, which is an approximation of the actual resolution; and the grid.

CORDEX#

!clef cordex --help
Usage: clef cordex [OPTIONS] [QUERY]...

  Search ESGF and local database for CORDEX files.

  Constraints can be specified multiple times, in which case they are
  combined    using OR: -v tas -v tasmin will return anything matching
  variable = 'tas' or variable = 'tasmin'. The --latest flag will check ESGF
  for the latest version available, this is the default behaviour NB. for
  CORDEX data associated to CMIP6 use  the cmip6 command with CORDEX as
  activity_id

Options:
  --latest / --all-versions       Return only the latest version or all of
                                  them. Default: --latest

  --replica / --no-replica        Return both original files and replicas.
                                  Default: --no-replica

  --distrib / --no-distrib        Distribute search across all ESGF nodes.
                                  Default: --distrib

  --csv / --no-csv                Send output to csv file including extra
                                  information. Works only with --local and
                                  --remote. Default: --no-csv

  --stats / --no-stats            Write summary of query results. Works only
                                  with --local and --remote. Default: --no-
                                  stats

  --debug / --no-debug            Show debug output. Default: --no-debug
  -d, --domain FACET              CORDEX region name
  -e, --experiment FACET          CMIP5 experiment of driving GCM or
                                  'evaluation' for re-analysis

  -dmod, --driving_model FACET    Model/analysis used to drive the model (eg.
                                  ECMWF-ERAINT)

  -m, --rcm_name FACET            Identifier of the CORDEX Regional Climate
                                  Model

  -rcmv, --rcm_version FACET      Identifier for reruns with perturbed
                                  parameters or smaller RCM release upgrades

  -v, --variable FACET            Variable name in file
  -f, --time_frequency FACET      Output frequency indicator
  -en, --ensemble FACET           Ensemble member of the driving GCM
  -vrs, --version FACET           Data publication version
  -cf, --cf_standard_name FACET   CF-Conventions name of the variable
  -ef, --experiment_family FACET  Experiment family: All, Historical, RCP
  -inst, --institute FACET        identifier for the institution that is
                                  responsible for the scientific aspects of
                                  the CORDEX simulation

  --and [domain|experiment|driving_model|rcm_name|rcm_version|variable|time_frequency|ensemble|version|cf_standard_name|experiment_family|institute]
                                  Attributes for which we want to add AND
                                  filter, i.e. -v tasmin -v tasmax --and
                                  variable will return only model/ensemble
                                  that have both

  --help                          Show this message and exit.

Again, cordex works in the same way, but some constraints are specific to its experiment design.
These are the CORDEX domain, rcm_name for the regional model, rcm_version and driving_model.
CORDEX also doesn’t use tables, so you always have to use -f/--time_frequency to select different timesteps.

!clef cordex -v tas -e historical -dmod CSIRO-BOM-ACCESS1-3 -en r1i1p1 -f mon
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360J/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360K/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360L/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360J/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360K/v1/mon/tas/latest/
/g/data/rr3/publications/CORDEX/output/AUS-44i/UNSW/CSIRO-BOM-ACCESS1-3/historical/r1i1p1/UNSW-WRF360L/v1/mon/tas/latest/

Everything available on ESGF is also available locally

Controlling the output: clef options#

!clef --local cmip6 -e 1pctCO2 -t Amon -v tasmax -v tasmin -g gr
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20191021
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20180626
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r10i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r1i1p1f2/Amon/tasmax/gr/v20181018
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r2i1p1f2/Amon/tasmax/gr/v20181031
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r3i1p1f2/Amon/tasmax/gr/v20181107
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r4i1p1f2/Amon/tasmax/gr/v20190328
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r5i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r6i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r7i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r8i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r9i1p1f2/Amon/tasmax/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20190702
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20200325
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/1pctCO2/r3i1p1f1/Amon/tasmax/gr/v20191114
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/1pctCO2/r3i1p1f1/Amon/tasmax/gr/v20200727
/g/data/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20180727
/g/data/oi10/replicas/CMIP6/CMIP/THU/CIESM/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20200417
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1-HR/1pctCO2/r1i1p1f2/Amon/tasmin/gr/v20191021
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-CM6-1/1pctCO2/r1i1p1f2/Amon/tasmin/gr/v20180626
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r10i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r1i1p1f2/Amon/tasmin/gr/v20181018
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r2i1p1f2/Amon/tasmin/gr/v20181031
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r3i1p1f2/Amon/tasmin/gr/v20181107
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r4i1p1f2/Amon/tasmin/gr/v20190328
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r5i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r6i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r7i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r8i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/CNRM-CERFACS/CNRM-ESM2-1/1pctCO2/r9i1p1f2/Amon/tasmin/gr/v20200529
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20190702
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20200325
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/1pctCO2/r3i1p1f1/Amon/tasmin/gr/v20191114
/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/1pctCO2/r3i1p1f1/Amon/tasmin/gr/v20200727
/g/data/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20180727
/g/data/oi10/replicas/CMIP6/CMIP/THU/CIESM/1pctCO2/r1i1p1f1/Amon/tasmin/gr/v20200417

In this example we used the --local option of the main command clef to return only the paths of matching data available locally.
Note also that:

  • we are using abbreviations for the options where available;

  • we are passing the variable -v option twice;

  • we used the CMIP6-specific option -g/--grid to search for all data that is not on the model native grid. A grid label doesn’t indicate a grid common to all the CMIP6 output, only one defined for the model itself; the same is true for member_id and other attributes.

--local actually executes the query directly on the NCI clef.nci.org.au database, which is different from the default query, where the search is executed first on the ESGF and then its results are matched locally.
In the example above the final result is exactly the same whichever way we perform the query. Searching this way can give you more results if a node is offline or if a version has been unpublished from the ESGF but is still available locally.
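
The --local listing above contains more than one version for some datasets (EC-Earth3-Veg, for example). If only the newest local version is wanted, the paths can be filtered afterwards. This is a minimal sketch assuming the CMIP6 directory layout shown above, where the last path component is the version (vYYYYMMDD); it does not apply to the CMIP5 paths under rr3, which are laid out differently. The function name `latest_versions` is hypothetical.

```python
from pathlib import PurePosixPath


def latest_versions(paths):
    """For each dataset directory keep only the path with the newest version."""
    newest = {}
    for p in paths:
        path = PurePosixPath(p)
        dataset, version = path.parent, path.name  # version looks like v20200325
        # vYYYYMMDD strings sort chronologically when compared as text
        if dataset not in newest or version > newest[dataset]:
            newest[dataset] = version
    return [str(dataset / version) for dataset, version in newest.items()]


local_paths = [
    "/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20190702",
    "/g/data/oi10/replicas/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3-Veg/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20200325",
    "/g/data/oi10/replicas/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/1pctCO2/r1i1p1f1/Amon/tasmax/gr/v20180727",
]
for path in latest_versions(local_paths):
    print(path)
```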

!clef --missing cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr
Available on ESGF but not locally:
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r3i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r3i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.clw.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r10i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r10i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clw.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clwvi.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clw.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clwvi.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clw.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clwvi.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r5i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r5i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r6i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r6i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r7i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r7i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r8i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r8i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r9i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r9i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.E3SM-Project.E3SM-1-0.1pctCO2.r1i1p1f1.Amon.clw.gr.v20190718
CMIP6.CMIP.E3SM-Project.E3SM-1-0.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20190718
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3.1pctCO2.r3i1p1f1.Amon.clwvi.gr.v20200727
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200325
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20180727
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.clw.gr.v20190916
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20190916
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200417
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200417

This time we used the --missing option and the tool returned only the results matching the constraints that are available on the ESGF but not locally (we changed variables to make sure to get some missing data back).
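
Each CMIP6 dataset_id in the listing above is a dot-separated string of facets followed by the version. A short sketch of how such an id could be split into its parts is shown below; the facet names follow the CMIP6 data reference syntax, while the function name `parse_cmip6_dataset_id` is hypothetical and not part of clef.

```python
CMIP6_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
)


def parse_cmip6_dataset_id(dataset_id):
    """Split a CMIP6 dataset_id into a dict keyed by facet name."""
    parts = dataset_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(
            f"expected {len(CMIP6_FACETS)} fields, got {len(parts)}: {dataset_id}"
        )
    return dict(zip(CMIP6_FACETS, parts))


facets = parse_cmip6_dataset_id(
    "CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620"
)
print(facets["source_id"], facets["variable_id"], facets["version"])
```

With the ids parsed this way it becomes easy to group the missing datasets by model or by variable before deciding what to request.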

!clef --remote cmip6 -e 1pctCO2 -v tasmin -t Amon -g gr
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r10i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.tasmin.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.tasmin.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.tasmin.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.tasmin.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r5i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r6i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r7i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r8i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r9i1p1f2.Amon.tasmin.gr.v20200529
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20200325
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3.1pctCO2.r3i1p1f1.Amon.tasmin.gr.v20200727
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20180727
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20200115
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.tasmin.gr.v20200417

The --remote option returns the dataset_ids of the data matching the constraints, regardless of whether they are available locally or not.

Please note that --local, --remote and --missing, together with --request, which we will look at next, are all options of the main command clef, and they need to come before any sub-command.
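
When clef is called from a script, the ordering rule above and the repeated constraints are easy to get wrong. The following sketch assembles the argument list in the right order; the helper `build_clef_command` is hypothetical (it is not part of clef) and only uses option names that appear in the help messages shown earlier.

```python
import shlex

# Options of the main clef command that select the kind of output
MAIN_OPTIONS = {"local", "remote", "missing", "request"}


def build_clef_command(subcommand, constraints, mode=None):
    """Return the argument list for a clef call.

    constraints maps a long option name (e.g. "variable") to one value or a
    list of values; a list repeats the option, which clef combines with OR.
    """
    cmd = ["clef"]
    if mode is not None:
        if mode not in MAIN_OPTIONS:
            raise ValueError(f"unknown clef option: {mode}")
        cmd.append(f"--{mode}")  # main options go before the sub-command
    cmd.append(subcommand)
    for name, values in constraints.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            cmd += [f"--{name}", value]
    return cmd


cmd = build_clef_command(
    "cmip6",
    {"experiment": "1pctCO2", "table": "Amon",
     "variable": ["tasmax", "tasmin"], "grid": "gr"},
    mode="local",
)
print(shlex.join(cmd))
# clef --local cmip6 --experiment 1pctCO2 --table Amon --variable tasmax --variable tasmin --grid gr
```

On a system where clef is installed, the resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` to capture the output.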

Requesting new data#

What should we do if we find out there is some data we are interested in that has not been downloaded or requested yet?
This is a complex data collection. NCI, in consultation with the community, decided the best way to manage it was to have one point of reference. Part of this agreement is that NCI will download the files and update the database that clef is interrogating. After consultation with the community a priority list was decided, and NCI has started downloading anything that falls into it as soon as it becomes available.

Users can then request from the NCI helpdesk other combinations of variables, experiments etc. that do not fall into this list.
The list is available from the NCI climate confluence website.
Even without consulting the list you can use clef, as we demonstrated above, to search for a particular dataset. If it is not queued or downloaded already, clef will give you an option to request it from NCI.

Let’s see how it works.

%%bash
clef --request cmip6 -e 1pctCO2 -v clw -v clwvi -t Amon -g gr
no
Available on ESGF but not locally:
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r3i1p1f1.Amon.clw.gr.v20200620
CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r3i1p1f1.Amon.clwvi.gr.v20200620
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20180626
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.clw.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1-HR.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20191021
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r10i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r10i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clw.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r1i1p1f2.Amon.clwvi.gr.v20181018
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clw.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r2i1p1f2.Amon.clwvi.gr.v20181031
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clw.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r3i1p1f2.Amon.clwvi.gr.v20181107
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clw.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r4i1p1f2.Amon.clwvi.gr.v20190328
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r5i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r5i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r6i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r6i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r7i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r7i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r8i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r8i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r9i1p1f2.Amon.clw.gr.v20200529
CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.1pctCO2.r9i1p1f2.Amon.clwvi.gr.v20200529
CMIP6.CMIP.E3SM-Project.E3SM-1-0.1pctCO2.r1i1p1f1.Amon.clw.gr.v20190718
CMIP6.CMIP.E3SM-Project.E3SM-1-0.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20190718
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3.1pctCO2.r3i1p1f1.Amon.clwvi.gr.v20200727
CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200325
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clw.gr.v20180727
CMIP6.CMIP.IPSL.IPSL-CM6A-LR.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20180727
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.clw.gr.v20190916
CMIP6.CMIP.NIMS-KMA.KACE-1-0-G.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20190916
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200417
CMIP6.CMIP.THU.CIESM.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200417

Finished writing file: CMIP6_pxp581_20200924T094632.txt
Do you want to proceed with request for missing files? (N/Y)
 No is default
Your request has been saved in 
 /home/581/pxp581/clef/docs/CMIP6_pxp581_20200924T094632.txt
You can use this file to request the data via the NCI helpdesk: help@nci.org.au  or https://help.nci.org.au.

We ran the same query that previously returned 40 missing datasets, but this time we used the --request option after clef.
The tool executes the query remotely, then looks for matches locally and on the NCI download list. Having found none, it gives us the option of putting in a request.
It will accept any of the following as a positive answer:

Y YES y yes

With anything else or if you don’t pass anything it will assume you don’t want to put in a request.
It still saved the request in a file we can use later.

!head -n 4 CMIP6_*.txt
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clw.gr.v20200620
dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r2i1p1f1.Amon.clwvi.gr.v20200620

If I had answered yes, the tool would have sent an e-mail to the NCI helpdesk with the text file attached. NCI can pass that file as input to their download tool and queue your request.
NB: if you are running clef from gadi you cannot send an e-mail, so in that case the tool will skip the question and just remind you to send an e-mail to the NCI helpdesk yourself to finalise the request.
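
Before e-mailing the saved request file it can be useful to read it back and check what it contains. The sketch below assumes the format shown in the head output above, one `dataset_id=...` entry per line; the function name `read_request_file` is hypothetical and not part of clef.

```python
def read_request_file(lines):
    """Collect the dataset ids from 'dataset_id=...' lines, ignoring anything else."""
    prefix = "dataset_id="
    ids = []
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            ids.append(line[len(prefix):])
    return ids


example = [
    "dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clw.gr.v20200620\n",
    "dataset_id=CMIP6.CMIP.CAS.FGOALS-f3-L.1pctCO2.r1i1p1f1.Amon.clwvi.gr.v20200620\n",
    "\n",
]
requested = read_request_file(example)
print(len(requested), "datasets in the request")
```

Since an open file is an iterable of lines, the same function can be called as `read_request_file(open(filename))` on the file written by --request.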