Basic CDMS Tutorial

The CDAT software was developed by LLNL. This tutorial was written by Charles Doutriaux. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Installing cdms2

In [1]:
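# Note: `conda activate` fails when run through a notebook "!" subshell
# (see the CommandNotFoundError at the end of the output); run it from an initialized terminal instead.
!conda create -n cdms -y -c cdat/label/nightly -c conda-forge cdms2 libnetcdf==4.6.2
!conda activate cdms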
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /software/anaconda53/envs/cdms

  added / updated specs:
    - cdms2
    - libnetcdf==4.6.2


The following NEW packages will be INSTALLED:

  asn1crypto         conda-forge/linux-64::asn1crypto-0.24.0-py36_1003
  attrs              conda-forge/noarch::attrs-19.1.0-py_0
  bzip2              conda-forge/linux-64::bzip2-1.0.6-h14c3975_1002
  ca-certificates    conda-forge/linux-64::ca-certificates-2019.6.16-hecc5488_0
  cdat_info          cdat/label/nightly/noarch::cdat_info-8.1.1-py_0
  cdms2              cdat/label/nightly/linux-64::cdms2-3.1.2.2019.05.29.21.01.ga9231c1-py36h481b005_0
  cdtime             cdat/label/nightly/linux-64::cdtime-3.1.2.2019.02.09.12.03.gba9e0cb-py36ha5dfbcb_0
  certifi            conda-forge/linux-64::certifi-2019.3.9-py36_0
  cffi               conda-forge/linux-64::cffi-1.12.3-py36h8022711_0
  chardet            conda-forge/linux-64::chardet-3.0.4-py36_1003
  cryptography       conda-forge/linux-64::cryptography-2.7-py36h72c5cf5_0
  curl               conda-forge/linux-64::curl-7.64.1-hf8cf82a_0
  decorator          conda-forge/noarch::decorator-4.4.0-py_0
  distarray          conda-forge/noarch::distarray-2.12.2-py_1
  esmf               conda-forge/linux-64::esmf-7.1.0-h9a7cb89_1006
  esmpy              conda-forge/linux-64::esmpy-7.1.0-py36h24bf2e0_3
  future             conda-forge/linux-64::future-0.17.1-py36_1000
  g2clib             conda-forge/linux-64::g2clib-1.6.0-hf3f1b0b_9
  hdf4               conda-forge/linux-64::hdf4-4.2.13-h9a582f1_1002
  hdf5               conda-forge/linux-64::hdf5-1.10.5-nompi_h3c11f04_1100
  idna               conda-forge/linux-64::idna-2.8-py36_1000
  ipython_genutils   conda-forge/noarch::ipython_genutils-0.2.0-py_1
  jasper             conda-forge/linux-64::jasper-1.900.1-h07fcdf6_1006
  jpeg               conda-forge/linux-64::jpeg-9c-h14c3975_1001
  jsonschema         conda-forge/linux-64::jsonschema-3.0.1-py36_0
  jupyter_core       conda-forge/noarch::jupyter_core-4.4.0-py_0
  krb5               conda-forge/linux-64::krb5-1.16.3-h05b26f9_1001
  lazy-object-proxy  conda-forge/linux-64::lazy-object-proxy-1.4.1-py36h516909a_0
  libblas            conda-forge/linux-64::libblas-3.8.0-7_openblas
  libcblas           conda-forge/linux-64::libcblas-3.8.0-7_openblas
  libcdms            cdat/label/nightly/linux-64::libcdms-3.1.2.2019.02.09.03.57.g8f1cefe-h9ac9557_0
  libcf              cdat/label/nightly/linux-64::libcf-3.1.0.2019.04.11.18.06.g2757b00-py36h14c3975_0
  libcurl            conda-forge/linux-64::libcurl-7.64.1-hda55be3_0
  libdrs             cdat/label/nightly/linux-64::libdrs-3.1.2.2019.02.09.01.20.g0c04b0c-h9ac9557_0
  libdrs_f           cdat/label/nightly/linux-64::libdrs_f-3.1.2.2019.02.09.01.20.g0c04b0c-h9ac9557_0
  libedit            conda-forge/linux-64::libedit-3.1.20170329-hf8c457e_1001
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1006
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-9.1.0-hdf63c60_0
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  liblapack          conda-forge/linux-64::liblapack-3.8.0-7_openblas
  libnetcdf          conda-forge/linux-64::libnetcdf-4.6.2-h056eaf5_1002
  libpng             conda-forge/linux-64::libpng-1.6.37-hed695b0_0
  libssh2            conda-forge/linux-64::libssh2-1.8.2-h22169c7_2
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-9.1.0-hdf63c60_0
  libtiff            conda-forge/linux-64::libtiff-4.0.10-h57b8799_1003
  libuuid            conda-forge/linux-64::libuuid-2.32.1-h14c3975_1000
  lz4-c              conda-forge/linux-64::lz4-c-1.8.3-he1b5a44_1001
  mpi                conda-forge/linux-64::mpi-1.0-mpich
  mpich              conda-forge/linux-64::mpich-3.2.1-hc99cbb1_1011
  nbformat           conda-forge/noarch::nbformat-4.4.0-py_1
  ncurses            conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
  netcdf-fortran     conda-forge/linux-64::netcdf-fortran-4.4.5-hfc18c51_1001
  numpy              conda-forge/linux-64::numpy-1.16.4-py36h95a1406_0
  openblas           conda-forge/linux-64::openblas-0.3.5-h9ac9557_1001
  openssl            conda-forge/linux-64::openssl-1.1.1b-h14c3975_1
  pip                conda-forge/linux-64::pip-19.1.1-py36_0
  pycparser          conda-forge/linux-64::pycparser-2.19-py36_1
  pyopenssl          conda-forge/linux-64::pyopenssl-19.0.0-py36_0
  pyrsistent         conda-forge/linux-64::pyrsistent-0.15.2-py36h516909a_0
  pysocks            conda-forge/linux-64::pysocks-1.7.0-py36_0
  python             conda-forge/linux-64::python-3.6.7-h381d211_1004
  readline           conda-forge/linux-64::readline-7.0-hf8c457e_1001
  requests           conda-forge/linux-64::requests-2.22.0-py36_0
  setuptools         conda-forge/linux-64::setuptools-41.0.1-py36_0
  six                conda-forge/linux-64::six-1.12.0-py36_1000
  sqlite             conda-forge/linux-64::sqlite-3.28.0-h8b20d00_0
  tk                 conda-forge/linux-64::tk-8.6.9-hed695b0_1002
  traitlets          conda-forge/linux-64::traitlets-4.3.2-py36_1000
  urllib3            conda-forge/linux-64::urllib3-1.24.3-py36_0
  wheel              conda-forge/linux-64::wheel-0.33.4-py36_0
  xz                 conda-forge/linux-64::xz-5.2.4-h14c3975_1001
  zlib               conda-forge/linux-64::zlib-1.2.11-h14c3975_1004
  zstd               conda-forge/linux-64::zstd-1.4.0-h3b9ef0a_0


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate cdms
#
# To deactivate an active environment, use
#
#     $ conda deactivate


CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.


Preparing your environment

In [2]:
from __future__ import print_function
import cdat_info
import os, sys

data_path = cdat_info.get_sampledata_path()

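# Build "pythonX.Y" from version_info; slicing sys.version breaks on Python 3.10+
version = "python%d.%d" % sys.version_info[:2]
test_files = os.path.join(sys.prefix, "lib", version, "site-packages", "share", "cdms2", "test_data_files.txt")
cdat_info.download_sample_data_files(test_files, data_path)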
MD5: /software/anaconda53/envs/cdms2/lib/python3.7/site-packages/share/cdms2/test_data_files.txt

Opening and querying a file for reading

In [3]:
# Open a sample file
import cdms2

filename = os.path.join(data_path,"clt.nc")
f = cdms2.open(filename)
In [4]:
# Query variables in the file
var = f.listvariable()
print("variables in the file:",var)
variables in the file: ['clt', 'u', 'v']
In [5]:
# Query dimensions in the file
dims = f.listdimension()
print("Dimensions in the file:",dims)
Dimensions in the file: ['latitude', 'latitude1', 'latitude2', 'longitude', 'longitude1', 'longitude2', 'plev', 'plev1', 'time', 'time1', 'time2']
In [6]:
# Query file attributes
attr = f.listglobal()
print("File attributes:",attr)
File attributes: ['Conventions', 'comments', 'model', 'center']

Querying Variables (in file)

You can query a variable in the file further without reading its data into memory.

To create a file variable, index the file object with square brackets: [ and ]

In [7]:
clt = f["clt"]  # This is a file variable, not in memory
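
The difference between square brackets (a file variable) and parentheses (data read into memory, used later in this tutorial) can be illustrated with a toy sketch. This is not how cdms2 is implemented; ToyFile and ToyFileVariable are hypothetical classes written only to show the idea that a bracket lookup returns a lightweight handle, while a call triggers the read.

```python
class ToyFileVariable:
    """Hypothetical lazy handle: knows metadata, reads data only when called."""

    def __init__(self, toy_file, name):
        self._file = toy_file
        self.id = name

    @property
    def shape(self):
        # Metadata lookup only; the read counter is not touched.
        return self._file.shape_of(self.id)

    def __call__(self):
        # Calling the handle triggers the actual read.
        return self._file.read(self.id)


class ToyFile:
    """Hypothetical stand-in for an open file; counts how often data is read."""

    def __init__(self, arrays):
        self._arrays = arrays
        self.reads = 0

    def shape_of(self, name):
        return (len(self._arrays[name]),)

    def read(self, name):
        self.reads += 1
        return list(self._arrays[name])

    def __getitem__(self, name):   # f["clt"] -> lazy handle
        return ToyFileVariable(self, name)

    def __call__(self, name):      # f("clt") -> data in memory
        return self.read(name)


f_toy = ToyFile({"clt": [71.2, 68.5, 70.1]})
handle = f_toy["clt"]
print(handle.id, handle.shape, f_toy.reads)   # clt (3,) 0
data = f_toy("clt")
print(data, f_toy.reads)                      # [71.2, 68.5, 70.1] 1
```

In the toy version, querying id and shape leaves the read counter at zero, which mirrors why file variables are convenient for inspecting large files.
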
In [8]:
# Print variable info to screen
clt.info()
*** Description of Slab clt ***
id: clt
shape: (120, 46, 72)
filename: /software/anaconda53/envs/cdms2/share/cdat/sample_data/clt.nc
missing_value: 1e+20
comments: YONU_AMIP1
grid_name: YONU4X5
grid_type: gaussian
time_statistic: average
long_name: Total cloudiness
units: %
Grid has Python id 0x7fe09a708080.
Gridtype: gaussian
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x7fe09a6f9f98
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
   Python id:  0x7fe09a6f9dd8
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      long_name: Longitude
   Python id:  0x7fe09a6f9e80
*** End of description for clt ***
In [9]:
# Variable shape
sh = clt.shape
print("The variable shape is:",sh)
The variable shape is: (120, 46, 72)
In [10]:
# Variable id
name = clt.id
print("Variable id/name:",name)
Variable id/name: clt
In [11]:
# The variable dimensions
axes = clt.getAxisList()
print("variable dimensions:",axes)
variable dimensions: [   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x7fe09a6f9f98
,    id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
   Python id:  0x7fe09a6f9dd8
,    id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      long_name: Longitude
   Python id:  0x7fe09a6f9e80
]
In [12]:
# Variable attributes
attributes = clt.attributes
print("Variable attributes:",attributes.keys())
Variable attributes: dict_keys(['missing_value', 'comments', 'long_name', 'units', 'grid_name', 'grid_type', 'time_statistic'])

Dimensions

In [13]:
# Determine if an axis is time
for a in axes:
    if a.isTime():
        print("Axes %s is a time axis" % a.id)
    else:
        print("Axes %s is not a time axis" % a.id)
Axes time is a time axis
Axes latitude is not a time axis
Axes longitude is not a time axis
In [14]:
# Similar functions exist for level, latitude and longitude
for a in axes:
    print(a.isLatitude(), a.isLongitude(), a.isLevel())
False False False
True False False
False True False
In [15]:
# Similarly we can get one of these 4 types of dimension automatically
aTime = clt.getTime()
lat = clt.getLatitude()
lon = clt.getLongitude()
In [16]:
# If no such dimension exists, None is returned
lev = clt.getLevel()
print("Level dim:",lev)
Level dim: None
In [17]:
# Any dimension can also be retrieved by its index
dim0 = clt.getAxis(0)
print("The first dim name is:",dim0.id)
The first dim name is: time
In [18]:
# Dimension information
dim0.info()
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x7fe09a6f9f98
In [19]:
# Accessing axis values
print("Latitude values:",clt.getLatitude()[:])
Latitude values: [-90. -86. -82. -78. -74. -70. -66. -62. -58. -54. -50. -46. -42. -38.
 -34. -30. -26. -22. -18. -14. -10.  -6.  -2.   2.   6.  10.  14.  18.
  22.  26.  30.  34.  38.  42.  46.  50.  54.  58.  62.  66.  70.  74.
  78.  82.  86.  90.]

Time dimensions

cdms handles time axes well (see the dedicated cdtime Jupyter notebook for more on time).

In [20]:
# Rather than raw (in file) values or indices, it can be useful to show/manipulate time
# as 'component time'
tim = clt.getTime()
tc = tim.asComponentTime()
print("First 2 times are:",tc[:2])
# or 'relative times'
tr = tim.asRelativeTime("days since 2017")
print("first 2 times in days since 2017:", tr[:2])
First 2 times are: [1979-1-1 0:0:0.0, 1979-2-1 0:0:0.0]
first 2 times in days since 2017: [-13880.000000 days since 2017, -13849.000000 days since 2017]
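
The two numbers printed above can be checked by hand. The sketch below uses only the standard library datetime module, not cdtime; the add_months helper is a hypothetical name introduced here, and the code assumes the standard (Gregorian) calendar and whole-month offsets.

```python
from datetime import date


def add_months(start, months):
    """Return the date `months` whole months after `start` (same day of month)."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, start.day)


origin = date(1979, 1, 1)      # from the units "months since 1979-1-1 0"
raw_values = [0, 1]            # first two raw time values of clt

# "Component time": the calendar date each raw value stands for
component = [add_months(origin, m) for m in raw_values]
print([d.isoformat() for d in component])   # ['1979-01-01', '1979-02-01']

# "Relative time": the same instants counted in days from a new origin
new_origin = date(2017, 1, 1)  # "days since 2017"
relative = [(d - new_origin).days for d in component]
print(relative)                # [-13880, -13849]
```
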

Retrieving data

In [21]:
# Whole
clt = f("clt")  # parentheses mean the data is read into memory
print("Shape:",clt.shape)
Shape: (120, 46, 72)
In [22]:
# Partial, based on values in file
clt = f("clt",latitude=(0,90),longitude=(-180,180))
print("Shape:",clt.shape)
Shape: (120, 23, 73)
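
The shape printed above (23 latitudes, 73 longitudes out of 46 and 72) can be reproduced by hand. The sketch below is plain Python, not cdms2 code. It assumes that both ends of each interval are treated as closed and that longitude wraps with a period of 360 degrees, which is consistent with the output but simplifies what cdms2 really does. The axis values are rebuilt from the info() output, and the helper names are made up for illustration.

```python
# Axis values rebuilt from the clt.info() output
lats = [-90.0 + 4.0 * i for i in range(46)]    # -90, -86, ..., 90
lons = [-180.0 + 5.0 * i for i in range(72)]   # -180, -175, ..., 175


def select_closed(values, low, high):
    """Keep the values inside the closed interval [low, high]."""
    return [v for v in values if low <= v <= high]


def select_closed_circular(values, low, high, modulo=360.0):
    """Same as select_closed, but also accept copies shifted by +/- one period."""
    picked = []
    for k in (-1, 0, 1):
        picked.extend(v + k * modulo for v in values
                      if low <= v + k * modulo <= high)
    return sorted(picked)


north = select_closed(lats, 0, 90)
around = select_closed_circular(lons, -180, 180)
print(len(north), north[0], north[-1])      # 23 2.0 90.0
print(len(around), around[0], around[-1])   # 73 -180.0 180.0
```

Under these assumptions the extra longitude is the point at -180 counted again at +180, which is why 73 values come back from an axis of length 72.
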
In [23]:
# Based on indices
clt = f("clt",time=slice(0,12))
print("Shape:",clt.shape)
Shape: (12, 46, 72)
In [24]:
# Time can be retrieved based on actual dates (provided the units in the file are good)
clt = f("clt",time=("1980","1983-12-31"))
print("Shape:",clt.shape)
Shape: (48, 46, 72)
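
To see why this date range yields 48 time steps, the selection can be imitated with the standard library. This is a sketch, not cdms2 code: it assumes monthly steps starting at 1979-01-01 (from the axis units) and a closed date interval; add_months is a hypothetical helper.

```python
from datetime import date


def add_months(start, months):
    """Return the date `months` whole months after `start` (same day of month)."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, start.day)


origin = date(1979, 1, 1)                 # "months since 1979-1-1 0"
time_axis = [add_months(origin, m) for m in range(120)]

# Indices whose date falls between 1980-01-01 and 1983-12-31 inclusive
start, end = date(1980, 1, 1), date(1983, 12, 31)
selected = [i for i, d in enumerate(time_axis) if start <= d <= end]
print(len(selected), selected[0], selected[-1])   # 48 12 59
```
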
In [25]:
# Data can also be read directly from a file variable
CLT = f["clt"]
clt = CLT(time=("1980","1984-12-31"),latitude=(0,90),longitude=slice(0,None))
print("Shape:",clt.shape)
Shape: (60, 23, 72)
In [26]:
# Or from an existing variable
clt2 = clt(time=slice(0,4))
print("Shape:",clt2.shape)
Shape: (4, 23, 72)
/software/anaconda53/envs/cdms2/lib/python3.7/site-packages/numpy/ma/core.py:3174: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  dout = self.data[indx]
In [27]:
# data can also be reordered based on dimensions
clt = f("clt",order="xty")
print("Shape:",clt.shape)
Shape: (72, 120, 46)
In [28]:
# or use dimension indices
clt=f("clt", order="210")
print("Shape:",clt.shape)
Shape: (72, 46, 120)
In [29]:
# or use dimension names
clt = f("clt",order="(longitude)(time)(latitude)")
print("Shape:",clt.shape)
Shape: (72, 120, 46)
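
The effect of the order keyword on the shape can be worked out without opening a file. The following is a minimal sketch, not the cdms2 implementation: reorder_shape is a hypothetical helper that handles only the single-letter ("xty") and index ("210") forms shown above, not the "(name)" form.

```python
def reorder_shape(shape, axis_letters, order):
    """Return the shape after reordering.

    shape        : current shape, e.g. (120, 46, 72)
    axis_letters : one letter per current axis, e.g. "tyx"
    order        : requested order, as letters ("xty") or indices ("210")
    """
    positions = []
    for ch in order:
        if ch.isdigit():
            positions.append(int(ch))                # index of the current axis
        else:
            positions.append(axis_letters.index(ch)) # letter -> current index
    return tuple(shape[p] for p in positions)


shape = (120, 46, 72)                        # (time, latitude, longitude)
print(reorder_shape(shape, "tyx", "xty"))    # (72, 120, 46)
print(reorder_shape(shape, "tyx", "210"))    # (72, 46, 120)
```
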

Manipulating Data

cdms variables are subclasses of numpy arrays, so for the most part anything you can do with numpy can be done with cdms variables.

In [30]:
# Extract the same month of every year (from monthly data)
clt=f("clt")
subset = clt[::12]
print("Shape:",subset.shape)
Shape: (10, 46, 72)
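
The stride used above is ordinary Python slicing, so a different month can be picked by changing the start index. The sketch below shows this on a plain list of (year, month) labels that stands in for the 120-step time axis; the labels are built here for illustration and are not read from the file.

```python
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# One label per monthly time step, starting in January 1979
labels = [(1979 + i // 12, month_names[i % 12]) for i in range(120)]

januaries = labels[::12]    # start at index 0, then every 12th step
julys = labels[6::12]       # start at index 6 (July), then every 12th step

print(len(januaries), januaries[0], januaries[-1])  # 10 (1979, 'Jan') (1988, 'Jan')
print(len(julys), julys[0], julys[-1])              # 10 (1979, 'Jul') (1988, 'Jul')
```
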
In [31]:
# A cdms variable can be converted to a raw numpy array (masked values are replaced by the fill value)
nparray = clt.filled()
print(type(clt),type(nparray))
<class 'cdms2.tvariable.TransientVariable'> <class 'numpy.ndarray'>
In [32]:
# or masked arrays
maarray = clt.asma()
print(type(clt),type(maarray))
<class 'cdms2.tvariable.TransientVariable'> <class 'numpy.ma.core.MaskedArray'>

Creating MV2 variables and storing them in files

In [33]:
import MV2
# Create a cdms variable from a numpy (or numpy.ma) array
myvar = MV2.array(nparray)
myvar.id = "newclt"
myvar.info()
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: N/A
grid_type: N/A
time_statistic: 
long_name: 
units: 
tileIndex: None
No grid present.
** Dimension 1 **
   id: axis_0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x7fe09a71c3c8
** Dimension 2 **
   id: axis_1
   Length: 46
   First:  0.0
   Last:   45.0
   Python id:  0x7fe09a71c518
** Dimension 3 **
   id: axis_2
   Length: 72
   First:  0.0
   Last:   71.0
   Python id:  0x7fe09a71c320
*** End of description for newclt ***
In [34]:
# We can add axes from other variables
myvar.setAxisList(clt.getAxisList())
myvar.info()
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x7fe09a71c3c8.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Other axis attributes:
      axis: T
      calendar: gregorian
      realtopology: linear
   Python id:  0x7fe09a58e278
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      axis: Y
      long_name: Latitude
      realtopology: linear
   Python id:  0x7fe09a58e240
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      axis: X
      modulo: 360.0
      topology: circular
      long_name: Longitude
      realtopology: circular
   Python id:  0x7fe09a58e2b0
*** End of description for newclt ***
In [35]:
# we can also add axes one at a time
for i in range(myvar.ndim):
    ax = clt.getAxis(i)
    print("Setting axis %i to %s" % (i,ax.id))
    myvar.setAxis(i,ax)
myvar.info()
Setting axis 0 to time
Setting axis 1 to latitude
Setting axis 2 to longitude
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x7fe09a71c3c8.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Other axis attributes:
      axis: T
      calendar: gregorian
      realtopology: linear
   Python id:  0x7fe09a58e278
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      axis: Y
      long_name: Latitude
      realtopology: linear
   Python id:  0x7fe09a58e240
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      axis: X
      modulo: 360.0
      topology: circular
      long_name: Longitude
      realtopology: circular
   Python id:  0x7fe09a58e2b0
*** End of description for newclt ***
In [36]:
# We can also create axes manually
newtime = cdms2.createAxis(range(120))
newtime.id = "time" # name of dimension
newtime.designateTime()  # tell cdms to add attributes that make it time
newtime.units = "months since 2017"
myvar.setAxis(0,newtime)
myvar.info()  # Notice the time axis changed
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x7fe09a71c3c8.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 2017
   Length: 120
   First:  0
   Last:   119
   Other axis attributes:
      axis: T
      calendar: gregorian
   Python id:  0x7fe09a64dcf8
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      axis: Y
      long_name: Latitude
      realtopology: linear
   Python id:  0x7fe09a58e240
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      axis: X
      modulo: 360.0
      topology: circular
      long_name: Longitude
      realtopology: circular
   Python id:  0x7fe09a58e2b0
*** End of description for newclt ***

Saving data

In [37]:
# By default cdms2 saves files as NetCDF4, with no shuffle and with deflate (compression) at level 1
print("Default Shuffle:",cdms2.getNetcdfShuffleFlag())
print("Default Deflate:",cdms2.getNetcdfDeflateFlag())
print("Default Deflate Level:",cdms2.getNetcdfDeflateLevelFlag())
Default Shuffle: 0
Default Deflate: 1
Default Deflate Level: 1
In [38]:
# Let's turn it all off so we get NetCDF3 classic files
value = 0
cdms2.setNetcdfShuffleFlag(value)  # value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value)  # value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value)  # value is an integer between 0 and 9 inclusive
print("Shuffle:",cdms2.getNetcdfShuffleFlag())
print("Deflate:",cdms2.getNetcdfDeflateFlag())
print("Deflate Level:",cdms2.getNetcdfDeflateLevelFlag())
Shuffle: 0
Deflate: 0
Deflate Level: 0
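
For readers unfamiliar with these flags, the sketch below illustrates the general idea behind "shuffle" and "deflate" using only the standard library. It does not call cdms2 or the NetCDF library: zlib stands in for the deflate filter, and shuffle/unshuffle are hypothetical helpers that regroup the bytes of each value, which is an assumption about how a byte-shuffle filter works in general. The sample values are invented.

```python
import struct
import zlib


def shuffle(data, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(data[i::itemsize] for i in range(itemsize))


def unshuffle(data, itemsize):
    """Inverse of shuffle: put the bytes of each item back together."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


# Invented, smoothly varying 32-bit floats standing in for a data field
values = [20.0 + 0.01 * i for i in range(10000)]
raw = struct.pack("<%df" % len(values), *values)

# Compare compressed sizes with and without shuffling at a few deflate levels
for level in (0, 1, 9):
    plain = len(zlib.compress(raw, level))
    shuffled = len(zlib.compress(shuffle(raw, 4), level))
    print("level", level, "raw", len(raw), "plain", plain, "shuffled", shuffled)

# Shuffling is lossless: the original bytes are recovered exactly
restored = unshuffle(shuffle(raw, 4), 4)
print(restored == raw)
```

Level 0 stores the data without compressing it, and higher levels trade speed for size. Whether shuffling helps depends on the data, so it is worth comparing sizes on real fields before changing the defaults.
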
In [39]:
# Let's open a file for writing
f2 = cdms2.open("mydata.nc","w") # "w" opens the file for writing and overwrites it if it already exists
f2.write(myvar)
f2.close()