{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic CDMS Tutorial<a id='top' class=\"tocSkip\"> </a>\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "The CDAT software was developed by LLNL. This tutorial was written by Charles Doutriaux. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.\n",
    "\n",
    "\n",
    "\n",
    "[Download the Jupyter Notebook](cdms2_101.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Installing-cdms2\" data-toc-modified-id=\"Installing-cdms2-1\">Installing cdms2</a></span></li><li><span><a href=\"#Preparing-your-environment\" data-toc-modified-id=\"Preparing-your-environment-2\">Preparing your environment</a></span></li><li><span><a href=\"#Opening-and-querying-a-file-for-reading\" data-toc-modified-id=\"Opening-and-querying-a-file-for-reading-3\">Opening and querying a file for reading</a></span></li><li><span><a href=\"#Querying-Variables-(in-file)\" data-toc-modified-id=\"Querying-Variables-(in-file)-4\">Querying Variables (in file)</a></span></li><li><span><a href=\"#Dimensions\" data-toc-modified-id=\"Dimensions-5\">Dimensions</a></span></li><li><span><a href=\"#Time-dimensions\" data-toc-modified-id=\"Time-dimensions-6\">Time dimensions</a></span></li><li><span><a href=\"#Retrieving-data\" data-toc-modified-id=\"Retrieving-data-7\">Retrieving data</a></span></li><li><span><a href=\"#Manipulating-Data\" data-toc-modified-id=\"Manipulating-Data-8\">Manipulating Data</a></span></li><li><span><a href=\"#Creating-MV2-and-storing-them-in-files\" data-toc-modified-id=\"Creating-MV2-and-storing-them-in-files-9\">Creating MV2 and storing them in files</a></span></li><li><span><a href=\"#Saving-data\" data-toc-modified-id=\"Saving-data-10\">Saving data</a></span></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installing cdms2\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!conda create -n cdms -y -c cdat/label/nightly -c conda-forge cdms2 libnetcdf==4.6.2\n",
    "!conda activate cdms\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing your environment\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "import cdat_info\n",
    "import os, sys\n",
    "\n",
    "data_path = cdat_info.get_sampledata_path()\n",
    "\n",
    "version=\"python\"+sys.version[0:3]\n",
    "cdat_info.download_sample_data_files(os.path.join(sys.prefix,\"lib\",version,\"site-packages\",\"share\",\"cdms2\",\"test_data_files.txt\"),data_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Opening and querying a file for reading\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open a sample file\n",
    "import cdms2\n",
    "\n",
    "filename = os.path.join(data_path,\"clt.nc\")\n",
    "f = cdms2.open(filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query variables in the file\n",
    "var = f.listvariable()\n",
    "print(\"variables in the file:\",var)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query dimensions in the file\n",
    "dims = f.listdimension()\n",
    "print(\"Dimensions in the file:\",dims)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query file attributes\n",
    "attr = f.listglobal()\n",
    "print(\"File attributes:\",attr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying Variables (in file)\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can further query the variables in the file without having to read them in memory\n",
    "\n",
    "To create a `file variable` simply use square bracket: **[** and **]**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "clt = f[\"clt\"]  # This is a file variable, not in memory"
   ]
  },
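  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sketch of the difference (assuming the file handle `f` opened above is still available), the cell below compares the type of a file variable with what you get once you slice it. The names `file_var` and `first_step` are introduced here for illustration only. Slicing a file variable reads just the requested part from disk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch: file variable vs. data read into memory\n",
    "file_var = f[\"clt\"]        # square brackets: nothing is read yet\n",
    "print(\"File variable type:\", type(file_var))\n",
    "\n",
    "first_step = file_var[0:1]  # slicing reads only the first time step\n",
    "print(\"After slicing:\", type(first_step), first_step.shape)"
   ]
  },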
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print variable info to screen\n",
    "clt.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variable shape\n",
    "sh = clt.shape\n",
    "print(\"The variable shape is:\",sh)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variable id\n",
    "name = clt.id\n",
    "print(\"Variable id/name:\",name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The variable dimensions\n",
    "axes = clt.getAxisList()\n",
    "print(\"variable dimensions:\",axes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Variable attributes\n",
    "attributes = clt.attributes\n",
    "print(\"Variable attributes:\",attributes.keys())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dimensions\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Determine if an axis is time\n",
    "for a in axes:\n",
    "    if a.isTime():\n",
    "        print(\"Axes %s is a time axis\" % a.id)\n",
    "    else:\n",
    "        print(\"Axes %s is not a time axis\" % a.id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Similar functions exist for level, latitude and longitude\n",
    "for a in axes:\n",
    "    print(a.isLatitude(), a.isLongitude(), a.isLevel())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Similarly we can get one of these 4 types of dimension automatically\n",
    "aTime = clt.getTime()\n",
    "lat = clt.getLatitude()\n",
    "lon = clt.getLongitude()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# if such dimension does not exists None is returned\n",
    "lev = clt.getLevel()\n",
    "print(\"Level dim:\",lev)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Any dimension can also by retrieved by its index\n",
    "dim0 = clt.getAxis(0)\n",
    "print(\"The first dim name is:\",dim0.id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dimension information\n",
    "dim0.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Accessing axis values\n",
    "print(\"Latitude values:\",clt.getLatitude()[:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Time dimensions\n",
    "\n",
    "cdms is really good at dealing with times (see decdicated cdtime jupyter notebook for more on time)\n",
    "\n",
    "[Back to Top](#top)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rather than raw (in file) values or indices it can be usefull to show/manipulate time \n",
    "# as 'component time'\n",
    "tim = clt.getTime()\n",
    "tc = tim.asComponentTime()\n",
    "print(\"First 2 times are:\",tc[:2])\n",
    "# or 'relative times'\n",
    "tr = tim.asRelativeTime(\"days since 2017\")\n",
    "print(\"first 2 times in days since 2017:\", tr[:2])"
   ]
  },
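  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The conversions above can also be tried by hand. The cell below is a minimal sketch, assuming the `cdtime` module installed alongside cdms2 is importable. The dates and the reference units (`days since 1979-1-1`) are arbitrary choices made for illustration and are not taken from the sample file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cdtime\n",
    "\n",
    "# Build a component time by hand (illustrative date)\n",
    "start = cdtime.comptime(1980, 1, 1)\n",
    "print(\"Component time:\", start)\n",
    "\n",
    "# Convert it to a relative time in units of our choosing\n",
    "rel = start.torel(\"days since 1979-1-1\")\n",
    "print(\"Relative time:\", rel, \"| value:\", rel.value, \"| units:\", rel.units)\n",
    "\n",
    "# Convert back, and step forward by one calendar month\n",
    "print(\"Back to component time:\", rel.tocomp())\n",
    "print(\"One month later:\", start.add(1, cdtime.Month))"
   ]
  },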
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieving data\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Whole\n",
    "clt =f(\"clt\") # parentheis means read in memory\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Partial, based on values in file\n",
    "clt = f(\"clt\",latitude=(0,90),longitude=(-180,180))\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Based on indices\n",
    "clt = f(\"clt\",time=slice(0,12))\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# time can be retirieved based on actual dates (provided units are good in file)\n",
    "clt = f(\"clt\",time=(\"1980\",\"1983-12-31\"))\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data can also be read directly from a file variable\n",
    "CLT = f[\"clt\"]\n",
    "clt = CLT(time=(\"1980\",\"1984-12-31\"),latitude=(0,90),longitude=slice(0,None))\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Or from an exisitng variavle\n",
    "clt2 = clt(time=slice(0,4))\n",
    "print(\"Shape:\",clt2.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# data can also be reordered based on dimensions\n",
    "clt = f(\"clt\",order=\"xty\")\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# or use dimension indices\n",
    "clt=f(\"clt\", order=\"210\")\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# or use dimension names\n",
    "clt = f(\"clt\",order=\"(longitude)(time)(latitude)\")\n",
    "print(\"Shape:\",clt.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Manipulating Data\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "cdms variables are subclass of numpy, so for the most part anything you can do with numpy\n",
    "can be done with cdms variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract same month every years (from monthly data)\n",
    "clt=f(\"clt\")\n",
    "subset = clt[::12]\n",
    "print(\"Shape:\",subset.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# cdms variable can be converted to raw numpy\n",
    "nparray = clt.filled()\n",
    "print(type(clt),type(nparray))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# or masked arrays\n",
    "maarray = clt.asma()\n",
    "print(type(clt),type(maarray))"
   ]
  },
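  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the NumPy-like behaviour concrete, the cell below is a short sketch that averages over the first axis and then masks part of the data before converting it. It assumes that the first axis of `clt` in the sample file is time. The threshold of 90 and the names `full`, `time_mean` and `cloudy` are illustrative choices only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import MV2\n",
    "\n",
    "full = f(\"clt\")  # re-read so this cell does not depend on earlier subsetting\n",
    "\n",
    "# Average over the first axis (assumed to be time); the remaining axes are kept\n",
    "time_mean = MV2.average(full, axis=0)\n",
    "print(\"Time-mean shape:\", time_mean.shape)\n",
    "print(\"Axes kept:\", [ax.id for ax in time_mean.getAxisList()])\n",
    "\n",
    "# Mask values above an illustrative threshold, then compare the two conversions\n",
    "cloudy = MV2.masked_greater(full, 90.)\n",
    "print(\"Number of masked points:\", cloudy.mask.sum())\n",
    "print(\"filled() gives:\", type(cloudy.filled()))  # masked points replaced by the fill value\n",
    "print(\"asma() gives:\", type(cloudy.asma()))      # mask is preserved"
   ]
  },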
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating MV2 and storing them in files\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import MV2\n",
    "# Create a cdms variable from a numpy (or numpy.ma) array\n",
    "myvar = MV2.array(nparray)\n",
    "myvar.id = \"newclt\"\n",
    "myvar.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can . add axes from other variables\n",
    "myvar.setAxisList(clt.getAxisList())\n",
    "myvar.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# we can also add axes one at a time\n",
    "for i in range(myvar.ndim):\n",
    "    ax = clt.getAxis(i)\n",
    "    print(\"Setting axis %i to %s\" % (i,ax.id))\n",
    "    myvar.setAxis(i,ax)\n",
    "myvar.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can also create axes manually\n",
    "newtime = cdms2.createAxis(range(120))\n",
    "newtime.id = \"time\" # name of dimension\n",
    "newtime.designateTime()  # tell cdms to add attributes that make it time\n",
    "newtime.units = \"months since 2017\"\n",
    "myvar.setAxis(0,newtime)\n",
    "myvar.info()  # Notice tikme changed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Saving data\n",
    "[Back to Top](#top)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    " # By default cdms2 will save files in NetCDF4 compressed with no shuffle by defalted at level 1\n",
    "print(\"Default Shuffle:\",cdms2.getNetcdfShuffleFlag())\n",
    "print(\"Default Deflate:\",cdms2.getNetcdfDeflateFlag())\n",
    "print(\"Default Deflate Level:\",cdms2.getNetcdfDeflateLevelFlag())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's turn it all off so we get NetCDF3 classic files\n",
    "value = 0\n",
    "cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1\n",
    "cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1\n",
    "cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included\n",
    "print(\"Shuffle:\",cdms2.getNetcdfShuffleFlag())\n",
    "print(\"Deflate:\",cdms2.getNetcdfDeflateFlag())\n",
    "print(\"Deflate Level:\",cdms2.getNetcdfDeflateLevelFlag())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's open a file for writing\n",
    "f2 = cdms2.open(\"mydata.nc\",\"w\") # \"w\" means open file for writing and erase if already here\n",
    "f2.write(myvar)\n",
    "f2.close()"
   ]
  }
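  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check, the file can be reopened for reading with the same calls used at the start of this tutorial. The cell below is a sketch that assumes the previous cell ran and wrote `mydata.nc` to the current directory, and that the variable still has the id `newclt` set earlier. The names `f3` and `check` are introduced here for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reopen the file we just wrote (default mode is read-only)\n",
    "f3 = cdms2.open(\"mydata.nc\")\n",
    "print(\"Variables in the new file:\", f3.listvariable())\n",
    "\n",
    "# Read the variable back into memory and inspect it\n",
    "check = f3(\"newclt\")\n",
    "print(\"Shape:\", check.shape)\n",
    "print(\"Time units:\", check.getTime().units)\n",
    "f3.close()"
   ]
  }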
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}