{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using NetCDF4 Compression with CDMS \n", "\n", "\n", "CDMS2 writes out data using the [NetCDF library](https://www.unidata.ucar.edu/software/netcdf/)\n", "\n", "NetCDF4 allows for file compression, a good blog about NetCDF4 and compression can be found [here](http://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression)\n", "\n", "From this blog:\n", "\n", "*\"The netCDF-4 libraries inherit the capability for data compression from the HDF5 storage layer underneath the netCDF-4 interface. Linking a program that uses netCDF to a netCDF-4 library allows the program to read compressed data without changing a single line of the program source code.\"*\n", "\n", "and\n", "\n", "*\"Also, we're only dealing with lossless compression\"*\n", "\n", "This Notebook shows how to control NetCDF4 compression (shuffling/deflating) capabilities via cdms2.\n", "\n", "The CDAT software was developed by LLNL. This tutorial was written by Charles Doutriaux. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.\n", "\n", "[Download the Jupyter Notebook](NetCDF4_Compression.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Preparing The Notebook\n", "\n", "In order to look at a NetCDF content the easiest way is to use [ncdump](https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf/ncdump.html). The following function helps us do a line call within Python, for Notebook clarity.\n", "\n", "We also prepare some random data\n", "\n", "[Back To Top](#top)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from __future__ import print_function\n", "import subprocess\n", "import shlex\n", "import numpy\n", "import os\n", "import io\n", "import time\n", "\n", "# Get file size\n", "def size_it(filename):\n", " statinfo = os.stat(filename)\n", " return statinfo.st_size\n", "\n", "# Write and return time\n", "def dump(data,filename=\"example.nc\"):\n", " start = time.time()\n", " f = cdms2.open(filename,\"w\")\n", " f.write(data,id=\"data\")\n", " f.close()\n", " return time.time()-start,size_it(filename)\n", "\n", "class HTML(object):\n", " def __init__(self,html):\n", " self.html = html\n", " def _repr_html_(self):\n", " return self.html\n", "\n", "\n", "# Nice html output for ncdump\n", "class NCINFO(object):\n", " def __init__(self, filename, variable=None, options=\"\"):\n", " self.filename = filename\n", " self.variable = variable\n", " self.options = options\n", " def _repr_html_(self):\n", " out = self.nc_info()\n", " lines = []\n", " for l in out.split(\"\\n\"):\n", " for kw in [\"chunk\",\"deflate\",\"classic\",\"netcdf4\",\"netcdf-4\"]:\n", " if l.lower().find(kw)>-1:\n", " l = \"{0}\".format(l)\n", " lines.append(l.replace(\"\\t\",\"  \"))\n", " return \"{0}\".format(\"
\".join(lines))\n", " def nc_info(self):\n", " \"\"\"calls ncdump on file\n", " Can opass a variable or optional ncdump arguments\n", " Default call `ncdump -hs filename`\"\"\"\n", " with io.BytesIO() as out:\n", " ncdumpOptions = \"-hs {options}\".format(options=self.options)\n", " if self.variable is not None:\n", " ncdumpOptions += \"-v {variable}\".format(self.variable)\n", " cmd = \"ncdump {options} {file}\".format(options=ncdumpOptions, file=self.filename)\n", " import pdb\n", " pdb.set_trace()\n", " out.write(bytes(\"Runnning {0}\".format(cmd).encode(\"utf8\")))\n", " cmd = shlex.split(cmd)\n", " p = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE)\n", " o, e = p.communicate()\n", " out.write(b'-------')\n", " out.write(o)\n", " out.write(b'-------')\n", " out.write(\"File Size {0} bytes\".format(size_it(self.filename)).encode(\"utf8\"))\n", " return out.getvalue()\n", " \n", "import requests\n", "def download(fnm):\n", " r = requests.get(\"https://uvcdat.llnl.gov/cdat/sample_data/%s\" % fnm,stream=True)\n", " with open(fnm,\"wb\") as f:\n", " for chunk in r.iter_content(chunk_size=1024):\n", " if chunk: # filter local_filename keep-alive new chunks\n", " f.write(chunk)\n", "\n", "download(\"clt.nc\")\n", "data = numpy.random.random((120,180,360))\n", "# Random data do not compress well at all, switching to 0/1\n", "data = numpy.greater(data,.5).astype(numpy.float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Default Settings\n", "\n", "By default cdms writes out data in NetCDF4 ***classic*** with no ***shuffling*** and a ***deflate*** level of 1\n", "\n", "[Back To Top](#top)\n", "\n", "To access the netcdf value used to write data out use the following commands:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cdms2\n", "print(\"NetCDF4? \",cdms2.getNetcdf4Flag())\n", "print(\"NetCDF Classic?\",cdms2.getNetcdfClassicFlag())\n", "print(\"NetCDF4 Shuffling\",cdms2.getNetcdfShuffleFlag())\n", "print(\"NetCDF4 Deflate?\",cdms2.getNetcdfDeflateFlag())\n", "print(\"NetCDF4 Deflate Level?\",cdms2.getNetcdfDeflateLevelFlag())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These values are read in at the time you **open** the file for writing\n", "\n", "Note the **BOLD** lines" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Turning Off Compression\n", "\n", "[Back to Top](#top)\n", "\n", "We can use no compression by runnnig" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "value = 0\n", "cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1\n", "cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1\n", "cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included\n", "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pure NetCDF3\n", "\n", "[Back To Top](#top)\n", "\n", "All these options can either be turned to 0 to enable NetCDF3 (as the warning above shows). One can also use the single command:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cdms2.useNetcdf3()\n", "# or for versions earlier than 2.12.2017.10.25\n", "value = 0\n", "cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1\n", "cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1\n", "cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included\n", "cdms2.setNetcdf4Flag(0)\n", "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# NetCDF4 non classic\n", "\n", "[Back To TOp](#top)\n", "\n", "We can also turn off the classic option for netcdf4" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cdms2.setNetcdf4Flag(1)\n", "cdms2.setNetcdfClassicFlag(0)\n", "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Shuffling\n", "\n", "[Back To Top](#top)\n", "\n", "We can turn on/off shuffling" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cdms2.setNetcdf4Flag(1)\n", "cdms2.setNetcdfClassicFlag(0)\n", "cdms2.setNetcdfShuffleFlag(1)\n", "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Controling Deflate Level\n", "\n", "[Back To top](#top)\n", "\n", "We can choose our deflate level (at the expense of time)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cdms2.setNetcdfShuffleFlag(0)\n", "cdms2.setNetcdfDeflateFlag(1)\n", "cdms2.setNetcdfDeflateLevelFlag(5)\n", "dump(data)\n", "NCINFO(\"example.nc\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summarizing All Options\n", "\n", "[Back To Top](#top)\n", "\n", "Let's try with a real life example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f=cdms2.open(\"clt.nc\")\n", "clt = f(\"clt\")\n", "\n", "html = \"\"\n", "\n", "def addCell():\n", " t,s = dump(clt)\n", " return \"\".format(t,s)\n", "\n", "def nc4s():\n", " out = \"\"\n", " for classic in [1,0]:\n", " cdms2.setNetcdfClassicFlag(classic)\n", " for shuffle in [0,1]:\n", " cdms2.setNetcdfShuffleFlag(shuffle)\n", " out+=addCell()\n", " out+=\"\"\n", " return out\n", "\n", "# NetCDF3\n", "html+=\"\"\n", "cdms2.useNetcdf3()\n", "cdms2.setNetcdf4Flag(0)\n", "html+=addCell()\n", "cdms2.setNetcdf4Flag(1)\n", "html+=nc4s()\n", "cdms2.setNetcdfDeflateFlag(1)\n", "for i in range(1,10):\n", " cdms2.setNetcdfDeflateLevelFlag(i)\n", " html += \"\".format(i)\n", " html += nc4s()\n", "html+=\"
Deflate LevelNC3NC4 Classic no shuffleNC4 Classic shuffledNC4 no shuffleNC4 shuffled
{:.2f}/{:d}
0
{0}N/A
Time To Write NetCDF File and size for various NC4 settings
\"\n", "HTML(html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }