[1]:
# To start, we install Rasterio, a Python module for interacting with gridded spatial data
!pip install rasterio
Collecting rasterio
Downloading rasterio-1.1.8-1-cp38-cp38-manylinux1_x86_64.whl (18.2 MB)
|████████████████████████████████| 18.2 MB 4.9 MB/s
Requirement already satisfied: numpy in /usr/share/miniconda3/envs/dask-examples/lib/python3.8/site-packages (from rasterio) (1.18.5)
Requirement already satisfied: attrs in /usr/share/miniconda3/envs/dask-examples/lib/python3.8/site-packages (from rasterio) (20.3.0)
Requirement already satisfied: click<8,>=4.0 in /usr/share/miniconda3/envs/dask-examples/lib/python3.8/site-packages (from rasterio) (7.1.2)
Collecting cligj>=0.5
Downloading cligj-0.7.1-py3-none-any.whl (7.1 kB)
Collecting snuggs>=1.4.1
Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Requirement already satisfied: pyparsing>=2.1.6 in /usr/share/miniconda3/envs/dask-examples/lib/python3.8/site-packages (from snuggs>=1.4.1->rasterio) (2.4.7)
Collecting affine
Downloading affine-2.3.0-py2.py3-none-any.whl (15 kB)
Collecting click-plugins
Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: snuggs, cligj, click-plugins, affine, rasterio
Successfully installed affine-2.3.0 click-plugins-1.1.1 cligj-0.7.1 rasterio-1.1.8 snuggs-1.4.7
Reading and manipulating tiled GeoTIFF datasets
This notebook shows how to perform simple calculations on a GeoTIFF dataset using xarray and Dask. We load and rescale a Landsat 8 image and compute the Normalized Difference Vegetation Index (NDVI), which can be used to distinguish green vegetation from bare land or water.
We’ll use an image of the Denver, USA area taken in July 2018.

Download data
First, we download the dataset. We use an image from the cloud-hosted Landsat 8 public dataset, in which each band is available as a separate GeoTIFF file.
[2]:
import os
import json
import rasterio
import requests
import matplotlib.pyplot as plt
%matplotlib inline
[3]:
nir_filename = 'https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_B5.TIF'
red_filename = 'https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_B4.TIF'
mtl_filename = 'https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_MTL.json'
[4]:
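def download_file(in_filename, out_filename):
    # Skip the download if the file is already cached locally
    if not os.path.exists(out_filename):
        print("Downloading", in_filename)
        response = requests.get(in_filename)
        response.raise_for_status()  # fail early instead of saving an error page
        with open(out_filename, 'wb') as f:
            f.write(response.content)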
[5]:
download_file(nir_filename, 'nir.tif')
download_file(red_filename, 'red.tif')
download_file(mtl_filename, 'meta.json')
Downloading https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_B5.TIF
Downloading https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_B4.TIF
Downloading https://landsat-pds.s3.amazonaws.com/c1/L8/033/033/LC08_L1TP_033033_20180706_20180717_01_T1/LC08_L1TP_033033_20180706_20180717_01_T1_MTL.json
Check image metadata
Let’s check whether the image is tiled so we can select a chunk size that lines up with its internal blocks.
[6]:
img = rasterio.open('red.tif')
print(img.is_tiled)
True
[7]:
img.block_shapes
[7]:
[(512, 512)]
The image has separate blocks for each band with block size 512 x 512.
Create xarray datasets
[8]:
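import xarray as xr
# Note: xr.open_rasterio is deprecated in newer xarray releases,
# which point to rioxarray.open_rasterio as the replacement.
red = xr.open_rasterio('red.tif', chunks={'band': 1, 'x': 1024, 'y': 1024})
nir = xr.open_rasterio('nir.tif', chunks={'band': 1, 'x': 1024, 'y': 1024})
nir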
[8]:
<xarray.DataArray (band: 1, y: 7841, x: 7721)>
dask.array<open_rasterio-e905e24cac8cf5d5222dacf60943f37e<this-array>, shape=(1, 7841, 7721), dtype=uint16, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 4.423e+06 4.423e+06 4.423e+06 ... 4.188e+06 4.188e+06
  * x        (x) float64 4.134e+05 4.134e+05 4.135e+05 ... 6.45e+05 6.45e+05
Attributes:
    transform:      (30.0, 0.0, 413385.0, 0.0, -30.0, 4423215.0)
    crs:            +init=epsg:32613
    res:            (30.0, 30.0)
    is_tiled:       1
    nodatavals:     (nan,)
    scales:         (1.0,)
    offsets:        (0.0,)
    AREA_OR_POINT:  Point
Dask array summary: 121.08 MB in total, shape (1, 7841, 7721), type uint16; 64 chunks of shape (1, 1024, 1024), 2.10 MB each; 65 tasks.
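The chunk count reported for this array can be reproduced by hand: each dimension is split into ceil(size / chunk) pieces. The sketch below uses the array shape and chunk size from the output above; the helper name `count_chunks` is hypothetical and introduced only for illustration.

```python
import math

def count_chunks(shape, chunks):
    """Number of chunks when each dimension is split into pieces of at most the given size."""
    return math.prod(math.ceil(size / chunk) for size, chunk in zip(shape, chunks))

shape = (1, 7841, 7721)    # (band, y, x), taken from the output above
chunks = (1, 1024, 1024)   # chunk size passed to open_rasterio

n_chunks = count_chunks(shape, chunks)
print(n_chunks)

# 1024 is a whole multiple of the 512 x 512 block size reported by rasterio
blocks_per_chunk = (1024 // 512) ** 2
print(blocks_per_chunk)
```

Because 1024 is a whole multiple of 512, each full chunk covers four complete file blocks, so no block should need to be read by more than one chunk.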
Each dataset’s data is a Dask array.
[9]:
red.variable.data
Optional: create a Dask client
You can start a Dask client to monitor execution with the dashboard.
[10]:
import dask
from dask.distributed import Client
client = Client(processes=False)
client
/usr/share/miniconda3/envs/dask-examples/lib/python3.8/site-packages/distributed/node.py:151: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43137 instead
warnings.warn(
Rescale bands using Landsat metadata
Landsat Level 1 images are delivered as quantized integer values. These have to be converted to top-of-atmosphere (TOA) reflectance using the rescaling factors in the provided metadata.
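For reference, the conversion is a linear rescaling of the form below, where $Q_{cal}$ is the quantized pixel value and $M_\rho$ and $A_\rho$ are the band-specific multiplicative and additive factors (the `REFLECTANCE_MULT_BAND_x` and `REFLECTANCE_ADD_BAND_x` metadata entries). The prime indicates that this is reflectance without a correction for the sun angle, which this notebook does not apply.

```latex
\rho_\lambda' = M_\rho \, Q_{cal} + A_\rho
```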
First we define convenience functions to load the rescaling factors and transform a dataset. The red band is band 4 and near infrared is band 5.
[11]:
def load_scale_factors(filename, band_number):
    with open(filename) as f:
        metadata = json.load(f)
    # Multiplicative (M_p) and additive (A_p) reflectance rescaling factors
    rescaling = metadata['L1_METADATA_FILE']['RADIOMETRIC_RESCALING']
    M_p = rescaling['REFLECTANCE_MULT_BAND_{}'.format(band_number)]
    A_p = rescaling['REFLECTANCE_ADD_BAND_{}'.format(band_number)]
    return M_p, A_p
[12]:
def calculate_reflectance(ds, band_number, metafile='meta.json'):
    M_p, A_p = load_scale_factors(metafile, band_number)
    toa = M_p * ds + A_p
    return toa
[13]:
red_toa = calculate_reflectance(red, band_number=4)
nir_toa = calculate_reflectance(nir, band_number=5)
Because the bands are backed by Dask arrays, these arithmetic operations are not executed immediately. They are recorded as a task graph that is evaluated chunk by chunk, in parallel, when a result is requested.
[14]:
print(red_toa.variable.data)
dask.array<add, shape=(1, 7841, 7721), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
The resulting image has floating point data with magnitudes appropriate to reflectance. This can be checked by computing the range of values in an image:
[15]:
red_max, red_min, red_mean = dask.compute(
    red_toa.max(dim=['x', 'y']),
    red_toa.min(dim=['x', 'y']),
    red_toa.mean(dim=['x', 'y'])
)
print(red_max.item())
print(red_min.item())
print(red_mean.item())
1.2107
-0.1
0.05921340355007137
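These extremes are consistent with a 16-bit input: the smallest and largest possible `uint16` values map onto the printed minimum and maximum. The sketch below assumes the factors `M_p = 2.0e-5` and `A_p = -0.1`; these are assumed values that happen to reproduce the printed range, not values read from `meta.json`.

```python
import numpy as np

# Assumed rescaling factors (not read from meta.json)
M_p = 2.0e-5
A_p = -0.1

# Smallest and largest values representable by the uint16 input data
dn = np.array([0, 65535], dtype=np.uint16)

# Same linear transform as calculate_reflectance
toa = M_p * dn + A_p

print(toa)
```

Under these assumptions, a reflectance of -0.1 corresponds to a raw value of 0, which in Landsat Level 1 products typically marks fill pixels outside the imaged area rather than a real measurement.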
Calculate and display NDVI
Now that we have the image as reflectance values, we are ready to compute NDVI.
NDVI highlights areas of healthy vegetation, which have high values and appear green in the image below.
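Written out, with $\rho_{NIR}$ and $\rho_{red}$ denoting the near-infrared and red reflectance, the standard definition of the index is:

```latex
\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}}
```

For non-negative reflectances the index lies between -1 and 1.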
[16]:
ndvi = (nir_toa - red_toa) / (nir_toa + red_toa)
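One caveat: pixels where both bands hold the same fill value produce an NDVI of exactly 0 rather than a missing value. A minimal NumPy sketch of masking such pixels is shown below. It assumes that a raw value of 0 marks fill and reuses assumed rescaling factors (`M_p`, `A_p`); the helper `ndvi_masked` is hypothetical and not part of this notebook.

```python
import numpy as np

M_p, A_p = 2.0e-5, -0.1  # assumed rescaling factors, for illustration only

def ndvi_masked(red_dn, nir_dn, fill=0):
    """Compute NDVI from raw values, returning NaN where either band is fill."""
    valid = (red_dn != fill) & (nir_dn != fill)
    red = M_p * red_dn.astype(np.float64) + A_p
    nir = M_p * nir_dn.astype(np.float64) + A_p
    total = nir + red
    out = np.full(red.shape, np.nan)
    # Divide only where the pixel is valid and the denominator is non-zero
    np.divide(nir - red, total, out=out, where=valid & (total != 0))
    return out

# Small synthetic example; the top-left pixel is fill in both bands
red_dn = np.array([[0, 10000], [8000, 12000]], dtype=np.uint16)
nir_dn = np.array([[0, 20000], [8000, 9000]], dtype=np.uint16)

result = ndvi_masked(red_dn, nir_dn)
print(result)
```

With the xarray objects used here, a similar effect could probably be achieved with `DataArray.where`, for example `red.where(red != 0)` before rescaling.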
[17]:
ndvi2d = ndvi.squeeze()
[18]:
plt.figure()
im = ndvi2d.compute().plot.imshow(cmap='BrBG', vmin=-0.5, vmax=1)
plt.axis('equal')
plt.show()
