g10·concept

Zarr

Zarr

A format for storing large N-dimensional arrays in chunks, designed for cloud + parallel access. Common for climate model output and multi-dimensional datacubes.

Why it matters

Where COG handles 2D imagery, Zarr handles many-dimensional stacks (time × lat × lon × variable). Cloud-native analysis of decades of data leans on Zarr.

Where you’ll meet it

  • Climate and reanalysis datasets — long time-series of GPM precipitation or model output — are increasingly published as Zarr so you can slice one variable over one region across decades without reading whole files.
  • xarray opening a Zarr store (often via s3fs straight from cloud storage) lazily loads only the chunks your analysis touches, which is how big datacubes fit in a notebook.
  • Tools like Dask read many Zarr chunks in parallel, letting a cluster crunch a multi-year stack at once instead of one giant file serially.
  • It’s the multi-dimensional counterpart to COG: COG for a single image, Zarr for the time × lat × lon × variable cube behind it.

In plain terms

Like a spreadsheet split into many small tiles so a hundred workers can each read a different tile at once instead of fighting over one giant file.