Zarr
A format for storing large N-dimensional arrays in chunks, designed for cloud and parallel access. Common for climate model output and multi-dimensional datacubes.
Why it matters
Where COG handles 2D imagery, Zarr handles many-dimensional stacks (time × lat × lon × variable). Cloud-native analysis of decades of data leans on Zarr.
Where you’ll meet it
- Climate and reanalysis datasets — long time-series of GPM precipitation or model output — are increasingly published as Zarr so you can slice one variable over one region across decades without reading whole files.
- xarray opening a Zarr store (often via s3fs straight from cloud storage) lazily loads only the chunks your analysis touches, which is how big datacubes fit in a notebook.
- Tools like Dask read many Zarr chunks in parallel, letting a cluster crunch a multi-year stack at once instead of one giant file serially.
- It’s the multi-dimensional counterpart to COG: COG for a single image, Zarr for the time × lat × lon × variable cube behind it.
In plain terms
Like a spreadsheet split into many small tiles so a hundred workers can each read a different tile at once instead of fighting over one giant file.
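The tile analogy can be made concrete with a little arithmetic. The sketch below is plain Python, not the Zarr library itself. It works out which chunks a regional slice overlaps in a hypothetical cube. The array shape, chunk shape, and the helper `chunks_touched` are all assumptions chosen for illustration.

```python
from itertools import product
from math import ceil, prod


def chunks_touched(selection, chunk_shape):
    """Return grid indices of every chunk that overlaps a selection.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunk_shape: chunk length along each dimension.
    """
    per_dim = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(selection, chunk_shape)
    ]
    return list(product(*per_dim))


# Hypothetical cube: 3650 daily steps x 180 lat x 360 lon,
# stored in chunks of 365 x 45 x 90 cells.
array_shape = (3650, 180, 360)
chunk_shape = (365, 45, 90)
total_chunks = prod(ceil(n / c) for n, c in zip(array_shape, chunk_shape))

# All ten years, but only a 20 x 30 cell region.
region = [(0, 3650), (50, 70), (100, 130)]
touched = chunks_touched(region, chunk_shape)

print(f"{len(touched)} of {total_chunks} chunks read")
# prints "10 of 160 chunks read"
```

Each touched chunk is an independent object in storage, so ten workers could fetch the ten chunks at the same time.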