g11·concept

Kerchunk / VirtualiZarr

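A technique that generates a small reference (“index”) file recording where each data chunk lives inside legacy NetCDF/HDF files (file URL, byte offset, and length), so those files can be read as if they were cloud-native Zarr, without converting the originals.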
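The “index” idea is easiest to see on a toy file. This is a minimal, standard-library-only sketch. It assumes a made-up binary layout (a 16-byte header followed by raw float32 chunks) in place of a real NetCDF/HDF5 file, and hand-writes references in the shape of Kerchunk's JSON format (version 1: Zarr v2 metadata plus one `[url, offset, length]` entry per chunk). In practice a scanner such as Kerchunk's `SingleHdf5ToZarr` or VirtualiZarr discovers these offsets by parsing the file's internals. The file name `granule.bin`, the variable `temp`, and the `read_chunk` helper are hypothetical.

```python
import json
import os
import struct
import tempfile

# Hypothetical stand-in for a legacy file: a 16-byte header followed by raw
# little-endian float32 chunks. Real NetCDF4/HDF5 layouts are far richer; tools
# such as Kerchunk parse them to discover where each chunk starts and ends.
HEADER = b"LEGACYFMT-v1\x00\x00\x00\x00"  # 16 bytes of "metadata" we skip
CHUNKS = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

path = os.path.join(tempfile.mkdtemp(), "granule.bin")

# Kerchunk-style "version 1" references: Zarr v2 metadata stored as JSON
# strings, plus one [url, offset, length] entry per chunk.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({
            "shape": [8], "chunks": [4], "dtype": "<f4",
            "compressor": None, "filters": None, "fill_value": None,
            "order": "C", "zarr_format": 2,
        }),
    },
}

with open(path, "wb") as f:
    f.write(HEADER)
    for i, values in enumerate(CHUNKS):
        offset = f.tell()
        payload = struct.pack("<4f", *values)
        f.write(payload)
        refs["refs"][f"temp/{i}"] = [path, offset, len(payload)]

# The "index" is tiny JSON; the original file is never rewritten.
refs_json = json.dumps(refs)
loaded = json.loads(refs_json)


def read_chunk(references, key):
    """Read one chunk using only the byte range recorded for it."""
    url, offset, length = references["refs"][key]
    var = key.split("/")[0]
    meta = json.loads(references["refs"][f"{var}/.zarray"])
    with open(url, "rb") as fh:  # against S3 this would be a ranged GET
        fh.seek(offset)
        raw = fh.read(length)
    # Assumes the "<f4" (little-endian float32) dtype declared in .zarray.
    return list(struct.unpack(f"<{meta['chunks'][0]}f", raw))


print(read_chunk(loaded, "temp/1"))  # [5.0, 6.0, 7.0, 8.0]
```

Against object storage, the same lookup becomes an HTTP range request for just those bytes, which is what fsspec's reference filesystem does when xarray opens a `reference://` store.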

Why it matters

~85% of NASA collections lack cloud-native reference files. Kerchunk/VirtualiZarr bridges that gap: because a reader fetches only the byte ranges it needs, the legacy archive can be sliced lazily, chunk by chunk, much like native Zarr, without re-archiving petabytes.

Where you’ll meet it

  • Pointing an xarray + fsspec workflow at a NASA NetCDF/HDF5 collection in S3 and reading it lazily through a generated reference (JSON) file, without downloading the full files.
  • Building one combined virtual dataset across thousands of granules (e.g. a long MODIS or MERRA-2 time series) so you can slice across time without opening each file.
  • Using VirtualiZarr, the newer library for producing these references: it can serialize them as Zarr metadata and is increasingly recommended as the successor to the original Kerchunk approach.
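Combining many granules into one virtual dataset is mostly bookkeeping on references. This minimal, stdlib-only sketch assumes four tiny hypothetical granule files, each holding one time step of a 3-value float32 field behind an 8-byte header. It stacks their per-file references along a new leading time axis by renaming chunk keys (`0.0` in granule *t* becomes `t.0`), which is roughly what Kerchunk's `MultiZarrToZarr` or concatenating VirtualiZarr datasets with `xarray.concat` does for real collections. Those tools also build coordinates, merge attributes, and handle irregular chunking, all of which this sketch ignores. The helpers `combine_along_time` and `read_time_slice` are hypothetical.

```python
import json
import os
import struct
import tempfile

# Hypothetical per-granule layout: 8-byte header + one float32 row of N values
# for a single time step. Real granules would be NetCDF/HDF5 files on S3.
HEADER = b"GRANULE\x00"  # 8 bytes
N = 3

workdir = tempfile.mkdtemp()
granule_refs = []
for t in range(4):
    path = os.path.join(workdir, f"granule_{t:03d}.bin")
    with open(path, "wb") as f:
        f.write(HEADER)
        f.write(struct.pack(f"<{N}f", *[10.0 * t + k for k in range(N)]))
    # One reference set per file, as a per-granule scan would produce
    # (per-file metadata entries omitted for brevity).
    granule_refs.append({"temp/0.0": [path, len(HEADER), 4 * N]})


def combine_along_time(per_file):
    """Stack single-time-step references into one virtual (time, x) array."""
    combined = {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/.zarray": json.dumps({
            "shape": [len(per_file), N], "chunks": [1, N], "dtype": "<f4",
            "compressor": None, "filters": None, "fill_value": None,
            "order": "C", "zarr_format": 2,
        }),
    }
    for t, refs in enumerate(per_file):
        # Chunk "0.0" of granule t becomes chunk "t.0" of the combined array.
        combined[f"temp/{t}.0"] = refs["temp/0.0"]
    return {"version": 1, "refs": combined}


def read_time_slice(virtual, start, stop):
    """Read time steps start..stop-1, opening only the granules that hold them."""
    rows = []
    for t in range(start, stop):
        url, offset, length = virtual["refs"][f"temp/{t}.0"]
        with open(url, "rb") as fh:
            fh.seek(offset)
            rows.append(list(struct.unpack(f"<{N}f", fh.read(length))))
    return rows


virtual = combine_along_time(granule_refs)
print(read_time_slice(virtual, 1, 3))  # [[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
```

Because each time step maps to its own chunk, a time slice touches only the granules it covers; the other files are never opened.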

In plain terms

Like adding a table-of-contents sticker to old books so you can flip straight to any page: the book is unchanged, but now it’s instantly navigable.