Kerchunk / VirtualiZarr
A technique that creates a small “index” file letting you read legacy NetCDF/HDF files as if they were cloud-native Zarr — without converting the originals.
Why it matters
Roughly 85% of NASA collections lack cloud-native reference files. Kerchunk and VirtualiZarr bridge that gap: they make the legacy archive readable through Zarr-style chunked, byte-range access without re-archiving petabytes.
Where you’ll meet it
- Pointing an xarray + fsspec workflow at a NASA NetCDF/HDF5 collection in S3 and reading it lazily through a generated reference (JSON) file, with no full download.
- Building one combined virtual dataset across thousands of granules (e.g. a long MODIS or MERRA-2 time series) so you can slice across time without opening each file.
- VirtualiZarr is the newer library that produces these references and can serialize them as Zarr metadata, increasingly the recommended successor to the original Kerchunk approach.
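To make the "index file" idea concrete, here is a minimal toy sketch using only the Python standard library. It does not call Kerchunk or VirtualiZarr. It only mimics the shape of a Kerchunk-style reference, in which each chunk key maps to a file location, a byte offset, and a byte length. The granule layout, the variable name temp, and the helpers write_granule, build_refs, and read_chunk are all hypothetical, invented for illustration. Real reference files also carry Zarr metadata entries and are normally read through fsspec's reference filesystem rather than by hand.

```python
import json
import struct
import tempfile
from pathlib import Path

# Toy stand-in for a legacy granule: an opaque header followed by one
# "chunk" of four little-endian float32 values. (Hypothetical layout.)
HEADER = b"LEGACYHEADER\x00\x00\x00\x00"
CHUNK_FMT = "<4f"


def write_granule(path, values):
    """Write one fake legacy file; it is never modified afterwards."""
    path.write_bytes(HEADER + struct.pack(CHUNK_FMT, *values))


def build_refs(paths):
    """Build one combined index: chunk key -> [file, byte offset, byte length]."""
    length = struct.calcsize(CHUNK_FMT)
    refs = {
        f"temp/{i}": [str(p), len(HEADER), length]
        for i, p in enumerate(paths)
    }
    return {"version": 1, "refs": refs}


def read_chunk(index, key):
    """Fetch a single chunk by byte range, touching only the file it lives in."""
    url, offset, length = index["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        return struct.unpack(CHUNK_FMT, f.read(length))


# Create three granules, one per time step.
workdir = Path(tempfile.mkdtemp())
granules = []
for step in range(3):
    p = workdir / f"granule_{step}.bin"
    write_granule(p, [step + 0.0, step + 0.25, step + 0.5, step + 0.75])
    granules.append(p)

# The small JSON index is the only new artifact; the granules are unchanged.
index_path = workdir / "refs.json"
index_path.write_text(json.dumps(build_refs(granules)))

# A reader loads the index and jumps straight to the chunk for time step 2.
loaded = json.loads(index_path.read_text())
chunk = read_chunk(loaded, "temp/2")
print(chunk)
```

The reader opens only the granule that holds the requested chunk, which is what lets a combined reference span thousands of files while a time slice touches just a few of them.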
In plain terms
Like adding a table-of-contents sticker to old books so you can flip straight to any page — the book is unchanged, but now it’s instantly navigable.