r03·intermediate

Multi-scale spatial fusion — joining 30 m, 70 m, 250 m, and 1 km observations

LP DAAC (HLSMODISECOSTRESS)all-DAAC general pattern

NASA's Earth-observation portfolio spans **3 orders of magnitude in spatial resolution**: ECOSTRESS at 70 m, MODIS at 1 km, HLS at 30 m, GEDI footprints at 25 m, NISAR at 7 m, GRACE-FO at 300 km. Different missions serve different questions because their resolution trades against revisit, swath, and signal-to-noise. Combining them isn't optional — it's how most useful science gets done. But: **you can't just stack a 30 m raster and a 1 km raster and pixel-multiply.** Different things happen at different scales. This recipe is the safe pattern.

Multi-scale spatial fusion

NASA’s Earth-observation portfolio spans 3 orders of magnitude in spatial resolution: ECOSTRESS at 70 m, MODIS at 1 km, HLS at 30 m, GEDI footprints at 25 m, NISAR at 7 m, GRACE-FO at 300 km. Different missions serve different questions because their resolution trades against revisit, swath, and signal-to-noise. Combining them isn’t optional — it’s how most useful science gets done.

But: you can’t just stack a 30 m raster and a 1 km raster and pixel-multiply. Different things happen at different scales. This recipe is the safe pattern.

The pattern — choose ONE of these three approaches

Approach A — Aggregate fine to coarse (most common, safest)

Resample the finer-resolution layer up to the coarser resolution by area-weighted averaging (or another aggregation that respects the variable’s physical meaning). The coarse grid becomes the analysis grid.

When to use: your science question is fundamentally at the coarser scale (e.g., basin-level water budget, regional carbon flux). Information loss is acceptable; alignment is clean.

Approach B — Downscale coarse to fine (use a model, not interpolation)

Use a relationship between the coarse variable and ancillary fine-resolution covariates (DEM, NDVI, land cover) to predict the fine-resolution version. Random Forest, regression Kriging, or a foundation-model embedding can all work as the predictor.

When to use: your science question is at the fine scale and you can’t accept averaging away the fine signal. SMAP 36 km → 9 km enhanced uses this idea (Backus-Gilbert inversion + ancillary). MODIS NDVI 250 m → Landsat 30 m via NDVI-conserving disaggregation is similar.

Warning: every downscaled pixel is now model-dependent. Don’t claim point-scale measurements from downscaled products.

Approach C — Vector / point representation (don’t resample at all)

Treat the higher-resolution data as a sparse vector layer (GEDI footprints, ICESat-2 tracks, OCO-2 sounding points). Sample the coarser raster at the vector points. Analysis happens in tabular form, not on a grid.

When to use: the fine-resolution data is naturally sparse (lidar footprints) or non-grid (orbital tracks). Common with GEDI + HLS + SMAP joins.

Minimal worked example — joining HLS 30 m + MODIS 1 km + ECOSTRESS 70 m at a 1 km common grid

import earthaccess
import xarray as xr
import rioxarray
from rasterio.enums import Resampling

earthaccess.login(strategy="netrc")

aoi = (-122, 38, -120, 40)
window = ("2024-07-01", "2024-09-30")

# 1. HLS NDVI (30m source)
hls = earthaccess.search_data(short_name="HLSL30", bounding_box=aoi,
                               temporal=window, cloud_cover=20)
# Open + compute NDVI per scene → mean over period → 30m raster

# 2. MODIS LST (1 km — pick this as the common grid)
mod_lst = earthaccess.search_data(short_name="MOD11A1", bounding_box=aoi,
                                  temporal=window)
# Open + extract LST_Day_1km; mean over period → 1km raster

# 3. ECOSTRESS LST (70m)
eco = earthaccess.search_data(short_name="ECO2LSTE", bounding_box=aoi,
                              temporal=window)
# Open + extract LST_70m; mean over period → 70m raster

# 4. Aggregate HLS NDVI 30m → 1km (Approach A — area-weighted average)
hls_1km = hls_ndvi.rio.reproject_match(
    mod_lst, resampling=Resampling.average
)

# 5. Aggregate ECOSTRESS 70m → 1km
eco_1km = eco_lst.rio.reproject_match(
    mod_lst, resampling=Resampling.average
)

# 6. Stack on common 1km grid
ds = xr.Dataset({
    "ndvi": hls_1km,
    "lst_modis_day": mod_lst.mean("time"),
    "lst_ecostress": eco_1km.mean("time"),
})

# 7. Cross-sensor analysis (now safe — same grid, same projection)
# e.g., correlate ECOSTRESS daytime LST with HLS NDVI per pixel

Common gotchas

  • Reprojection without matching CRSs first drops to nearest-neighbor interpolation in some libraries — silently. Always check da.rio.crs and explicitly reproject before resampling.
  • Resampling.average on a categorical (land-cover) raster is meaningless. Use Resampling.mode or treat per-class.
  • Temporal alignment matters too. HLS is 16-day revisit; MODIS daily; ECOSTRESS variable. Aggregate to a common window (typically monthly) before spatial fusion.
  • Edge effects at AOI boundaries are real — load 1 extra coarse pixel beyond your AOI for clean reprojection.
  • rioxarray vs xarray-spatial vs rasterio-direct all have slightly different reprojection defaults. Stick with one.

When this pattern fails

  • When the variable doesn’t aggregate linearly. Water vapor column averaged over 1 km is fine; cloud fraction averaged over 1 km is not (it’s a non-linear transform from cloudy pixels to “cloudiness”). Think before averaging.
  • When sensor footprints overlap pixel boundaries (GEDI footprints over HLS pixels). Treat as sparse vector (Approach C).
  • When the physical scale of the process matches one of the resolutions but not others. Don’t average ECOSTRESS 70 m field-scale anomalies to 1 km — the signal disappears.

Sources

The steps, code, and sources below are kept in the original English for technical accuracy.