Cross-DAAC composition

“Joining HLS (LP) + GEDI (ORNL) + ICESat-2 (NSIDC) + SMAP (NSIDC) for a single science question requires juggling 4 access libraries, 3 auth flows, and inconsistent formats.” — Research Agent D, identifying this as Pattern B (one of three recurring friction patterns across the NASA Earth-data ecosystem)

The pattern below dissolves that friction.

The pattern in 5 steps

1. Authenticate once. Earthdata Login covers all 12 DAACs. Call earthaccess.login(strategy="netrc") once and never log in again per DAAC. (If you see a tutorial that has you log in 4 times, it’s out of date.)
2. Search once, federated. earthaccess.search_data(short_name=..., bounding_box=..., temporal=...) queries CMR, which federates across DAACs. You don’t pick a DAAC; you pick a dataset.
3. Open via the format-appropriate library, not the DAAC-appropriate one. HLS = COG → rioxarray.open_rasterio. GEDI = HDF5 → h5py or earthaccess.open + custom reader. ICESat-2 = HDF5 → h5py + icepyx helpers. SMAP = HDF5 → xarray with the h5netcdf engine (pass group=..., because the variables sit in HDF5 groups rather than at the file root). The format dictates the reader, not the DAAC.
4. Align in a single xarray Dataset or DataFrame. Resample temporally (HLS is observation-time, GEDI is footprint, SMAP is daily, ICESat-2 is orbit-time). Resample spatially (HLS 30m, GEDI footprint, SMAP 9km, ICESat-2 along-track). Then choose one of two representations: pick the coarsest common spatial grid for analysis (usually SMAP at 9km) and aggregate the finer products up, or pick a sparse-vector representation (one row per GEDI footprint, with HLS / SMAP / ICESat-2 values sampled at the footprint).
5. Cache aggressively. Cloud-direct reads from s3:// are fast in us-west-2 (where most NASA-EO cloud data lives). Out of region, download once locally and re-read. Don’t pay egress on every script run.
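The "aggregate the finer products up" option in step 4 can be sketched with plain NumPy. This is a minimal illustration, not the recipe's own code: block_mean is a hypothetical helper, and it assumes the fine grid has already been reprojected and co-registered so that each coarse cell covers exactly factor × factor fine pixels. Real HLS tiles are in UTM and SMAP sits on EASE-Grid 2.0, so a reprojection step would come first.

```python
import numpy as np


def block_mean(fine: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks, ignoring NaN pixels.

    Hypothetical helper: assumes `fine` is a 2-D array already aligned to the
    coarse grid, so block edges coincide with coarse cell edges.
    """
    rows, cols = fine.shape
    # Trim trailing rows/columns that do not fill a complete block.
    rows_t = rows - rows % factor
    cols_t = cols - cols % factor
    blocks = fine[:rows_t, :cols_t].reshape(
        rows_t // factor, factor, cols_t // factor, factor
    )
    # Collapse the two within-block axes; NaN (masked) pixels are skipped.
    return np.nanmean(blocks, axis=(1, 3))


# Toy 6x6 "fine" NDVI grid with one masked pixel, aggregated 3x3 -> 2x2.
ndvi_fine = np.arange(36, dtype=float).reshape(6, 6)
ndvi_fine[0, 0] = np.nan
ndvi_coarse = block_mean(ndvi_fine, 3)
print(ndvi_coarse)
```

For the HLS-to-SMAP case the factor would be about 300 (30 m pixels into 9 km cells), which is why the sparse-vector alternative is often cheaper when only GEDI footprints matter.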

Minimal worked example

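import earthaccess
import h5py
import pandas as pd
import rioxarray  # COG reader used for HLS
import xarray as xr

earthaccess.login(strategy="netrc")

aoi = (-105, 38, -102, 41)  # (west, south, east, north), Front Range CO
window = ("2022-06-01", "2022-08-31")

# --- HLS L30 (LP DAAC) ---
# cloud_cover expects a (min, max) percent range
hls = earthaccess.search_data(short_name="HLSL30", bounding_box=aoi, temporal=window, cloud_cover=(0, 20))

def open_hls_band(granule, band):
    """Open one band of an HLS granule (each granule ships one COG per band)."""
    url = next(u for u in granule.data_links() if u.endswith(f".{band}.tif"))
    return rioxarray.open_rasterio(earthaccess.open([url])[0], masked=True).squeeze("band", drop=True)

# NDVI from B04 (Red) and B05 (NIR) for one tile and date; repeat per granule as needed
red, nir = open_hls_band(hls[0], "B04"), open_hls_band(hls[0], "B05")
ndvi = (nir - red) / (nir + red)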

# --- GEDI L4A (ORNL DAAC) ---
gedi = earthaccess.search_data(short_name="GEDI_L4A_AGB_Density_V2_1_2056", bounding_box=aoi, temporal=window)
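gedi_records = []
for fh in earthaccess.open(gedi[:5]):
    with h5py.File(fh, "r") as f:
        for beam in [k for k in f.keys() if k.startswith("BEAM")]:
            gedi_records.append(pd.DataFrame({
                "lat": f[f"{beam}/lat_lowestmode"][:],
                "lon": f[f"{beam}/lon_lowestmode"][:],
                "agbd": f[f"{beam}/agbd"][:],
                "quality": f[f"{beam}/l4_quality_flag"][:],
                # delta_time is seconds since 2018-01-01 00:00:00 UTC
                "time": pd.Timestamp("2018-01-01") + pd.to_timedelta(f[f"{beam}/delta_time"][:], unit="s"),
                "beam": beam,
            }))
gedi_df = pd.concat(gedi_records, ignore_index=True)
# Keep good-quality shots inside the AOI (agbd uses -9999 as its fill value)
in_aoi = gedi_df.lat.between(aoi[1], aoi[3]) & gedi_df.lon.between(aoi[0], aoi[2])
gedi_df = gedi_df[in_aoi & (gedi_df.quality == 1)]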

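# --- SMAP L3 (NSIDC DAAC) ---
# SPL3SMP_E is the enhanced 9 km product (plain SPL3SMP is gridded at 36 km)
smap = earthaccess.search_data(short_name="SPL3SMP_E", bounding_box=aoi, temporal=window)
# Variables sit in HDF5 groups without dimension scales, so name the group
# and let h5netcdf create placeholder dimensions
smap_ds = xr.open_mfdataset(
    earthaccess.open(smap[:10]),
    engine="h5netcdf", group="Soil_Moisture_Retrieval_Data_AM", phony_dims="sort",
    combine="nested", concat_dim="time",  # one file per day; sort granules by date first if order matters
)
# extract `soil_moisture` per 9km pixel, daily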

# --- Sample SMAP at GEDI footprint locations + dates ---
# (left as exercise; pseudo-code below)
# for each gedi row, find nearest SMAP pixel + nearest SMAP date → join

# Now you have one DataFrame: lat · lon · date · agbd (GEDI) · ndvi (HLS) · sm (SMAP)
# Cross-dataset analysis ready.
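The join left as an exercise above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the recipe's own implementation: sample_grid_at_points is a hypothetical helper, the arrays are synthetic stand-ins for real SMAP output, and it assumes the grid's latitude and longitude can be described by two independent 1-D axes (which holds for a global cylindrical grid, where each row has one latitude and each column one longitude). It builds a points × axis distance matrix, which is fine for thousands of footprints but would need chunking for millions.

```python
import numpy as np
import pandas as pd


def sample_grid_at_points(points, grid_lat, grid_lon, grid_time, values, max_gap_days=1):
    """Attach the nearest grid value in space and time to each point row.

    Hypothetical helper. `values` has shape (time, lat, lon); `points` needs
    `lat`, `lon` and `time` columns. Rows whose nearest grid date is more
    than `max_gap_days` away get NaN instead of a stale value.
    """
    p_lat = points["lat"].to_numpy()
    p_lon = points["lon"].to_numpy()
    p_time = points["time"].to_numpy().astype("datetime64[ns]")

    # Nearest index along each axis, computed independently.
    i_lat = np.abs(grid_lat[None, :] - p_lat[:, None]).argmin(axis=1)
    i_lon = np.abs(grid_lon[None, :] - p_lon[:, None]).argmin(axis=1)
    gap = np.abs(grid_time[None, :] - p_time[:, None])
    i_time = gap.argmin(axis=1)

    out = points.copy()
    out["sm"] = values[i_time, i_lat, i_lon]
    nearest_gap = gap[np.arange(len(points)), i_time]
    out.loc[nearest_gap > np.timedelta64(max_gap_days, "D"), "sm"] = np.nan
    return out


# Synthetic stand-ins: a 2-day, 3x3 soil-moisture cube and three footprints.
grid_lat = np.array([38.0, 39.0, 40.0])
grid_lon = np.array([-105.0, -104.0, -103.0])
grid_time = np.array(["2022-06-01", "2022-06-02"], dtype="datetime64[ns]")
sm_cube = np.arange(18).reshape(2, 3, 3) / 100.0

footprints = pd.DataFrame({
    "lat": [39.1, 40.4, 38.2],
    "lon": [-103.9, -104.6, -103.1],
    "time": pd.to_datetime(["2022-06-01 10:00", "2022-06-02 20:00", "2022-06-10 00:00"]),
    "agbd": [55.0, 80.0, 12.0],
})
joined = sample_grid_at_points(footprints, grid_lat, grid_lon, grid_time, sm_cube)
print(joined)
```

The same function can sample an NDVI cube by passing the HLS axes and values instead, which yields the one-row-per-footprint table described in the comments above.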

Common gotchas

earthaccess.open returns file-like handles, not paths. Some readers (rioxarray, h5py) accept them directly; others need a download first.
us-west-2 S3 credentials expire in 1 hour and are only valid in us-west-2. If your script takes >1 hr, refresh with earthaccess.get_s3_credentials(...). If you’re outside us-west-2, you’re using HTTPS, not S3 — fine for small jobs but slow + egress-billed for large.
GEDI footprint coords are not on a grid. They’re discrete along-orbit samples. Don’t try to xarray-stack them; treat as a sparse vector layer.
CMR pagination caps at 2000 per page but cmr-stac caps at 100. If you use cmr-stac for search, expect ~28× slower than python-cmr / earthaccess for the same query (per issue #411).
Temporal alignment is brutal. HLS revisit is 2–3 days. GEDI footprints don’t revisit at all. SMAP is daily. ICESat-2 revisit is 91 days. Pick the question’s time resolution and aggregate accordingly.
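One way to act on the temporal-alignment gotcha is to bin every product to the time resolution the question needs before joining. The sketch below uses monthly bins and synthetic values; to_common_bins is a hypothetical helper, and the choice of a plain mean per bin is an assumption (a median or a count-weighted mean may suit noisy optical data better).

```python
import pandas as pd


def to_common_bins(df, time_col, value_col, freq="MS"):
    """Mean of `value_col` per time bin; "MS" labels each bin by month start."""
    series = df.set_index(time_col)[value_col]
    return series.resample(freq).mean()


# Synthetic stand-ins: sparse HLS-like NDVI observations and daily SMAP-like values.
hls_obs = pd.DataFrame({
    "time": pd.to_datetime(["2022-06-03", "2022-06-19", "2022-07-05"]),
    "ndvi": [0.40, 0.50, 0.60],
})
smap_daily = pd.DataFrame({
    "time": pd.date_range("2022-06-01", "2022-07-31", freq="D"),
    "sm": 0.2,
})

# Both series now share one monthly index and can sit side by side.
monthly = pd.concat(
    [to_common_bins(hls_obs, "time", "ndvi"), to_common_bins(smap_daily, "time", "sm")],
    axis=1,
)
print(monthly)
```

GEDI and ICESat-2 shots would go through the same binning on their own timestamps, with the caveat that a bin holding a single orbit pass is a sample, not an average.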

When this pattern fails (and what to do)

If you need sub-daily. GEO satellites (GOES, TEMPO, Himawari) live in the same CMR index but are hosted differently. The pattern adapts but the time resolution shifts.
If you need pre-2000 or other legacy archives. Some older MODIS, AIRS, and MERRA-2 holdings are still in non-cloud-optimized formats. earthaccess.open may not work on them; call earthaccess.download first and read the local copies.
If your AOI is huge. Cloud-direct stops being faster than mass download at roughly 10° square per query. Switch to Harmony async + S3 download.

Power-user variant (Claude Code + MCP)

For repeated cross-DAAC composition with an agent loop, see recipes/r0X-mcp-power-user.mdx (TBD) — the same dataset access exposed as MCP tools so claude or Cursor can run these joins for you. (This is the subordinated MCP-server work from paths/run-2026-05-25-001.)
