geomermaids · Cloud-Native Geospatial

What it is, why it matters, and how to move a real GIS stack onto it without losing capability.

Looking for the commands? The cookbook has copy-paste recipes for each format.

The problem

Geospatial data has traditionally shipped in monolithic files (Shapefile, File Geodatabase, GeoTIFF) or lived behind always-on databases and paid APIs. Both make moving data expensive: downloads, servers, licensing, lock-in. For a field whose data is global by nature, that is backwards.

The shift

The last decade has seen geospatial quietly adopt the same patterns that transformed the rest of the web: columnar formats, HTTP range requests, object storage, serverless compute. Put together, these turn geospatial data into something you can query directly from a URL, without downloading the whole file or running a dedicated server.

Cloud-Native Geospatial (CNG) is the loose name for this set of patterns, and the OGC now has a dedicated working group around it. The ideas are not new on their own. What is new is that they fit together into a coherent stack.

The four pillars

Formats · Cloud-optimized file formats

The first requirement is a file format that a client can read in pieces over HTTP, without downloading the whole thing. This point is easy to overlook because it is technical, but it is the foundation: if a client cannot fetch small chunks of a large file, nothing else in the cloud-native stack works. Four formats do most of the work and cover most of the needs: COG for raster imagery, GeoParquet for vector tables, Zarr for chunked N-dimensional data cubes, and FlatGeobuf for streamable vector features.
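A sketch of what "read in pieces over HTTP" means in practice, using only the standard library. The URL is hypothetical; the pattern assumes a server that honors `Range` headers:

```python
import urllib.request

def read_range(url: str, start: int, length: int) -> bytes:
    """Fetch `length` bytes starting at `start` via an HTTP Range request."""
    end = start + length - 1  # Range is inclusive on both ends
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honored the range;
        # a plain 200 would mean it ignored it and sent the whole file.
        if resp.status != 206:
            raise RuntimeError("server does not support range requests")
        return resp.read()

# A COG reader starts by fetching just the header and tile index, then
# issues further ranges for the tiles that intersect the query bbox:
# header = read_range("https://example.com/scene.tif", 0, 16_384)
```

Every cloud-optimized format is, at bottom, a byte layout designed so that a handful of calls like this replace a full download.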

Catalogs · STAC

Good formats are not enough on their own. Without a way to describe where data lives and what is in it, you are just staring at a bucket full of files. STAC (SpatioTemporal Asset Catalog) is a JSON-based specification for that metadata layer: items, collections, assets, extents, time.

Catalogs are what turn individual cloud-native files into a searchable archive. Element 84's Earth Search, Microsoft's Planetary Computer, Source Cooperative, and USGS landsatlook all expose STAC APIs against the same underlying object storage, and a client can discover and query across them with identical code.
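A minimal sketch of why "identical code" works: a STAC item is plain JSON, so client logic is the same regardless of which catalog returned it. The item below is invented for illustration:

```python
from datetime import datetime, timezone

# A minimal, hypothetical STAC item — the kind of JSON a search API returns.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "S2B_T18TWL_20240601",
    "bbox": [-74.3, 40.5, -73.2, 41.4],
    "properties": {"datetime": "2024-06-01T15:42:00Z", "eo:cloud_cover": 3.1},
    "assets": {
        "visual": {
            "href": "https://example.com/scenes/S2B_T18TWL_20240601/TCI.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
}

def asset_url(item: dict, key: str) -> str:
    """Resolve an asset href — the data itself stays in object storage."""
    return item["assets"][key]["href"]

def acquired(item: dict) -> datetime:
    """Parse the acquisition timestamp from item properties."""
    return datetime.fromisoformat(item["properties"]["datetime"].replace("Z", "+00:00"))
```

The same two functions work against Earth Search, the Planetary Computer, or a static catalog in a bucket, because the spec fixes the shape of the JSON rather than the service behind it.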

Query · Engines and drivers

The query side has quietly been transformed. A few tools do most of the heavy lifting: GDAL as the shared driver layer, DuckDB and its spatial extension for SQL straight against remote files, and dynamic tile servers such as titiler for rendering.

The important change is the cost model. Range requests and predicate pushdown mean you pay for bytes touched, not bytes stored. That inverts a lot of old assumptions about how to lay data out.
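The "bytes touched, not bytes stored" point can be made concrete with a toy columnar layout in pure Python (no real engine involved): evaluate the filter against one small column, then fetch only the matching rows from the expensive one.

```python
# Toy columnar layout: each column stored separately, so a filter can be
# evaluated by touching one cheap column before reading anything else.
rows = [
    {"id": i, "state": "NY" if i % 4 == 0 else "CA", "payload": "x" * 100}
    for i in range(1_000)
]
columns = {k: [r[k] for r in rows] for k in rows[0]}

def query_with_pushdown(state: str):
    touched = 0
    # Step 1: scan only the small filter column ...
    matches = []
    for i, s in enumerate(columns["state"]):
        touched += len(s)
        if s == state:
            matches.append(i)
    # Step 2: ... then read payloads only for rows that passed the filter.
    out = []
    for i in matches:
        payload = columns["payload"][i]
        touched += len(payload)
        out.append(payload)
    return out, touched

full_scan = sum(len(r["state"]) + len(r["payload"]) for r in rows)
pushed, touched = query_with_pushdown("NY")
# 250 matching rows: the pushdown touches 27,000 bytes instead of 102,000.
```

Real engines do the same thing with Parquet row-group statistics and HTTP range requests; the arithmetic, and the bill, scale the same way.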

Compute · Object storage and serverless

Object storage (S3, R2, GCS, Azure Blob) is the default home for data at rest: durable, cheap, globally addressable, no schema to maintain. Serverless compute (Lambda, Cloud Run, Cloudflare Workers) handles the work that is not pure read: tile rendering, API responses, scheduled conversions. Edge caching turns that work into something close to free once it is warm.

The headline property is that the stack scales down to zero. No always-on servers, no idle cost, no minimum fleet size.
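A sketch of the serverless-plus-edge-cache shape, with the rendering itself stubbed out. The handler name and tile scheme are illustrative; the interesting part is the response headers, which let a CDN absorb every repeat request:

```python
import hashlib

def tile_handler(z: int, x: int, y: int) -> tuple[bytes, dict]:
    """Render a tile on demand and tell the edge cache to keep it.

    A real handler would read the relevant COG byte ranges and encode a
    PNG; here the body is a stand-in so the header logic stays visible.
    """
    body = f"tile {z}/{x}/{y}".encode()  # stand-in for PNG bytes
    headers = {
        "Content-Type": "image/png",
        # Let the CDN cache for a day: only the first request per tile
        # ever touches compute or object storage.
        "Cache-Control": "public, max-age=86400",
        "ETag": hashlib.sha256(body).hexdigest()[:16],
    }
    return body, headers
```

Once a tile is warm at the edge, serving it costs effectively nothing, and when traffic stops, so does the compute bill.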

What it unlocks

Worked examples

Three concrete systems that use the stack end to end:

Query OSM without downloading it

Daily GeoParquet snapshots of OpenStreetMap for North America, queryable straight from a URL with DuckDB. No downloads, no account.

Render Sentinel-2 without pre-rendering

Async COG tile server that reads Sentinel-2 scenes directly from object storage. No pre-rendering, no database, minimal footprint.

NASA VEDA

NASA's open science platform: STAC catalog and COG archive on AWS, with open source frontend and ingest tooling. The full cloud-native stack, operated at agency scale.

Who is using this in production

Element 84 · Earth Search

Open STAC API over public Sentinel, Landsat, and NAIP archives. The reference pattern for multi-tenant CNG catalogs.

Radiant Earth · Source Cooperative

Open data storage and publishing platform, built around STAC and cloud-optimized formats. A neutral home for published datasets.

Overture Maps Foundation

AWS, Meta, Microsoft, and TomTom publishing open map data in GeoParquet, with a shared schema. Proof that the format has made it into mainstream geospatial releases.

NASA · Earthdata Cloud & VEDA

NASA is migrating the full EOSDIS Earth science archive to AWS through 2026, served as COG and Zarr with STAC metadata. The VEDA platform wraps it in an open source STAC API and visualization stack.

What it does not solve

CNG is a set of patterns, not a universal answer. It does not help with:

Glossary

COG
Cloud Optimized GeoTIFF. A regular GeoTIFF with internal tiling and overviews, plus a byte layout that lets a client read just the pieces it needs over HTTP.
GeoParquet
A columnar, compressed file format for vector geospatial data. A Parquet file with a well-defined geometry column and metadata.
Zarr
A format for chunked, compressed N-dimensional arrays. Widely used for raster and climate data cubes with a time dimension.
FlatGeobuf
A streamable binary format for vector data, with a spatial index in the header so a client can do bbox queries without reading the whole file.
STAC
SpatioTemporal Asset Catalog. A JSON specification for describing geospatial data in object storage: items, collections, extents, time, assets.
Range request
An HTTP feature (Range: header) that lets a client request a specific byte range of a file. The foundation that makes cloud-optimized formats work.
Predicate pushdown
A query engine feature that pushes filters down to the storage layer so only the rows that match get read. Without it, columnar formats do not save much.
Object storage
Flat, key-addressable blob storage (S3, R2, GCS, Azure Blob). Durable, cheap, globally addressable, no schema.
Serverless compute
On-demand compute (Lambda, Cloud Run, Cloudflare Workers) that spins up per request and disappears when idle. Pay per invocation, not per hour.

Migration checklist

Moving a traditional GIS stack (Shapefile, File Geodatabase, file-based tile servers) onto the cloud-native pattern, in roughly the order I would tackle it:

  1. Pick your object store. S3, R2, GCS, or Azure Blob. For most read-heavy public datasets, Cloudflare R2 is the cheapest option because egress is free.
  2. Convert raster to COG. gdal_translate with -of COG does the job. Verify with rio cogeo validate.
  3. Convert vector to GeoParquet. GeoParquet 1.0 support requires GDAL 3.8+; 1.1 (bbox + CRS) requires 3.9+. Partition by a spatial or thematic key if the dataset is large.
  4. Publish a STAC catalog. Either static (JSON files in the same bucket) or dynamic (stac-fastapi, pgstac). Static is simpler and cheaper; dynamic is needed for large or frequently-updated catalogs.
  5. Replace pre-rendered tile caches with on-demand rendering. titiler or a Cloudflare Worker in front of your COGs. Cache aggressively at the edge.
  6. Replace download endpoints with direct URLs. If your users ran wget followed by a local query, they can now run the query against the URL.
  7. Document it ruthlessly. Cloud-native geospatial is still unfamiliar to most teams. One page of working examples with real URLs will do more for adoption than any amount of spec reading.
  8. Keep PostGIS where it fits. The goal is not to migrate everything, it is to move the read-heavy, reproducibility-sensitive parts of the stack. Transactional writes, geocoding, and live editing still belong in a transactional database.
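Step 4 in its simplest form can be sketched in a few lines: a static STAC catalog is just linked JSON files written next to the data. The catalog id and item contents below are hypothetical, and a temp directory stands in for the bucket:

```python
import json
import pathlib
import tempfile

def write_static_catalog(root: pathlib.Path, items: list[dict]) -> None:
    """Write a catalog.json plus one JSON file per item, linked by relative href."""
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "example-catalog",  # hypothetical id
        "description": "Static catalog written next to the data",
        "links": [{"rel": "item", "href": f"items/{it['id']}.json"} for it in items],
    }
    (root / "items").mkdir(parents=True, exist_ok=True)
    (root / "catalog.json").write_text(json.dumps(catalog, indent=2))
    for it in items:
        (root / "items" / f"{it['id']}.json").write_text(json.dumps(it, indent=2))

root = pathlib.Path(tempfile.mkdtemp())  # stand-in for the bucket
write_static_catalog(root, [{"id": "scene-001", "type": "Feature"}])
```

Sync the resulting tree to the same bucket as the COGs and any STAC client can crawl it; no API server is involved until the catalog grows or changes often enough to justify one.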

Where it is going

The OGC Cloud-Native Geospatial working group formalized the conversation. GDAL now has first-class GeoParquet support. DuckDB Spatial ships in the core distribution. QGIS reads COG and STAC natively. Overture Maps has made GeoParquet the default distribution format for open world-scale map data. The direction of travel is clear: cloud-native is becoming the default, not the exception.

Where there is still work to do: tooling for users who are not engineers, better visualization clients for direct-URL data, and honest documentation that assumes readers are not already experts.

Further reading

Working on a cloud-native geospatial migration and want a second pair of eyes, or just want to talk through whether CNG is the right fit for your problem? Get in touch.