geomermaids · Cloud-Native Geospatial
What it is, why it matters, and how to move a real GIS stack onto it without losing capability.
Looking for the commands? The cookbook has copy-paste recipes for each format:
- COG cookbook: generation, compression choice, validation, mosaicking, reading over HTTP.
- GeoParquet cookbook: versions, encoding, row group sizing, and recipes for GDAL, DuckDB, gpio, GeoPandas.
The problem
Geospatial data has traditionally shipped in monolithic files (Shapefile, File Geodatabase, GeoTIFF) or lived behind always-on databases and paid APIs. Both make moving data expensive: downloads, servers, licensing, lock-in. For a field whose data is global by nature, that is backwards.
The shift
The last decade has seen geospatial quietly adopt the same patterns that transformed the rest of the web: columnar formats, HTTP range requests, object storage, serverless compute. Put together, these turn geospatial data into something you can query directly from a URL, without downloading the whole file or running a dedicated server.
Cloud-Native Geospatial (CNG) is the loose name for this set of patterns, and the OGC now has a dedicated working group around it. The ideas are not new on their own. What is new is that they fit together into a coherent stack.
The four pillars
Formats · Cloud-optimized file formats
The first requirement is a file format that a client can read in pieces over HTTP, without downloading the whole thing. This point is easy to overlook because it is technical, but it is the foundation: if you cannot fetch small chunks of a large file, nothing else in the cloud-native stack works. Four formats do most of the work and cover most needs:
- COG (Cloud Optimized GeoTIFF): a tiled GeoTIFF with internal overviews, laid out so a client can request just the tiles and zoom level it needs. The obvious cloud-native replacement for GeoTIFF, and easy to use: you no longer have to download a whole scene to compute an NDVI index locally. Your software requests only the area of interest, downloads the two bands, and computes (NIR - Red) / (NIR + Red) on the spot. See what Deck.gl Raster is capable of.
- GeoParquet: a columnar, compressed, splittable format for vector data. A query reads only the columns and row groups it touches. The modern answer to Shapefile and File Geodatabase for analytical workloads.
- Zarr: N-dimensional chunked arrays, at home with climate and ocean cubes. Excellent for raster/array data with a time dimension, less useful for tabular vector data.
- FlatGeobuf: streamable binary vector format with a spatial index embedded in the header. A practical Shapefile replacement when you want a single file rather than a partitioned collection.
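The NDVI example above reduces to a small per-pixel computation once the two band windows are fetched. A minimal pure-Python sketch (the flat pixel lists stand in for the two band windows a COG reader would return):

```python
def ndvi(nir: list[float], red: list[float]) -> list[float]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Inputs are flat pixel lists from two band windows covering the
    same area of interest; a zero denominator yields 0.0 by convention.
    """
    out = []
    for n, r in zip(nir, red):
        denom = n + r
        out.append((n - r) / denom if denom != 0 else 0.0)
    return out

# Toy windows: vegetation reflects strongly in NIR, weakly in red,
# so the first two pixels score high and the bare third pixel scores 0.
print(ndvi([0.8, 0.5, 0.1], [0.1, 0.1, 0.1]))
```

The point of the COG layout is that `nir` and `red` here cover only the requested window, not the whole scene.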
Catalogs · STAC
Good formats are not enough on their own. Without a way to describe where data lives and what is in it, you are just staring at a bucket full of files. STAC (SpatioTemporal Asset Catalog) is a JSON-based specification for that metadata layer: items, collections, assets, extents, time.
Catalogs are what turn individual cloud-native files into a searchable archive. Element 84's Earth Search, Microsoft's Planetary Computer, Source Cooperative, and USGS's LandsatLook all expose STAC APIs against the same underlying object storage, and a client can discover and query across them with identical code.
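The metadata layer is just JSON. A minimal sketch of a single STAC item shows the shape a catalog exposes; the item ID, collection name, asset key, and URL here are all made up for illustration:

```python
import json

# A hypothetical item: one scene, one COG asset, a bbox and a timestamp.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "demo-scene-001",            # made-up identifier
    "collection": "demo-collection",   # made-up collection
    "bbox": [5.0, 45.0, 6.0, 46.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[5.0, 45.0], [6.0, 45.0], [6.0, 46.0],
                         [5.0, 46.0], [5.0, 45.0]]],
    },
    "properties": {"datetime": "2024-06-01T10:30:00Z"},
    "assets": {
        "visual": {  # the asset key is a convention, not part of the spec
            "href": "https://example.com/demo-scene-001.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
    "links": [],
}

print(json.dumps(item, indent=2)[:80])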
Query · Engines and drivers
The query side has quietly been transformed. A few tools do most of the heavy lifting:
- DuckDB with the spatial extension: SQL against GeoParquet over HTTP, joins across URLs, predicate pushdown so only the bytes you need get fetched. A laptop can replace a small data warehouse for most read-heavy geospatial workloads.
- GDAL: now ships drivers for GeoParquet, COG, and STAC. This quietly opens up the whole legacy toolchain (QGIS, ogr2ogr, rasterio) to cloud-native formats.
- titiler and rio-tiler: render COG as XYZ tiles on demand, serverless-friendly, no pre-rendering required. No local cache to manage, use it behind a CDN if you need one.
- Martin and pg_tileserv: vector tile servers for PostGIS, for the cases where a live database is still the right tool.
The important change is the cost model. Range requests and predicate pushdown mean you pay for bytes touched, not bytes stored. That inverts a lot of old assumptions about how to lay data out.
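The "bytes touched" model can be made concrete with a toy sketch of predicate pushdown. Real Parquet footers carry much richer statistics; this simplified version keeps only per-row-group min/max for one column, which is enough to show why a filtered query never downloads most of the file:

```python
from dataclasses import dataclass

@dataclass
class RowGroup:
    offset: int      # byte offset of the row group within the file
    length: int      # byte length of the row group
    col_min: float   # min statistic for the filtered column
    col_max: float   # max statistic for the filtered column

def ranges_to_fetch(groups: list[RowGroup], lo: float, hi: float) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges for row groups that may hold
    rows with lo <= value <= hi; every other row group is skipped and
    its bytes are never requested."""
    return [(g.offset, g.length) for g in groups
            if g.col_max >= lo and g.col_min <= hi]

groups = [
    RowGroup(0,         1_000_000, col_min=0.0,  col_max=10.0),
    RowGroup(1_000_000, 1_000_000, col_min=10.0, col_max=20.0),
    RowGroup(2_000_000, 1_000_000, col_min=20.0, col_max=30.0),
]
# A filter on 12 <= value <= 15 touches one row group out of three.
print(ranges_to_fetch(groups, 12.0, 15.0))  # [(1000000, 1000000)]
```

Each surviving (offset, length) pair becomes one HTTP range request; you pay for one megabyte, not three.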
Compute · Object storage and serverless
Object storage (S3, R2, GCS, Azure Blob) is the default home for data at rest: durable, cheap, globally addressable, no schema to maintain. Serverless compute (Lambda, Cloud Run, Cloudflare Workers) handles the work that is not pure read: tile rendering, API responses, scheduled conversions. Edge caching turns that work into something close to free once it is warm.
The headline property is that the stack scales down to zero. No always-on servers, no idle cost, no minimum fleet size.
What it unlocks
- Reproducibility. Immutable snapshots mean an analysis you ran today will produce the same result next year against the same bytes.
- Cost reduction. Storage is a cheap commodity; compute is pay-per-query; no dedicated servers idle overnight.
- Portability. The data is yours in an open format, on any S3-compatible backend, independent of any single vendor.
- Lower barrier. A laptop with DuckDB can replace a small data warehouse for most read workloads. That changes who can do the work.
- Ecosystem effects. Open catalogs lead to open analysis, which leads to more tools, which lowers the barrier further.
Worked examples
Three concrete systems that use the stack end to end:
Query OSM without downloading it
Daily GeoParquet snapshots of OpenStreetMap for North America, queryable straight from a URL with DuckDB. No downloads, no account.
Render Sentinel-2 without pre-rendering
Async COG tile server that reads Sentinel-2 scenes directly from object storage. No pre-rendering, no database, minimal footprint.
NASA VEDA
NASA's open science platform: STAC catalog and COG archive on AWS, with open source frontend and ingest tooling. The full cloud-native stack, operated at agency scale.
Who is using this in production
Earth Search (Element 84)
Open STAC API over public Sentinel, Landsat, and NAIP archives. The reference pattern for multi-tenant CNG catalogs.
Source Cooperative
Open data storage and publishing platform, built around STAC and cloud-optimized formats. A neutral home for published datasets.
Overture Maps
AWS, Meta, Microsoft, and TomTom publishing open map data in GeoParquet, with a shared schema. Proof that the format has made it into mainstream geospatial releases.
NASA EOSDIS
NASA is migrating the full EOSDIS Earth science archive to AWS through 2026, served as COG and Zarr with STAC metadata. The VEDA platform wraps it in an open source STAC API and visualization stack.
What it does not solve
CNG is a set of patterns, not a universal answer. It does not help with:
- High-frequency transactional writes. PostGIS and an actual database still win when the workload is inserts and updates, not reads.
- Sub-second real-time ingest at massive scale. Streaming architectures (Kafka, Flink) solve a different problem.
- Licensing and attribution. The format does not resolve the legal question of who is allowed to redistribute what.
- Organizations not ready to leave on-prem. CNG assumes you can use object storage. If you cannot, most of the benefits go with it, though S3-compatible on-prem object stores are a workaround worth exploring.
Glossary
- COG
- Cloud Optimized GeoTIFF. A regular GeoTIFF with internal tiling and overviews, plus a byte layout that lets a client read just the pieces it needs over HTTP.
- GeoParquet
- A columnar, compressed file format for vector geospatial data. A Parquet file with a well-defined geometry column and metadata.
- Zarr
- A format for chunked, compressed N-dimensional arrays. Widely used for raster and climate data cubes with a time dimension.
- FlatGeobuf
- A streamable binary format for vector data, with a spatial index in the header so a client can do bbox queries without reading the whole file.
- STAC
- SpatioTemporal Asset Catalog. A JSON specification for describing geospatial data in object storage: items, collections, extents, time, assets.
- Range request
- An HTTP feature (the `Range:` header) that lets a client request a specific byte range of a file. The foundation that makes cloud-optimized formats work.
- Predicate pushdown
- A query engine feature that pushes filters down to the storage layer so only the rows that match get read. Without it, columnar formats do not save much.
- Object storage
- Flat, key-addressable blob storage (S3, R2, GCS, Azure Blob). Durable, cheap, globally addressable, no schema.
- Serverless compute
- On-demand compute (Lambda, Cloud Run, Cloudflare Workers) that spins up per request and disappears when idle. Pay per invocation, not per hour.
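The range-request mechanics in the glossary are simple enough to sketch with the standard library. The helper names here are my own, and the byte values are just an example of the kind of small header read a COG client starts with:

```python
import re

def range_header(start: int, end: int) -> dict[str, str]:
    """Header for an inclusive byte range, e.g. Range: bytes=0-1023."""
    return {"Range": f"bytes={start}-{end}"}

def parse_content_range(value: str) -> tuple[int, int, int]:
    """Parse a 'Content-Range: bytes start-end/total' response header
    into (start, end, total) integers."""
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+)", value)
    if m is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    start, end, total = map(int, m.groups())
    return start, end, total

# A cloud-optimized reader's first move is typically a small read from
# byte 0 to get the file header and internal index.
print(range_header(0, 16383))
print(parse_content_range("bytes 0-16383/52428800"))
```

The server answers such a request with status 206 Partial Content; the `total` field tells the client the full file size without ever downloading it.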
Migration checklist
Moving a traditional GIS stack (Shapefile, File Geodatabase, file-based tile servers) onto the cloud-native pattern, in roughly the order I would tackle it:
- Pick your object store. S3, R2, GCS, or Azure Blob. For most read-heavy public datasets, Cloudflare R2 is the cheapest option because egress is free.
- Convert raster to COG. `gdal_translate` with `-of COG` does the job. Verify with `rio cogeo validate`.
- Convert vector to GeoParquet. GeoParquet 1.0 support requires GDAL 3.8+; 1.1 (bbox + CRS) requires 3.9+. Partition by a spatial or thematic key if the dataset is large.
- Publish a STAC catalog. Either static (JSON files in the same bucket) or dynamic (stac-fastapi, pgstac). Static is simpler and cheaper; dynamic is needed for large or frequently-updated catalogs.
- Replace pre-rendered tile caches with on-demand rendering. titiler or a Cloudflare Worker in front of your COGs. Cache aggressively at the edge.
- Replace download endpoints with direct URLs. If your users ran `wget` followed by a local query, they can now run the query against the URL.
- Document it ruthlessly. Cloud-native geospatial is still unfamiliar to most teams. One page of working examples with real URLs will do more for adoption than any amount of spec reading.
- Keep PostGIS where it fits. The goal is not to migrate everything, it is to move the read-heavy, reproducibility-sensitive parts of the stack. Transactional writes, geocoding, and live editing still belong in a transactional database.
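The static-catalog step in the checklist amounts to writing linked JSON files next to the data. A minimal sketch, assuming stub item dicts (a real catalog would use full STAC items with geometry, datetime, and assets):

```python
import json
from pathlib import Path
from tempfile import mkdtemp

def write_static_catalog(root: Path, catalog_id: str, items: list[dict]) -> None:
    """Write a catalog.json plus one JSON file per item, linked with
    relative hrefs. Uploading this directory to the bucket next to the
    data is the whole 'publish' step for a static catalog."""
    root.mkdir(parents=True, exist_ok=True)
    links = [{"rel": "item", "href": f"./{it['id']}.json",
              "type": "application/geo+json"} for it in items]
    catalog = {"type": "Catalog", "stac_version": "1.0.0",
               "id": catalog_id, "description": catalog_id, "links": links}
    (root / "catalog.json").write_text(json.dumps(catalog, indent=2))
    for it in items:
        (root / f"{it['id']}.json").write_text(json.dumps(it, indent=2))

out = Path(mkdtemp())
write_static_catalog(out, "demo-catalog", [{"id": "scene-001"}, {"id": "scene-002"}])
print(sorted(p.name for p in out.iterdir()))
# ['catalog.json', 'scene-001.json', 'scene-002.json']
```

No server runs anywhere in this flow: the bucket serves the JSON, and a STAC client walks the relative links.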
Where it is going
The OGC Cloud-Native Geospatial working group formalized the conversation. GDAL now has first-class GeoParquet support. DuckDB Spatial ships in the core distribution. QGIS reads COG and STAC natively. Overture Maps has made GeoParquet the default distribution format for open world-scale map data. The direction of travel is clear: cloud-native is becoming the default, not the exception.
Where there is still work to do: tooling for users who are not engineers, better visualization clients for direct-URL data, and honest documentation that assumes readers are not already experts.
Further reading
- Cookbook on this site: working commands for COG and GeoParquet generation, publishing, and reading.
- cloudnativegeo.org: the community hub.
- OGC Cloud-Native Geospatial SWG: the formal working group.
- cogeo.org: COG specification and examples.
- geoparquet.org: GeoParquet specification.
- stacspec.org: STAC specification and ecosystem.
- Projects on this site: concrete CNG implementations I have built and open sourced.
Working on a cloud-native geospatial migration and want a second pair of eyes, or just want to talk through whether CNG is the right fit for your problem? Get in touch.