geomermaids · Cookbook

Concrete recipes for the two main cloud-native geospatial formats. Assumes familiarity with GDAL and DuckDB; the point here is to show the right commands and explain the reasoning behind each flag.

Cloud Optimized GeoTIFF

Generate, validate, mosaic, and read COG files. Compression codec choice, the COG driver, HTTP range requests, QGIS raster workflow.

GeoParquet

Versions 1.0 / 1.1 / 2.0, encoding strategies, row group sizing. Recipes for GDAL, DuckDB, gpio, and GeoPandas. Includes the QGIS-downloads-the-whole-file caveat.

Tool requirements differ between the two formats (COG needs GDAL and rio-cogeo; GeoParquet brings in DuckDB, gpio, and optionally GeoPandas), so each sub-page has its own Tools and setup section at the top. The sections below cover what they share: publishing to object storage and the pitfalls that apply to both formats.

Publishing to object storage

The same commands apply to both COG and GeoParquet; only the Content-Type changes.

S3

aws s3 cp out.tif s3://my-bucket/cog/out.tif \
  --content-type image/tiff \
  --cache-control "public, max-age=31536000, immutable"

aws s3 cp out.parquet s3://my-bucket/vector/out.parquet \
  --content-type application/vnd.apache.parquet \
  --cache-control "public, max-age=31536000, immutable"

Content-Type tells browsers and tools what the file is without sniffing. Cache-Control with max-age=31536000 and immutable tells browsers and CDNs to keep the object for a year without revalidating. That is safe because cloud-native files are versioned by filename (append a date or hash) rather than mutated in place.
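Versioning by filename is easy to script. The sketch below is one possible way to do it, not a prescribed workflow: the helper names content_type_for, versioned_name, and publish are hypothetical, the 12-character hash prefix is an arbitrary choice, and sha256sum is assumed to be available (GNU coreutils; on macOS, shasum -a 256 is the usual substitute). It also assumes the file name has an extension.

```shell
# content_type_for FILE: print the Content-Type used on this page for FILE's extension.
content_type_for() {
  case "$1" in
    *.tif|*.tiff) echo "image/tiff" ;;
    *.parquet)    echo "application/vnd.apache.parquet" ;;
    *)            echo "application/octet-stream" ;;
  esac
}

# versioned_name FILE: print FILE's base name with the first 12 hex characters
# of its SHA-256 inserted before the extension (out.tif -> out.<hash>.tif).
# The body runs in a subshell so its variables do not leak into the caller.
versioned_name() (
  base="$(basename "$1")"
  hash="$(sha256sum "$1" | cut -c1-12)"
  printf '%s.%s.%s\n' "${base%.*}" "$hash" "${base##*.}"
)

# publish FILE S3_PREFIX: upload FILE under its versioned name with the
# long-lived cache headers shown above. Requires configured AWS credentials.
publish() {
  aws s3 cp "$1" "$2/$(versioned_name "$1")" \
    --content-type "$(content_type_for "$1")" \
    --cache-control "public, max-age=31536000, immutable"
}

# Usage (hypothetical bucket and prefix):
#   publish out.tif s3://my-bucket/cog
#   publish out.parquet s3://my-bucket/vector
```

Because the name is derived from the content, re-publishing an unchanged file produces the same key, and any change produces a new key, so a cached copy is never stale.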

Cloudflare R2

R2 is S3-compatible and charges no egress fees, which makes it the right substrate for open data. The easiest client is rclone.

# ~/.config/rclone/rclone.conf  (one-time)
[r2]
type = s3
provider = Cloudflare
endpoint = https://<account-id>.r2.cloudflarestorage.com
access_key_id = ...
secret_access_key = ...

rclone copy out.tif r2:my-bucket/cog/ \
  --header-upload "Content-Type: image/tiff" \
  --header-upload "Cache-Control: public, max-age=31536000, immutable"

CORS reminder: if browsers will fetch the file directly (tile rendering with Deck.gl, DuckDB-WASM, STAC browsers), the bucket needs a permissive CORS policy. R2's dashboard has a one-click permissive default; S3 needs a JSON policy. Test after upload with curl -I -H "Origin: https://example.com" <file-url> and look for an Access-Control-Allow-Origin header in the response.
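For S3, a permissive read-only policy might look like the sketch below. The helper names write_cors_policy, apply_cors_policy, and check_cors are hypothetical, and the specific values (any origin, GET and HEAD only, a one-hour preflight cache, the exposed header list) are assumptions to adjust, not requirements. The aws s3api put-bucket-cors call and its JSON shape are the standard AWS CLI interface.

```shell
# write_cors_policy [PATH]: write a permissive, read-only CORS policy
# (default path: cors.json). "*" allows any origin; narrow it if needed.
write_cors_policy() {
  cat > "${1:-cors.json}" <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["Content-Length", "Content-Range", "ETag"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF
}

# apply_cors_policy BUCKET [PATH]: attach the policy to an S3 bucket.
# Requires credentials that are allowed to change the bucket's CORS settings.
apply_cors_policy() {
  aws s3api put-bucket-cors --bucket "$1" \
    --cors-configuration "file://${2:-cors.json}"
}

# check_cors URL [ORIGIN]: print the Access-Control-* response headers the
# server returns for a request from ORIGIN (default https://example.com).
check_cors() {
  curl -sI -H "Origin: ${2:-https://example.com}" "$1" | grep -i '^access-control-'
}

# Usage (hypothetical bucket and URL):
#   write_cors_policy cors.json
#   apply_cors_policy my-bucket cors.json
#   check_cors https://my-bucket.s3.amazonaws.com/cog/out.tif
```

If check_cors prints nothing, the bucket returned no CORS headers for that origin and browser clients will be blocked even though curl itself can download the file.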

Common pitfalls (publishing)

Format-specific pitfalls live on each sub-page: COG (non-COG layouts, missing overviews, wrong compression) and GeoParquet (tiny row groups, plain pandas, unsorted features).

Next steps

The conceptual background to all of this lives on the Cloud-Native Geospatial page. Working examples of end-to-end systems built with these recipes are under Projects. If you want a second pair of eyes on a specific migration, get in touch.