geomermaids · Cookbook
Concrete recipes for the two main cloud-native geospatial formats. Assumes familiarity with GDAL and DuckDB; the point here is to show the right commands and explain the reasoning behind each flag.
Cloud Optimized GeoTIFF
Generate, validate, mosaic, and read COG files. Compression codec choice, the COG driver, HTTP range requests, QGIS raster workflow.
GeoParquet
Versions 1.0 / 1.1 / 2.0, encoding strategies, row group sizing. Recipes for GDAL, DuckDB, gpio, and GeoPandas. Includes the QGIS-downloads-the-whole-file caveat.
Tool requirements differ between the two formats (COG needs GDAL and rio-cogeo;
GeoParquet brings in DuckDB, gpio, and optionally GeoPandas), so each
sub-page has its own Tools and setup section at the top. The sections below cover what
they share: publishing to object storage and the pitfalls that apply to both formats.
Publishing to object storage
The same commands apply to both COG and GeoParquet; only the Content-Type changes.
S3
aws s3 cp out.tif s3://my-bucket/cog/out.tif \
--content-type image/tiff \
--cache-control "public, max-age=31536000, immutable"
aws s3 cp out.parquet s3://my-bucket/vector/out.parquet \
--content-type application/vnd.apache.parquet \
--cache-control "public, max-age=31536000, immutable"
Content-Type tells browsers and tools what the file is without sniffing.
Cache-Control: immutable tells browsers and CDNs that the object will never change, so
they can serve it from cache for the full max-age (one year here) without revalidating.
That is safe because cloud-native files are versioned by filename (append a date or hash)
rather than mutated in place.
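One way to get the "append a hash" naming is to derive the object name from the file contents, so a changed file can never reuse an old URL. The sketch below is illustrative only: versioned_name is a hypothetical helper, it assumes the file name has an extension, and it assumes GNU sha256sum is available (on macOS, shasum -a 256 is the usual substitute).

```shell
#!/usr/bin/env bash
# Sketch: build a content-addressed name such as out.<hash>.parquet.
# versioned_name is a hypothetical helper, not part of any tool named on this page.
versioned_name() {
  local path="$1"
  local file ext stem hash
  file="$(basename "$path")"
  ext="${file##*.}"    # text after the last dot, e.g. parquet
  stem="${file%.*}"    # name without the extension, e.g. out
  # First 8 hex characters of the SHA-256 digest of the file contents.
  hash="$(sha256sum "$path" | cut -c1-8)"
  printf '%s.%s.%s\n' "$stem" "$hash" "$ext"
}

# Usage (bucket and prefix are placeholders):
#   name="$(versioned_name out.parquet)"
#   aws s3 cp out.parquet "s3://my-bucket/vector/$name" \
#     --content-type application/vnd.apache.parquet \
#     --cache-control "public, max-age=31536000, immutable"
```

Eight hex characters keep URLs short; a longer prefix lowers the already small chance of two versions colliding.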
Cloudflare R2
R2 is S3-compatible and egress-free, which makes it the right substrate for open data. The easiest client is rclone.
# ~/.config/rclone/rclone.conf (one-time)
[r2]
type = s3
provider = Cloudflare
endpoint = https://<account-id>.r2.cloudflarestorage.com
access_key_id = ...
secret_access_key = ...
rclone copy out.tif r2:my-bucket/cog/ \
--header-upload "Content-Type: image/tiff" \
--header-upload "Cache-Control: public, max-age=31536000, immutable"
CORS reminder: if browsers will fetch the file directly (tile rendering
with Deck.gl, DuckDB-WASM, STAC browsers), the bucket needs a permissive CORS policy.
R2's dashboard has a one-click permissive default; S3 needs a JSON policy. Test with
curl -I -H "Origin: https://example.com" <file-url> after upload.
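For S3, the JSON policy could look like the sketch below. It is a permissive, read-only starting point rather than a recommended production policy: the file name cors.json, the exposed header list, and the MaxAgeSeconds value are assumptions, and apply_cors is a hypothetical wrapper around aws s3api put-bucket-cors.

```shell
#!/usr/bin/env bash
# Sketch: write a permissive, read-only CORS policy for a public-data bucket.
# cors.json is a placeholder file name.
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "HEAD"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["Content-Length", "Content-Range", "Accept-Ranges", "ETag"],
      "MaxAgeSeconds": 3600
    }
  ]
}
EOF

# Apply the policy, then confirm a response carries Access-Control-Allow-Origin.
# Wrapped in a function so nothing touches the network until it is called.
apply_cors() {
  local bucket="$1" url="$2"
  aws s3api put-bucket-cors --bucket "$bucket" \
    --cors-configuration file://cors.json
  curl -sI -H "Origin: https://example.com" "$url" \
    | grep -i '^access-control-allow-origin'
}

# Usage (bucket and URL are placeholders):
#   apply_cors my-bucket https://my-bucket.s3.amazonaws.com/cog/out.tif
```

Only GET and HEAD are allowed because readers of published data never need to write. Exposing Content-Range and Accept-Ranges lets browser clients read the range-related response headers.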
Common pitfalls (publishing)
- Missing Content-Type or CORS on upload. Browsers and WASM clients refuse silently. curl -I after upload to confirm headers before announcing the URL to users.
- Mutable filenames. If you overwrite latest.parquet every night, CDN cache and analysis reproducibility both break. Publish dated snapshots (2026-04-22.parquet), then point a small redirect or symlink at the latest.
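The header check from the first pitfall can be scripted so it runs after every upload. A minimal sketch, assuming the response headers arrive on stdin (for example from curl -sI); check_headers is a hypothetical helper and the URL in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: fail loudly when Content-Type or CORS headers are missing.
# check_headers is a hypothetical helper; it reads HTTP response headers on stdin.
check_headers() {
  local want_type="$1"
  local headers ctype status
  headers="$(tr -d '\r')"   # read stdin and strip carriage returns
  status=0

  # Value of the first Content-Type header, matched case-insensitively.
  ctype="$(printf '%s\n' "$headers" \
    | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }')"
  if [ "$ctype" != "$want_type" ]; then
    echo "Content-Type is '${ctype}', wanted '${want_type}'" >&2
    status=1
  fi

  # Any Access-Control-Allow-Origin header counts as CORS being configured.
  if ! printf '%s\n' "$headers" | grep -qi '^access-control-allow-origin:'; then
    echo "missing Access-Control-Allow-Origin" >&2
    status=1
  fi
  return "$status"
}

# Usage (URL is a placeholder):
#   curl -sI -H "Origin: https://example.com" https://example.com/vector/out.parquet \
#     | check_headers application/vnd.apache.parquet
```

The Origin request header matters here: many object stores only return CORS headers when the request includes one.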
Format-specific pitfalls live on each sub-page: COG (non-COG layouts, missing overviews, wrong compression) and GeoParquet (tiny row groups, plain pandas, unsorted features).
Next steps
The conceptual background to all of this lives on the Cloud-Native Geospatial page. Working examples of end-to-end systems built with these recipes are under Projects. If you want a second pair of eyes on a specific migration, get in touch.