geomermaids · COG cookbook
Recipes for Cloud Optimized GeoTIFF: generation, compression choice, validation, mosaicking, and reading over HTTP with range requests. Shared publishing guidance lives on the cookbook index.
Tools and setup
COG work only needs two things: GDAL for generation and reading, and rio-cogeo for validation.
GDAL
The COG driver landed in GDAL 3.1. Anything 3.5 or newer is fine. ZSTD
compression additionally requires GDAL to be built against libzstd, which modern packaged builds generally are.
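If a setup script should enforce that version floor, one option is to parse the banner printed by gdalinfo --version. A minimal sketch, assuming the usual "GDAL X.Y.Z, released YYYY/MM/DD" banner format; the helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Version floor suggested in the setup notes (hypothetical constant name).
MIN_GDAL = (3, 5)

def parse_gdal_version(banner: str) -> tuple:
    """Parse a banner such as 'GDAL 3.8.4, released 2024/02/08' into (3, 8, 4)."""
    m = re.match(r"GDAL (\d+)\.(\d+)\.(\d+)", banner.strip())
    if m is None:
        raise ValueError(f"unrecognized gdalinfo banner: {banner!r}")
    return tuple(int(part) for part in m.groups())

def gdal_is_recent_enough(banner: str, minimum: tuple = MIN_GDAL) -> bool:
    """Compare only major.minor against the floor."""
    return parse_gdal_version(banner)[:2] >= minimum

if __name__ == "__main__" and shutil.which("gdalinfo"):
    out = subprocess.run(
        ["gdalinfo", "--version"], capture_output=True, text=True, check=True
    ).stdout
    print(out.strip(), "->", "ok" if gdal_is_recent_enough(out) else "too old")
```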
brew install gdal # macOS
sudo apt install gdal-bin # Ubuntu / Debian
gdalinfo --version # expect 3.5 or newer
rio-cogeo
A small Python tool for validating a COG against the spec and (optionally) converting
GeoTIFFs when you want a Python-native path. gdal_translate -of COG is
usually enough for generation; keep rio-cogeo for the validate
step.
pip install rio-cogeo
rio cogeo --help
From a regular GeoTIFF
The COG driver (GDAL 3.1+) handles tiling, overviews, and layout in one step. Always prefer it over the legacy GTiff + -co TILED=YES dance.
gdal_translate in.tif out.tif -of COG \
-co COMPRESS=ZSTD \
-co LEVEL=15 \
-co PREDICTOR=YES \
-co BLOCKSIZE=512 \
-co OVERVIEW_RESAMPLING=AVERAGE \
-co NUM_THREADS=ALL_CPUS
- -of COG: the native COG driver. Writes internal tiles, overviews, and the COG-ordered layout in a single pass.
- COMPRESS=ZSTD: better ratio and speed than DEFLATE for continuous rasters. Needs libzstd in the GDAL build (any modern build has it).
- LEVEL=15: ZSTD compression level (1 to 22). 15 is the sweet spot between ratio and encode time; drop to 9 for faster writes, push to 22 for archival.
- PREDICTOR=YES: lets GDAL pick the right predictor (2 for integers, 3 for floating point). Compression-critical for DEMs and imagery.
- BLOCKSIZE=512: the internal tile size in pixels. 512 is the safe default. Drop to 256 for small rasters, bump to 1024 for very large single-band archives.
- OVERVIEW_RESAMPLING=AVERAGE: correct for continuous data. Use NEAREST for categorical rasters (land cover, classification) and CUBIC for visual imagery.
- NUM_THREADS=ALL_CPUS: parallelizes encoding. Essential for anything over a few hundred MB.
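PREDICTOR=YES resolves to TIFF predictor 2 (horizontal differencing) for integer data. A minimal sketch of what that predictor does, on a hypothetical smooth integer DEM: every sample after the first in a row is replaced by its difference from the left neighbor, turning slowly varying elevations into small, highly repetitive values. Python's standard library has no ZSTD codec, so zlib (DEFLATE) stands in for it here; the grid and function names are made up for illustration.

```python
import struct
import zlib
from itertools import accumulate

def horizontal_difference(row: list) -> list:
    """TIFF predictor 2: keep the first sample, store the rest as deltas to the left neighbor."""
    return [row[0]] + [cur - prev for prev, cur in zip(row, row[1:])]

def undo_difference(diffs: list) -> list:
    """Inverse of horizontal_difference: a running sum restores the original row."""
    return list(accumulate(diffs))

def deflate_size(rows: list) -> int:
    """Bytes after packing rows as little-endian int16 and compressing with DEFLATE."""
    raw = b"".join(struct.pack(f"<{len(r)}h", *r) for r in rows)
    return len(zlib.compress(raw, 6))

# Hypothetical smooth "DEM": +1 m every 3 pixels along x, +2 m per row.
dem = [[1000 + x // 3 + 2 * y for x in range(512)] for y in range(64)]
predicted = [horizontal_difference(row) for row in dem]

if __name__ == "__main__":
    print("raw DEFLATE bytes:        ", deflate_size(dem))
    print("predictor-2 DEFLATE bytes:", deflate_size(predicted))
```

Running it prints both compressed sizes; the differenced grid compresses smaller, and undo_difference shows the transform itself loses nothing.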
Choosing compression
The right codec depends on the data, not on personal preference. A quick table:
| Codec | Best for | Notes |
|---|---|---|
| ZSTD | Continuous rasters (DEM, Sentinel bands, elevation) | Lossless. Best default when the GDAL build includes libzstd. |
| DEFLATE | Legacy clients that can't read ZSTD | Universally supported. Slower and worse ratio than ZSTD. |
| LZW | Categorical rasters (land cover, masks) | Fast decompression. Poor on floating-point. |
| JPEG | 8-bit RGB visual imagery | Lossy. Never for analysis. Fine for basemap tiles. |
| NONE | Already-compressed sources | Skip compression when the input is incompressible (e.g. encrypted or already JPEG-compressed data). |
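The table, plus the OVERVIEW_RESAMPLING advice from the previous recipe, reduces to a small lookup. A hypothetical helper (not part of GDAL or rio-cogeo) that maps an invented data category to gdal_translate -co arguments; treat it as a starting rule of thumb rather than a policy.

```python
# Hypothetical rule-of-thumb helper encoding the compression table;
# the category names are invented for this sketch.
def choose_codec(kind: str, legacy_client: bool = False) -> str:
    if kind == "continuous":      # DEM, Sentinel bands, elevation
        return "DEFLATE" if legacy_client else "ZSTD"
    if kind == "categorical":     # land cover, masks
        return "LZW"
    if kind == "visual_rgb8":     # 8-bit RGB basemap imagery; lossy
        return "JPEG"
    if kind == "incompressible":  # encrypted or already-compressed input
        return "NONE"
    raise ValueError(f"unknown data kind: {kind!r}")

# Overview resampling per category, following the OVERVIEW_RESAMPLING advice.
RESAMPLING = {"continuous": "AVERAGE", "categorical": "NEAREST", "visual_rgb8": "CUBIC"}

def creation_options(kind: str, legacy_client: bool = False) -> list:
    """Return ["-co", "KEY=VALUE", ...] arguments for gdal_translate -of COG."""
    opts = [f"COMPRESS={choose_codec(kind, legacy_client)}"]
    if kind == "continuous":
        # Predictors pay off on smooth data and only apply to ZSTD/DEFLATE/LZW.
        opts.append("PREDICTOR=YES")
    if kind in RESAMPLING:
        opts.append(f"OVERVIEW_RESAMPLING={RESAMPLING[kind]}")
    return [arg for opt in opts for arg in ("-co", opt)]
```

Splice the returned list into a gdal_translate in.tif out.tif -of COG command line.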
Validating
A file can have a .tif extension and internal tiling and still not be a valid COG. Always validate before uploading:
pip install rio-cogeo
rio cogeo validate out.tif
A passing file prints "out.tif is a valid cloud optimized GeoTIFF". A failing one prints the specific issues (missing overviews, wrong IFD order, oversized header). Treat this as a gate before aws s3 cp.
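To wire that gate into a script, one option is to shell out to rio cogeo validate and look for the success line quoted above, rather than trusting the exit status alone; this sketch does not assume every rio-cogeo version exits non-zero on an invalid file. The function names are hypothetical and the upload step is left out.

```python
import subprocess

def report_says_valid(path: str, report: str) -> bool:
    """True if rio-cogeo's output contains its success line for this path."""
    # The failure line reads "<path> is NOT a valid ...", so it never matches.
    return f"{path} is a valid cloud optimized GeoTIFF" in report

def gate_upload(path: str) -> None:
    """Run rio cogeo validate and stop the script unless the file passes."""
    result = subprocess.run(
        ["rio", "cogeo", "validate", path], capture_output=True, text=True
    )
    report = result.stdout + result.stderr
    if result.returncode != 0 or not report_says_valid(path, report):
        raise SystemExit(f"refusing to upload {path}:\n{report}")
```

Call gate_upload("out.tif") immediately before the aws s3 cp step.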
Mosaic then COG
If the input is a directory of tiles, build a virtual raster (VRT) first, then materialize the COG in one pass. No intermediate raster is written to disk.
gdalbuildvrt -resolution highest mosaic.vrt input/*.tif
gdal_translate mosaic.vrt out.tif -of COG \
-co COMPRESS=ZSTD \
-co PREDICTOR=YES \
-co NUM_THREADS=ALL_CPUS
The .vrt is a virtual mosaic: a small XML file that references the input tiles
without copying their pixels. gdal_translate then reads through the VRT and writes
a single COG. This is a fast, disk-light way to merge a directory of Sentinel or
Landsat tiles into one analysis-ready file. Because clients read a COG through HTTP
range requests, fetching only the tiles and overview levels they need, a large merged
file does not force readers to download all of it.
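The same two-step pipeline can be driven from Python, e.g. inside a larger job. A minimal sketch that builds the commands above as argument lists and runs them with subprocess; it assumes the input/ layout from the recipe, and the function name is hypothetical. Argument lists avoid shell quoting, and the glob is expanded in Python instead of by the shell.

```python
import glob
import shutil
import subprocess

def mosaic_to_cog_commands(tiles: list, vrt: str = "mosaic.vrt", out: str = "out.tif") -> list:
    """Return the two commands from the recipe: tiles -> VRT -> single COG."""
    build_vrt = ["gdalbuildvrt", "-resolution", "highest", vrt, *tiles]
    translate = [
        "gdal_translate", vrt, out, "-of", "COG",
        "-co", "COMPRESS=ZSTD",
        "-co", "PREDICTOR=YES",
        "-co", "NUM_THREADS=ALL_CPUS",
    ]
    return [build_vrt, translate]

if __name__ == "__main__":
    tiles = sorted(glob.glob("input/*.tif"))  # replaces the shell's input/*.tif
    if not tiles:
        raise SystemExit("no tiles found in input/")
    if not (shutil.which("gdalbuildvrt") and shutil.which("gdal_translate")):
        raise SystemExit("GDAL command-line tools not found on PATH")
    for cmd in mosaic_to_cog_commands(tiles):
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For thousands of tiles, gdalbuildvrt's -input_file_list option (a text file of paths) sidesteps command-line length limits.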
Reading a COG back
Verification is the other half of the job. If you cannot read the file from two different clients over HTTP, it is not published yet. For the vector-side equivalent (DuckDB session tuning, predicate pushdown, DuckDB-WASM in the browser), see the GeoParquet Reading Cookbook.
GDAL
gdalinfo /vsicurl/https://my-bucket.s3.amazonaws.com/cog/out.tif
The /vsicurl/ prefix tells GDAL to stream the file over HTTP with range
requests. Works transparently with every tool built on GDAL: rasterio,
ogr2ogr, QGIS, PostGIS raster. Pipe it into your analysis with no local
copy.
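The idea behind /vsicurl/ can be sketched with the standard library: send a GET carrying an HTTP Range header and decode only the bytes that come back. A minimal sketch, assuming the server honors range requests (S3 does): it fetches the first 16 KiB and decodes the TIFF header, whose first-IFD offset should point near the start of a COG. The 16 KiB figure and the helper names are illustrative assumptions, not GDAL's actual request pattern.

```python
import struct
import urllib.request

def range_request(url: str, start: int, length: int) -> urllib.request.Request:
    """Build a GET asking only for bytes [start, start + length)."""
    end = start + length - 1  # HTTP byte ranges are inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

def fetch_range(url: str, start: int, length: int) -> bytes:
    with urllib.request.urlopen(range_request(url, start, length)) as resp:
        # 206 Partial Content: range honored. 200 means the whole file is coming.
        if resp.status != 206:
            raise RuntimeError(f"range request not honored (HTTP {resp.status})")
        return resp.read()

def parse_tiff_header(buf: bytes) -> dict:
    """Decode the classic TIFF or BigTIFF header at the start of the file."""
    order = buf[:2]
    if order == b"II":
        bo = "<"  # little-endian
    elif order == b"MM":
        bo = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: bad byte-order mark")
    (magic,) = struct.unpack_from(bo + "H", buf, 2)
    if magic == 42:    # classic TIFF: 32-bit offset at byte 4
        (first_ifd,) = struct.unpack_from(bo + "I", buf, 4)
    elif magic == 43:  # BigTIFF: 64-bit offset at byte 8
        (first_ifd,) = struct.unpack_from(bo + "Q", buf, 8)
    else:
        raise ValueError(f"not a TIFF: magic number {magic}")
    return {"byte_order": bo, "bigtiff": magic == 43, "first_ifd_offset": first_ifd}

if __name__ == "__main__":
    url = "https://my-bucket.s3.amazonaws.com/cog/out.tif"  # placeholder URL from the recipe
    print(parse_tiff_header(fetch_range(url, 0, 16384)))
```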
QGIS
QGIS 3.22+ reads COGs natively over HTTP and handles range requests behind the scenes. Zoom and pan stream only the tiles the current view needs, from the appropriate overview level. A multi-GB COG on S3 feels as responsive as a local GeoTIFF once the first pan warms the cache.
- Data Source Manager → Raster → Protocol: HTTP(S) / cloud, generic
- URL: https://my-bucket.s3.amazonaws.com/cog/out.tif
- Add. Pan and zoom stream only the bytes needed.
QGIS's story for the vector side of cloud-native geospatial is much less happy: see the QGIS specifics in the GeoParquet cookbook for the asymmetry.
COG-specific pitfalls
- "Tiled GeoTIFF" is not "COG". Always generate with -of COG or validate with rio cogeo validate. A tiled GeoTIFF without the correct IFD layout or overviews looks fine in QGIS but degrades badly for range-request clients.
- Missing overviews. A COG without overviews can be technically valid, but every zoom-out forces clients and tile servers to read full-resolution tiles. Always build them (the COG driver does this by default).
- Wrong compression for the workload. JPEG looks great in a viewer and silently corrupts scientific analysis. Use ZSTD for anything you will later run math on.
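Two of these pitfalls can be spotted from the IFD chain alone. A minimal stdlib sketch, assuming a classic (non-Big) TIFF already in memory: walk the IFDs, check that each carries the TileWidth tag (322), and count overviews via NewSubfileType (254; bit 0 marks a reduced-resolution image, bit 2 a mask). It does not check IFD ordering or byte layout, so it complements rio cogeo validate rather than replacing it. The function names and the synthetic_tiff fixture are hypothetical.

```python
import struct

NEW_SUBFILE_TYPE = 254  # bit 0: reduced-resolution (overview); bit 2: mask
TILE_WIDTH = 322        # present only in tiled IFDs

def walk_ifds(buf: bytes) -> list:
    """Return a {tag: value} dict for every IFD in a classic TIFF held in memory."""
    bo = {b"II": "<", b"MM": ">"}.get(buf[:2])
    if bo is None or struct.unpack_from(bo + "H", buf, 2)[0] != 42:
        raise ValueError("only classic TIFF is handled in this sketch")
    (offset,) = struct.unpack_from(bo + "I", buf, 4)
    ifds, seen = [], set()
    while offset and offset not in seen:  # guard against looping offsets
        seen.add(offset)
        (count,) = struct.unpack_from(bo + "H", buf, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, n = struct.unpack_from(bo + "HHI", buf, entry)
            if n == 1 and typ == 3:      # single SHORT: first 2 bytes of the value field
                tags[tag] = struct.unpack_from(bo + "H", buf, entry + 8)[0]
            elif n == 1 and typ == 4:    # single LONG
                tags[tag] = struct.unpack_from(bo + "I", buf, entry + 8)[0]
            else:
                tags[tag] = None         # arrays and other types are not decoded here
        ifds.append(tags)
        (offset,) = struct.unpack_from(bo + "I", buf, offset + 2 + 12 * count)
    return ifds

def cog_pitfalls(ifds: list) -> list:
    """Flag untiled IFDs and a missing overview pyramid."""
    problems = []
    if any(TILE_WIDTH not in tags for tags in ifds):
        problems.append("untiled IFD")
    subfile_types = [tags.get(NEW_SUBFILE_TYPE) or 0 for tags in ifds]
    if not any(t & 1 and not t & 4 for t in subfile_types):
        problems.append("no overview IFDs")
    return problems

def synthetic_tiff(ifds: list) -> bytes:
    """Hypothetical test fixture: little-endian classic TIFF, every tag a single LONG."""
    out = bytearray(b"II*\x00") + struct.pack("<I", 8)  # first IFD follows the header
    for i, tags in enumerate(ifds):
        end = len(out) + 2 + 12 * len(tags) + 4
        out += struct.pack("<H", len(tags))
        for tag in sorted(tags):
            out += struct.pack("<HHII", tag, 4, 1, tags[tag])
        out += struct.pack("<I", end if i + 1 < len(ifds) else 0)
    return bytes(out)
```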
Publishing pitfalls (Content-Type, CORS, mutable filenames) are on the cookbook index, since they apply to COG and GeoParquet alike.
Next
Done with rasters? Head to the GeoParquet cookbook. The index has shared setup, publishing, and a cross-format common-pitfalls list. The conceptual background to all of this is on the Cloud-Native Geospatial page.