geomermaids · GeoParquet Writing cookbook

GeoParquet is a great format, but getting it right can be tricky. The ecosystem is still young, and navigating versions, tools, and encoding options can feel like a maze. This page covers best practices, explains the different versions and encoding options, and provides concrete recipes for GDAL, DuckDB, and gpio. Please report any issues or missing pieces through the contact form so we can keep this page accurate and up-to-date, and help grow the GeoParquet community.

Want this graded and fixed automatically? GeoPQ Workbench, our free desktop app, scores any GeoParquet file against these exact best practices and rewrites it in one click — no GDAL, no server, macOS / Windows / Linux.

Tools and setup

Four tools cover most GeoParquet workflows. Choose the one you are most comfortable with, or the one already in your pipeline. You can of course use the Apache Arrow Python bindings (PyArrow) for low-level Parquet operations, but that is outside the scope of this cookbook, so we set them aside from here on.

GDAL

The default choice for existing pipelines built on GDAL / OGR. From 3.9 onwards the Parquet driver writes GeoParquet 1.1 by default; it reads any version, locally or remotely via /vsicurl/. Version cutoffs: 3.8 added 1.0 support, 3.9 added 1.1 (bbox + CRS), 3.12 adds 2.0 via USE_PARQUET_GEO_TYPES=YES.

brew install gdal              # macOS
sudo apt install gdal-bin       # Ubuntu / Debian
gdalinfo --version             # expect 3.9 or newer

DuckDB

The modern query engine: SQL directly against remote GeoParquet over HTTP, joins across URLs, predicate pushdown, and a growing spatial function library. Also the most ergonomic way to read PostGIS and write GeoParquet in one step.

brew install duckdb                       # macOS
curl https://install.duckdb.org | sh       # Linux
duckdb --version                           # expect 1.5.2 or newer

# inside DuckDB, once per session:
INSTALL spatial; LOAD spatial;
INSTALL httpfs; LOAD httpfs;

`gpio`: Swiss-army knife

geoparquet-io is the Swiss-army knife for GeoParquet: convert, validate, inspect, partition, upgrade. Opinionated defaults (Hilbert sort, ZSTD, bbox column, sensible row groups). If you do not care about individual flags, this is the fastest path to a spec-compliant file. Recommended as the default unless you need a specific GDAL or DuckDB feature it does not expose.

uv tool install geoparquet-io   # recommended
# or
pip install geoparquet-io

gpio --version
gpio convert --help
gpio describe --help

GeoPandas: use indirectly

The original Python GeoParquet implementation. It works, but its default is still GeoParquet 1.0 (no bbox column unless you opt in), and it trails the spec by a cycle or two. Do not use gdf.to_parquet() directly in production. Instead, export to GeoPackage from GeoPandas, then run gpio convert on the GPKG. You keep the GeoPandas ergonomics for analysis and get a spec-compliant, opinionated output file:

# Python: write GeoPackage from your GeoDataFrame
import geopandas as gpd
gdf.to_file('scratch.gpkg', layer='data', driver='GPKG')

# Shell: convert to GeoParquet with sensible defaults
gpio convert scratch.gpkg out.parquet

Best practices: making queries fast

Writing a valid GeoParquet file is easy. Writing one that a query engine can filter cheaply takes five pieces that work together:

Spatial sort. Features ordered by a space-filling curve (Hilbert) or at least by bbox, so row-group bboxes are tight and prune well on ST_Intersects queries.
Right-sized row groups. 50 k to 100 k rows is the sweet spot for spatial queries.
Bbox column (GeoParquet 1.1) or Parquet-native geometry statistics (2.0), so engines can prune row groups by bounding box before parsing any geometries. Within surviving row groups every feature is still parsed and tested; neither version carries a per-row spatial index yet. See the version and encoding sections below.
ZSTD compression. Better ratio than SNAPPY, the usual writer default, with comparable decode speed. It wins on any network-bound read (so: every cloud-native scenario), yet most writers do not enable it unless asked.
Attribute or grid partitioning. Hive-style directories (state=MA/, or a coarse H3/S2 cell like h3_r3=83f5.../) so queries filtering on the partition key skip whole files before opening them.

The asymmetry between tools: gpio applies the first four by default and partitions with a single flag, which is why it is the opinionated path. GDAL and DuckDB need explicit opt-ins for each. GeoPandas leaves most of this to the caller.

Best practice	`gpio`	GDAL (`ogr2ogr`)	DuckDB
Spatial sort	default (Hilbert)	`-lco SORT_BY_BBOX=YES`	manual `ORDER BY ST_Hilbert(geom, bounds)`
Row group size	data-driven	`-lco ROW_GROUP_SIZE=100000`	`ROW_GROUP_SIZE 100_000`
Bbox column	default	`WRITE_COVERING_BBOX=AUTO` (default in 3.9+)	manual columns, or `GEOPARQUET_VERSION 'V2'`
ZSTD compression	default	`-lco COMPRESSION=ZSTD`	`COMPRESSION zstd`
Attribute partitioning	`--partition-by state`	scripted loop (one call per value)	`PARTITION_BY (state)`

Rule of thumb: unless you have a specific reason to customize or an existing pipeline, use gpio. It bakes in the first four best practices above, makes partitioning a single flag, and tracks the ecosystem as it evolves. Reach for the GDAL or DuckDB recipes when you need a flag gpio does not expose, or when you are already deep in those toolchains.

Which version: 1.0, 1.1, or 2.0?

GeoParquet has three versions in the wild, and different tools write different defaults. Worth knowing which one you are actually producing, because clients downstream may or may not read them.

Version	What it has	Status
`1.0`	Parquet with a WKB geometry column and `geo` file metadata (CRS optional). No bbox column.	Minimum viable. Readable by everything, but spatial queries are full scans.
`1.1`	WKB geometry + bbox column + CRS metadata + `geo` metadata key.	Official OGC spec. The safe default for publishing today.
`2.0`	Parquet-native GEOMETRY / GEOGRAPHY logical type (values are still WKB, but Parquet itself knows the column holds geometry), CRS, bbox statistics.	Delivers the format's full potential: Parquet-native coordinate statistics, geometry recognized from the Parquet schema alone, no separate bbox column needed. Still no spatial index inside the file.

Tool support and what each writes by default:

Tool	1.0	1.1	2.0
GDAL 3.8	default	n/a	n/a
GDAL 3.9+	`WRITE_COVERING_BBOX=NO`	default	`USE_PARQUET_GEO_TYPES=YES` (≥3.12)
DuckDB (Parquet writer)	default	n/a	`GEOPARQUET_VERSION 'V2'`
`gpio`	n/a	default	`--geoparquet-version 2.0`
GeoPandas	default	`schema_version='1.1.0'` (the bbox column also needs `write_covering_bbox=True`)	n/a

Practical advice: write 2.0 whenever you control the reader. It is the first version that delivers the format's full potential. Fall back to 1.1 for publication to a wide, mixed audience. 1.0 essentially never, except for clients stuck on pre-2024 tooling.

Encoding: how geometries are stored

The encoding field in GeoParquet metadata can point to three very different storage strategies. Worth understanding because it determines read speed, interoperability, and whether you need a separate bbox column.

WKB (GeoParquet 1.0 and 1.1 default). Each geometry is a binary blob in a single BYTE_ARRAY column, identical to what ST_AsBinary returns in PostGIS. Universally portable, but opaque to Parquet: no row-group statistics on coordinates, which is why 1.1 adds a companion bbox column.
Native GeoArrow (GeoParquet 1.1, opt-in via GEOMETRY_ENCODING=GEOARROW in GDAL 3.9+). Raw coordinates as columnar fields. A Point column becomes struct<x, y>. A MultiPolygon becomes list<list<list<struct<x, y>>>> (multipolygon → polygons → rings → points). Parquet computes min/max directly on x and y, so statistics work without a bbox column. Faster reads, but one geometry type per column: you cannot mix types.
Parquet Geometry / Geography logical types (GeoParquet 2.0, libarrow 21+). A single geometry column whose values are still WKB, so portability matches 1.x, but Parquet itself knows the column holds geometry and computes coordinate statistics natively. This is what GEOPARQUET_VERSION 'V2' in DuckDB and USE_PARQUET_GEO_TYPES=YES in GDAL 3.12+ produce.

Rule of thumb: reach for the Parquet Geometry logical types (2.0) when your toolchain supports libarrow 21+. Drop down to WKB (1.0 / 1.1) only for compatibility with older clients. Native GeoArrow is a middle ground: fast columnar reads, but locks each column to one geometry type.
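To make the WKB versus GeoArrow distinction concrete, here is a minimal Python sketch (standard library only) for the simplest geometry, a 2D point. It assumes little-endian WKB and handles nothing else (no other geometry types, no Z/M, no big-endian input); the coordinates are illustrative. It shows why a WKB column is opaque to Parquet statistics, while split x/y columns yield a bounding box from plain min/max.

```python
import struct

def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB, the layout ST_AsBinary uses for a POINT."""
    # 1-byte byte-order flag (1 = little endian), uint32 geometry type (1 = Point),
    # then x and y as float64: 21 bytes in total.
    return struct.pack("<BIdd", 1, 1, x, y)

def wkb_to_point(blob: bytes) -> tuple[float, float]:
    """Decode a little-endian WKB POINT produced by point_to_wkb."""
    byte_order, geom_type = struct.unpack_from("<BI", blob, 0)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian 2D points")
    x, y = struct.unpack_from("<dd", blob, 5)
    return x, y

# Illustrative coordinates (lon, lat).
points = [(-71.06, 42.36), (-73.99, 40.75), (-118.24, 34.05)]

# WKB layout: one opaque blob per row. A Parquet writer only sees bytes,
# so byte-wise min/max statistics on this column say nothing about coordinates.
wkb_column = [point_to_wkb(x, y) for x, y in points]

# GeoArrow-style layout: coordinates live in separate numeric x and y columns
# (conceptually struct<x, y>), so ordinary min/max statistics already form a bbox.
xs = [x for x, _ in points]
ys = [y for _, y in points]
column_bbox = (min(xs), min(ys), max(xs), max(ys))
```

The same reasoning scales to lines and polygons: GeoArrow nests the x/y struct inside lists, and the min/max over the leaf x and y fields is still the column's bounding box.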

Row group sizing

A Parquet file is a sequence of row groups, each carrying per-column min/max statistics in its metadata. On a spatial predicate like ST_Intersects, a query engine reads those statistics first and skips any row group whose bbox does not overlap the query geometry. Only the remaining groups are decompressed and scanned.

Because GeoParquet (any version) does not yet embed a per-row spatial index, the per-row-group bbox is the only cheap mechanism available during a spatial filter. It prunes whole row groups before any geometry is decoded. Within surviving row groups, the engine still parses every feature and runs the real predicate row-by-row, so row group size decides how fine-grained the cheap stage of pruning is.
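The cheap stage described above boils down to a box-overlap test per row group. A minimal Python sketch, assuming the per-row-group boxes have already been read from the file footer (in 1.1 they would be derived from the min/max statistics of the bbox column's xmin/ymin/xmax/ymax fields, in 2.0 from Parquet's geospatial statistics); the boxes below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BBox:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Two axis-aligned boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.xmax < other.xmin or other.xmax < self.xmin
            or self.ymax < other.ymin or other.ymax < self.ymin
        )

def surviving_row_groups(row_group_stats: list[BBox], query: BBox) -> list[int]:
    """Indices of row groups whose bbox statistics overlap the query box.

    Only the cheap first stage: rows inside surviving groups must still be
    decoded and tested against the exact predicate (e.g. ST_Intersects).
    """
    return [i for i, rg in enumerate(row_group_stats) if rg.intersects(query)]

# Hypothetical per-row-group statistics for a spatially sorted file.
stats = [
    BBox(-125.0, 32.0, -114.0, 42.0),  # row group 0: roughly California
    BBox(-107.0, 25.0, -93.0, 37.0),   # row group 1: roughly Texas
    BBox(-73.5, 41.2, -69.9, 42.9),    # row group 2: roughly Massachusetts
]
boston = BBox(-71.2, 42.2, -70.9, 42.5)
print(surviving_row_groups(stats, boston))  # → [2]
```

Only row group 2 would be decompressed; if the file were unsorted, every group's box would span the whole country and nothing would be skipped.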

Too large (~1 M rows per group): the bbox for each group covers a wide area, so queries like "features near this point" end up decompressing most of the file anyway.
Too small (under 10 k rows): metadata overhead dominates, compression ratio worsens, and the cost of reading the metadata itself starts to hurt.
Sweet spot: 50 k to 100 k rows per group, combined with spatial sorting so each group actually contains spatially close features.

Writer defaults vary. Set the option explicitly if you care:

Tool	Option	Default
GDAL	`-lco ROW_GROUP_SIZE=100000`	65 536
DuckDB	`ROW_GROUP_SIZE 100_000` in the `COPY ... TO` options list	~122 880
GeoPandas	`row_group_size=100_000` (passed through to PyArrow)	none of its own (PyArrow's default applies)
`gpio`	`--row-group-size 100000`	data-driven

Spatial sorting

Right-sized row groups only help if features are laid out so that each group's bbox is tight. Without a spatial sort, features in insertion order leave each row group's bbox spanning the entire dataset, and the row-group pruning from the previous section collapses.

Two practical options:

Hilbert curve sort. Space-filling curve that maps 2D coordinates to a 1D index preserving spatial locality. The best known default for general spatial workloads.
Bbox min-corner sort. Simpler: sort features by (xmin, ymin) of their bounding box. Not as locally-coherent as Hilbert but close enough in practice, and much cheaper to compute. This is what GDAL's SORT_BY_BBOX=YES does.
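A minimal Python sketch of the Hilbert option, using the classic textbook xy2d algorithm rather than DuckDB's own ST_Hilbert implementation (the indices will not match, only the idea does). It assumes point features; for other geometries one would feed a representative point such as the center of the feature's bbox. It then chunks hypothetical random points into 1,000-row "row groups" and compares the summed row-group bbox area in insertion order versus Hilbert order.

```python
import random

def hilbert_d(order: int, x: int, y: int) -> int:
    """Classic xy2d: map integer cell (x, y) on a 2**order grid to its Hilbert curve distance."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d

def hilbert_index(x: float, y: float, bounds, order: int = 16) -> int:
    """Scale a coordinate into the reference box, snap it to the grid, return its Hilbert distance."""
    xmin, ymin, xmax, ymax = bounds
    cells = (1 << order) - 1
    gx = int((x - xmin) / (xmax - xmin) * cells)
    gy = int((y - ymin) / (ymax - ymin) * cells)
    # Clamp in case a coordinate sits slightly outside the reference box.
    gx = min(max(gx, 0), cells)
    gy = min(max(gy, 0), cells)
    return hilbert_d(order, gx, gy)

def total_group_bbox_area(points, group_size: int) -> float:
    """Sum of the bbox areas of consecutive chunks, i.e. of hypothetical row groups."""
    total = 0.0
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        total += (max(xs) - min(xs)) * (max(ys) - min(ys))
    return total

rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10_000)]
bounds = (0.0, 0.0, 100.0, 100.0)  # the dataset extent, as ST_Extent_Agg would give

insertion_order_area = total_group_bbox_area(points, 1_000)
hilbert_sorted = sorted(points, key=lambda p: hilbert_index(p[0], p[1], bounds))
hilbert_area = total_group_bbox_area(hilbert_sorted, 1_000)
print(f"insertion order: {insertion_order_area:.0f}, Hilbert order: {hilbert_area:.0f}")
```

In insertion order every chunk spans nearly the whole extent; after the Hilbert sort each chunk covers only a compact patch, which is exactly what makes row-group pruning effective.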

Per-tool behavior:

gpio: Hilbert sort by default. Nothing to configure.
GDAL: -lco SORT_BY_BBOX=YES (GDAL 3.9+). Bbox-based sort, not strictly Hilbert, but effective for row-group pruning.
DuckDB: no built-in option. Sort manually in the source query with ST_Hilbert from the spatial extension, as in the recipe below.

The full "perfect" DuckDB recipe combines four of the five best practices in one statement: Hilbert sort on the geometry, covering bbox struct column, ZSTD, right-sized row groups (the 2.0 variant after it adds partitioning). Source here is a PostGIS table, but any input table or query result works.

INSTALL spatial; LOAD spatial;
INSTALL postgres; LOAD postgres;

ATTACH 'postgresql://user@localhost/mydb' AS pg (TYPE postgres);

-- compute the overall extent once inside the COPY query; ST_Extent turns the
-- ST_Extent_Agg polygon into the BOX_2D reference bounds ST_Hilbert expects
COPY (
  WITH bounds AS (
    SELECT ST_Extent(ST_Extent_Agg(geom)) AS ext FROM pg.public.parcels
  )
  SELECT
    p.id, p.name, p.state_code,
    ST_AsWKB(p.geom) AS geometry,          -- GeoParquet WKB geometry column
    {
      'xmin': ST_XMin(p.geom),
      'ymin': ST_YMin(p.geom),
      'xmax': ST_XMax(p.geom),
      'ymax': ST_YMax(p.geom)
    } AS bbox                              -- GeoParquet 1.1 covering bbox column
  FROM pg.public.parcels p, bounds b
  ORDER BY ST_Hilbert(p.geom, b.ext)       -- Hilbert spatial sort
) TO 'parcels.parquet' (
  FORMAT parquet,
  COMPRESSION zstd,
  ROW_GROUP_SIZE 100_000
);

What each piece does:

WITH bounds AS (... ST_Extent_Agg(geom) ...) computes the overall dataset extent once. ST_Hilbert(geom, extent) uses that reference box to compute one Hilbert curve index per row.
ORDER BY ST_Hilbert(...) sorts features by spatial proximity, so consecutive rows in the output are spatially close. The Parquet writer then places them in the same row groups, which is exactly what lets row-group bbox pruning work.
The bbox struct column (xmin, ymin, xmax, ymax) matches the GeoParquet 1.1 covering bbox convention, so query engines that support it can push predicates without parsing WKB.
ROW_GROUP_SIZE 100_000 + COMPRESSION zstd are the read-performance defaults from the best practices.

Version caveat: DuckDB writes this with GeoParquet 1.0 metadata: the geo key declares 1.0 and has no covering entry pointing at the bbox column, even though the column is laid out the way 1.1 clients expect. If strict 1.1 compliance matters, pipe through gpio convert to rewrite the metadata.
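For reference, this is roughly what "rewriting the metadata" involves. A minimal Python sketch that builds the two things a 1.1 reader looks for: a bbox struct value per feature, and a `covering` entry in the `geo` key of the Parquet footer metadata that points at that column. Embedding the JSON into the footer is left to a real writer (gpio, GDAL, PyArrow); the column names follow the recipe above and the ring coordinates are illustrative.

```python
import json

def covering_bbox(coords):
    """Per-feature bbox struct value, shaped like the bbox column in the recipe above."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}

# "geo" key payload for a 1.1 file whose WKB column is named "geometry" and whose
# bbox struct column is named "bbox" (both names taken from the recipe above).
geo_metadata = {
    "version": "1.1.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": [],  # empty list = geometry types not declared
            # No "crs" key: readers then assume OGC:CRS84 (lon/lat, WGS84).
            # Data in any other CRS must carry a PROJJSON "crs" entry here.
            "covering": {
                "bbox": {
                    "xmin": ["bbox", "xmin"],
                    "ymin": ["bbox", "ymin"],
                    "xmax": ["bbox", "xmax"],
                    "ymax": ["bbox", "ymax"],
                }
            },
        }
    },
}

ring = [(-71.1, 42.3), (-71.0, 42.3), (-71.0, 42.4), (-71.1, 42.4), (-71.1, 42.3)]
print(covering_bbox(ring))
print(json.dumps(geo_metadata, indent=2))
```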

Prefer the newer 2.0 logical types? Drop the bbox struct column (Parquet computes coordinate statistics natively for the geometry logical type), add GEOPARQUET_VERSION 'V2', and combine with PARTITION_BY (state_code) to Hilbert-sort within each partition in one go:

INSTALL spatial; LOAD spatial;
INSTALL postgres; LOAD postgres;

ATTACH 'postgresql://user@localhost/mydb' AS pg (TYPE postgres);

-- compute the overall extent once, same as before (CTE inside the COPY query)
COPY (
  WITH bounds AS (
    SELECT ST_Extent(ST_Extent_Agg(geom)) AS ext FROM pg.public.parcels
  )
  SELECT
    p.id, p.name, p.state_code,
    ST_AsWKB(p.geom) AS geometry
  FROM pg.public.parcels p, bounds b
  ORDER BY ST_Hilbert(p.geom, b.ext)
) TO 'out' (
  FORMAT parquet,
  COMPRESSION zstd,
  ROW_GROUP_SIZE 100_000,
  PARTITION_BY (state_code),
  GEOPARQUET_VERSION 'V2',
  OVERWRITE_OR_IGNORE
);

The output is a Hive-partitioned directory:

out/
  state_code=CA/data_0.parquet
  state_code=TX/data_0.parquet
  state_code=MA/data_0.parquet
  ...

Each file is a GeoParquet 2.0 file with Parquet's native geometry logical type, ZSTD-compressed, 100 k-row row groups, Hilbert-sorted within the partition. Readers filtering by state_code skip whole files at list time; readers filtering by bbox use the per-row-group coordinate statistics to prune within the partition. Two layers of pruning, no bbox column to maintain.

Compression

Parquet supports a handful of codecs. Most writers default to SNAPPY, which was chosen for Hadoop-era workloads where CPU was the bottleneck. For modern geospatial data on S3, where the network is the bottleneck, ZSTD is the right default: better ratio, comparable decompression speed, and every modern Parquet client reads it.

Codec	When to use
`ZSTD`	Default. Best ratio-to-speed trade-off.
`SNAPPY`	The usual writer default; legacy Hadoop/Spark ecosystems where it is universally supported.
`GZIP`	When you need older clients that only speak GZIP.
`LZ4_RAW`	Decode-speed-critical workloads; lower ratio than ZSTD.
`BROTLI`	Archival; best ratio but slow to write.
`NONE`	Already-compressed payloads (JPEG-encoded imagery columns, encrypted bytes).

For a catalog that gets scanned repeatedly from the edge, ZSTD level 3 (the default) already gives you ~2-3× smaller files than SNAPPY with negligible decode penalty. gpio applies ZSTD by default.

Recipes

Concrete commands for each path. All assume recent GDAL and DuckDB; see setup on the cookbook index.

The opinionated path: `gpio`

gpio (geoparquet-io) is a Python CLI and library, built on DuckDB, GDAL, PyArrow, and obstore. It applies the first four best practices above automatically (Hilbert sort, bbox column, ZSTD, sensible row groups) and partitions on request. Fastest way to a spec-compliant output.


# convert anything OGR reads to GeoParquet 1.1 (default)
gpio convert in.shp out.parquet

# write GeoParquet 2.0 with native geometry type
gpio convert in.shp out.parquet --geoparquet-version 2.0

# with attribute partitioning (Hive-style directory)
gpio convert in.shp out/ --partition-by state

# convert, then inspect the result
gpio convert in.gpkg out.parquet
gpio describe out.parquet

gpio describe prints the version, CRS, row groups, and whether a bbox column is present. Use it to sanity-check files produced by other tools as well.

From Shapefile or FileGeodatabase (with GDAL)

GDAL 3.9+ writes GeoParquet 1.1 (WKB + bbox + CRS) by default. GDAL 3.8 writes 1.0; anything older does not support the spec. The driver takes Shapefile, FGDB, GeoPackage, or anything else OGR can read.

ogr2ogr -f Parquet out.parquet in.shp \
  -lco COMPRESSION=ZSTD \
  -lco ROW_GROUP_SIZE=100000 \
  -lco GEOMETRY_ENCODING=WKB \
  -lco SORT_BY_BBOX=YES \
  -lco WRITE_COVERING_BBOX=AUTO

This turns on the four best practices the driver exposes as options (ZSTD, 100k row groups, bbox sort, bbox column); GEOMETRY_ENCODING=WKB simply pins the portable encoding explicitly. For attribute partitioning, you need to run one ogr2ogr call per partition value in a shell loop; GDAL has no single-command equivalent.

Version control: WRITE_COVERING_BBOX=AUTO (default) gives you 1.1; set NO to produce 1.0. USE_PARQUET_GEO_TYPES=YES (GDAL 3.12+, libarrow 21+) writes the new Parquet Geometry / Geography logical types, which is the GDAL-side path to GeoParquet 2.0.

From PostGIS (via DuckDB)

DuckDB's postgres extension lets you COPY a PostGIS query straight to GeoParquet, no intermediate dump on disk.

INSTALL postgres; LOAD postgres;
INSTALL spatial; LOAD spatial;

ATTACH 'postgresql://user@localhost/mydb' AS pg (TYPE postgres);

COPY (
  SELECT id, name, ST_AsWKB(geom) AS geometry
  FROM pg.public.parcels
  WHERE state_code = 'MA'
) TO 'parcels.parquet' (FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000);

ST_AsWKB converts the PostGIS geometry to the binary encoding GeoParquet expects. The file will round-trip cleanly through DuckDB, GeoPandas, and GDAL.

Version caveat: by default this produces GeoParquet 1.0 (WKB column, no bbox, no CRS metadata). To write 2.0 directly, add the GEOPARQUET_VERSION option:

COPY (
  SELECT id, name, ST_AsWKB(geom) AS geometry
  FROM pg.public.parcels
  WHERE state_code = 'MA'
) TO 'parcels.parquet' (FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000, GEOPARQUET_VERSION 'V2');

For 1.1 output (bbox + CRS, still WKB), route the 1.0 file through gpio; 1.1 is its default, so no version flag is needed:

gpio convert parcels.parquet parcels-1.1.parquet

For a properly Hilbert-sorted DuckDB output, see the spatial sorting recipe above.

Partitioning by attribute

For large multi-region datasets, partition the output into a Hive-style directory. Queries filtering on the partition key only read the relevant files.

INSTALL spatial; LOAD spatial;

COPY (
  SELECT id, state, name, ST_AsWKB(geom) AS geometry
  FROM read_parquet('in.parquet')
) TO 'out' (
  FORMAT parquet,
  PARTITION_BY (state),
  COMPRESSION zstd,
  ROW_GROUP_SIZE 100_000,
  OVERWRITE_OR_IGNORE
);

The result is a directory laid out like:

out/
  state=CA/data_0.parquet
  state=TX/data_0.parquet
  state=MA/data_0.parquet
  ...

DuckDB, GeoPandas, and GDAL can all query out/ as a single virtual dataset and will only read the partitions that match a WHERE state = 'MA' filter. This is exactly how Overture Maps and the OSM GeoParquet site on this domain lay out their data.
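The file-skipping step is simple enough to sketch: a reader turns each directory segment into a key/value pair and discards files whose values cannot match the filter, before opening any of them. A minimal Python illustration using the layout above; it handles only equality filters and does not decode percent-escaped partition values, which real readers do.

```python
def hive_partitions(path: str) -> dict[str, str]:
    """Extract key=value pairs from the directory segments of a Hive-style path."""
    partitions = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep:
            partitions[key] = value
    return partitions

def prune_files(paths: list[str], **equals: str) -> list[str]:
    """Keep only files whose partition values match every equality filter."""
    return [
        p for p in paths
        if all(hive_partitions(p).get(k) == v for k, v in equals.items())
    ]

files = [
    "out/state=CA/data_0.parquet",
    "out/state=TX/data_0.parquet",
    "out/state=MA/data_0.parquet",
]
print(prune_files(files, state="MA"))
```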

Grid-based partitioning. When there is no natural attribute to split on (global datasets, imagery-derived features, or any case where the obvious key is too skewed), partition by a coarse discrete global grid cell instead. H3 (Uber) and S2 (Google) are the two common choices. Compute a low-resolution cell index per feature (H3 resolution 2 or 3 gives you sub-continent cells) and use it as the partition key:

INSTALL h3 FROM community; LOAD h3;  -- DuckDB H3 community extension
INSTALL spatial; LOAD spatial;

COPY (
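  SELECT
    * EXCLUDE (geom),                  -- drop the source column; the WKB copy is added below
    h3_latlng_to_cell_string(
      ST_Y(ST_Centroid(geom)),         -- latitude: assumes geom is lon/lat (EPSG:4326)
      ST_X(ST_Centroid(geom)),         -- longitude
      3                                -- resolution: coarser = fewer partitions
    ) AS h3_r3,
    ST_AsWKB(geom) AS geometry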
  FROM read_parquet('in.parquet')
) TO 'out' (
  FORMAT parquet,
  PARTITION_BY (h3_r3),
  COMPRESSION zstd,
  ROW_GROUP_SIZE 100_000,
  OVERWRITE_OR_IGNORE
);

The output directory is structured like out/h3_r3=83f5.../data_0.parquet. Clients that know the H3 index can filter to a coarse cell before reading any data files. The same pattern works with S2 (s2_from_latlng) or a plain geohash.
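Where neither H3 nor S2 is available, the geohash route needs no extension at all. A minimal, self-contained Python geohash encoder using the standard algorithm (alternate longitude and latitude bisections, one base-32 character per 5 bits), shown as a hypothetical partition-key helper; the `geohash_r3=` key name is illustrative. Precision 3 gives cells of roughly 156 km × 156 km at the equator.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lon: float, precision: int) -> str:
    """Encode a WGS84 (lat, lon) pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    n_bits = 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # 5 bits per base-32 character
            chars.append(BASE32[bits])
            bits = 0
            n_bits = 0
    return "".join(chars)

# Hypothetical partition key for one feature, computed from its centroid.
print(f"geohash_r3={geohash(57.64911, 10.40744, 3)}")
```

Because a geohash prefix is always the parent cell, a coarse key like this nests naturally under a finer one if you later need a second partition level.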

The choice between attribute and grid partitioning usually comes down to the query pattern: if users always filter by an administrative key, partition by that key. If they filter by arbitrary bounding boxes or proximity, a coarse grid is the right choice. Sometimes both at once (nested: state=CA/h3_r5=.../).

Reading a GeoParquet back

Verification is the other half of the job. If you cannot read the file over HTTP from at least DuckDB, it is not published yet. The examples below run against the live OpenStreetMap GeoParquet catalog this site publishes: daily snapshots, 98 regions × 16 themes, free and keyless. Copy and paste directly. For the production-grade companion (session init file, predicate pushdown deep-dive, EXPLAIN ANALYZE with real numbers, DuckDB-WASM in the browser), see the GeoParquet Reading Cookbook.

DuckDB against a single GeoParquet URL

Count every OSM building in New York State, straight from the URL, no download:

INSTALL httpfs; LOAD httpfs;
INSTALL spatial; LOAD spatial;

SELECT COUNT(*) AS buildings
FROM read_parquet('https://parquetry.geomermaids.com/latest/country=US/state=US-NY/buildings.parquet');

With predicate pushdown and Parquet statistics, DuckDB fetches only the row groups it needs. The buildings theme carries columns like tags, levels, addr_street, addr_postcode, and state_iso; pick whatever the pipeline promoted for your filter. Confirm how many bytes actually moved:

EXPLAIN ANALYZE
SELECT COUNT(*)
FROM read_parquet('https://parquetry.geomermaids.com/latest/country=US/state=US-NY/buildings.parquet')
WHERE addr_postcode = '10001';   -- Chelsea, Manhattan

Look at the bytes_read line. A well-laid-out file returns a tiny fraction of total file size.

Catalog-wide queries via the S3 endpoint

DuckDB's httpfs cannot expand glob patterns (*) over plain HTTPS because generic HTTP has no directory-listing primitive. The geoparquet catalog exposes an anonymous S3-compatible endpoint at s3.geomermaids.com that speaks just enough of the S3 API (ListObjectsV2 + ranged GetObject) for glob expansion. Set it up once per session:

INSTALL httpfs; LOAD httpfs;
INSTALL spatial; LOAD spatial;

SET s3_endpoint = 's3.geomermaids.com';
SET s3_url_style = 'path';
SET s3_use_ssl = true;
SET s3_access_key_id = ''; SET s3_secret_access_key = '';

Now wildcards work. Count buildings per US state in one query:

SELECT state_iso, COUNT(*) AS buildings
FROM read_parquet('s3://parquetry/latest/country=US/state=*/buildings.parquet')
GROUP BY state_iso
ORDER BY buildings DESC
LIMIT 10;

state_iso is a literal column inside every file, so you group by it directly without needing Hive-style partitioning flags. See the catalog-wide queries section on geoparquet.geomermaids.com for more examples (continent-wide airports, wind turbines, etc.).
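Conceptually, the client-side work behind that wildcard is a prefix listing followed by pattern matching. A minimal Python sketch, assuming the object keys have already been returned by ListObjectsV2 (the key list below is hypothetical; pagination and `**` recursion are ignored). It is not DuckDB's implementation, just the idea; matching segment by segment keeps `*` from crossing a `/`.

```python
from fnmatch import fnmatchcase

def literal_prefix(pattern: str) -> str:
    """Everything before the first wildcard: usable as the ListObjectsV2 Prefix parameter."""
    for i, ch in enumerate(pattern):
        if ch in "*?[":
            return pattern[:i]
    return pattern

def glob_match(pattern: str, key: str) -> bool:
    """Match one path segment at a time so '*' never spans a '/'."""
    pat_parts = pattern.split("/")
    key_parts = key.split("/")
    return len(pat_parts) == len(key_parts) and all(
        fnmatchcase(k, p) for p, k in zip(pat_parts, key_parts)
    )

def expand(pattern: str, listed_keys: list[str]) -> list[str]:
    prefix = literal_prefix(pattern)
    return sorted(k for k in listed_keys if k.startswith(prefix) and glob_match(pattern, k))

# Hypothetical keys, as a listing of the "parquetry" bucket might return them.
keys = [
    "latest/country=US/state=US-NY/buildings.parquet",
    "latest/country=US/state=US-MA/buildings.parquet",
    "latest/country=US/state=US-MA/roads.parquet",
    "latest/country=CA/state=CA-QC/buildings.parquet",
]
pattern = "latest/country=US/state=*/buildings.parquet"
print(literal_prefix(pattern))
print(expand(pattern, keys))
```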

GDAL command line

ogrinfo /vsicurl/https://parquetry.geomermaids.com/latest/country=US/state=US-NY/buildings.parquet -so -al

The /vsicurl/ prefix lets GDAL CLI tools read the remote file with HTTP range requests, the same mechanism used for COG. Unlike QGIS, this path streams all the way down; the next section explains why QGIS itself does not.
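Range reads suit Parquet because everything a reader needs to plan its requests sits at the end of the file: the Thrift-encoded footer, then its length as a 4-byte little-endian integer, then the magic bytes PAR1. A minimal Python sketch of that first step using only the standard library; `fetch_tail` and `read_footer` are illustrative helpers, not a GDAL API, and decoding the Thrift footer itself is out of scope.

```python
import struct
import urllib.request

MAGIC = b"PAR1"

def footer_length(tail: bytes) -> int:
    """Parse the last 8 bytes of a Parquet file: footer length (uint32 LE) + b'PAR1'."""
    if len(tail) < 8 or tail[-4:] != MAGIC:
        raise ValueError("not the tail of a Parquet file")
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length

def fetch_tail(url: str, nbytes: int) -> bytes:
    """Fetch the last nbytes of a remote file with an HTTP suffix Range request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=-{nbytes}"})
    with urllib.request.urlopen(request) as response:
        data = response.read()
    # A server that ignores Range answers 200 with the whole file; keep only the tail.
    return data[-nbytes:]

def read_footer(url: str) -> bytes:
    """Two small requests: 8 bytes to learn the footer size, then the footer itself."""
    length = footer_length(fetch_tail(url, 8))
    return fetch_tail(url, length + 8)[:length]
```

Calling `read_footer` on the New York buildings URL above would move only the footer bytes across the wire; the row-group statistics inside it then decide which byte ranges are worth fetching next.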

QGIS specifics

QGIS is the most common desktop client for cloud-native data, and it handles the two main formats very unevenly. The raster side (COG) streams over HTTP with range requests and is first-class. The vector side (GeoParquet) is not. Knowing the asymmetry up front saves a lot of confused users.

GeoParquet: downloads the whole file

QGIS 3.36+ reads GeoParquet from URLs, but not via range requests. It downloads the entire file to a local cache before rendering anything. For a 100 MB file that is fine. For a multi-GB world-scale dataset it is often unworkable: the whole payload crosses the wire before a single feature draws, and the user experience on a corporate VPN or a flaky connection is miserable.

The reason is that QGIS's vector rendering pipeline expects to load features into memory and build its own spatial index, rather than stream partial reads from object storage. The /vsicurl/ virtual file system that works so well for COG (and via ogrinfo / ogr2ogr on GeoParquet too) does not get the same plumbing inside QGIS for vector sources. GeoParquet 2.0's Parquet-native geometry statistics would make true streaming feasible, but the client work has not caught up yet.

Workarounds until QGIS catches up:

Filter with DuckDB first, then load the slimmer output. One SQL query against the remote URL, write a local .parquet with just the features and columns you need, open that in QGIS. Two steps but keeps the interactive part snappy.
Vector tile server in front of your GeoParquet. Martin or pg_tileserv if the source can live in PostGIS; a small Cloudflare Worker in front of static Parquet archives for cases where a live database is overkill.
DuckDB-WASM in the browser. For truly interactive exploration across multi-GB datasets without a server, paired with a STAC catalog and Deck.gl or MapLibre for the map.
Convert to GeoPackage or FlatGeobuf for the analyst workflow. GeoPackage for local editing, FlatGeobuf if you want a single file with a built-in spatial index that QGIS can read lazily over HTTP.

None are as seamless as the COG story. It is the single biggest rough edge of the cloud-native vector stack for desktop users today.

GeoParquet-specific pitfalls

Tiny Parquet row groups. Under 10k rows per group, metadata overhead eats the benefit of predicate pushdown. If your writer emits small groups, set row_group_size (or its equivalent) explicitly.
Plain pandas on a GeoDataFrame. Writing through pandas (pd.DataFrame.to_parquet) drops the geometry metadata. Prefer the GeoPackage + gpio route from the GeoPandas section; if you must write directly, use gdf.to_parquet() from GeoPandas with schema_version='1.1.0' and write_covering_bbox=True.
Unsorted features with large row groups. Bbox-based pruning collapses if features are in insertion order rather than spatial order. Either SORT_BY_BBOX=YES in GDAL, manual ST_Hilbert in DuckDB, or let gpio do it for you.
DuckDB default version confusion. A GeoParquet file produced by a plain COPY ... TO ... (FORMAT parquet) is 1.0 (no bbox column, no CRS). Do not rely on downstream tools to "figure it out".

Publishing pitfalls (Content-Type, CORS, mutable filenames) are on the cookbook index, since they apply to GeoParquet and COG alike.

Working with rasters? Head to the COG cookbook. The index has shared setup, publishing, and a cross-format common-pitfalls list. The conceptual background to all of this is on the Cloud-Native Geospatial page.
