PostGIS and Open Standards & Libraries

Sep-23 : In this lesson, we will gain familiarity with open standards, libraries, and databases for spatial analysis.

Digital Media

Please listen to this podcast: Mapscaping Episode 77 Spatial SQL - GIS without the GIS
Optional Blog: Why you should care about PostGIS
Optional YouTube Presentation: PostGIS Introduction (skip Q&A and 3D / 4D parts)

Software

The powerful open-source relational database is PostgreSQL (this is installed on Middlebury’s server for you)
You can connect to Middlebury’s database from your laptop using QGIS DB Manager Plugin, which automatically installs with QGIS.
Optionally you may install the PGAdmin client as well, though we should be able to do everything with DB Manager.
The spatial extension for PostgreSQL is PostGIS
A small command-line program translates OpenStreetMap data into a PostGIS database: OSM2PGSQL (optional to install on your laptop– the plan is to use OSM2PGSQL to populate a database for you)
It is possible to install and connect to a PostGIS server on your own computer for personal local use. This takes a bit of work and is not required for the course. See https://postgis.net/install/

Resources

Reference and tutorial for non-spatial PostgreSQL: Documentation, Tutorial, and cheat sheet
PostGIS reference, cheat sheet, and Introduction Tutorial
CrunchyData Interactive PostgreSQL and PostGIS Learning Portal
Paul Ramsey created PostGIS, and he has a lot of opinions about it. I’m ok with that, because it looks like he has been revamping GEOS which runs basically all vector overlay analysis in most any open source GIS software you can find. The implications are huge, because the revision he is describing would fix most of the geometry error problems that arise using overlay tools.

Standards and libraries

The Open Geospatial Consortium (OGC) created simple features standards
Geometry Engine Open Source (GEOS) is an open source code library in C++ for implementing the OGC simple feature standards
Who uses GEOS?
- PostGIS does!
- Shapely does, which is in turn used by GeoPandas for spatial data science in Python.
- Simple Features for R does, which is the basis of spatial analysis in R.
- QGIS does!

Example of Relational Database Application

Example of a large public relational database, property in Harris County, Texas parcel viewer
Behind the scenes, each property’s attributes are drawn from multiple tables, where each .txt file is a single table

Characteristics & Benefits of Relational Databases

Each table must have a primary key to uniquely identify every row. This may be a single field/column or a combination of them.
Foreign keys help connect tables through joins.
For example, an owners table will contain a property ID number as a foreign key so that the owner information can be connected to the parcel of land through a join.
The relationship between two tables is referred to as cardinality, and may be:
- One to One: one record in table A correlates to exactly one record in table B
- One to Many: one record in table A correlates to one or more records in table B
- 1 .. 0 to N: another way of expressing that one record in table A correlates to zero, one, or many records in table B.
The advantage of dividing data out into multiple tables with defined relations between them is in saving data storage space. Without this division, you would need one extremely wide table with many null values, e.g. it would require enough columns for information on every building in the parcel with the most buildings, but all of those columns would be null for a vacant lot!
The advantage of using a database server like PostgreSQL is that the server can be installed in the “cloud” or in a high-powered computing center. Your data and all of the analysis for your queries exists at the server, so you can do GIS with a much simpler and cheaper laptop or PC.
You need a client to connect to the server. For this, we can use DB Manager in QGIS or we can use an application called pgAdmin: https://www.pgadmin.org/
With the right libraries, you can even connect to PostgreSQL databases with R / RStudio as the client!
Finally, relational databases support indexing both attributes and geometries, vastly improving the efficiency of analyzing big data.

Example analysis

So how can you connect, e.g. information about the number of apartments on a parcel of land, when each building may have multiple apartments and each parcel may have multiple buildings? Try drawing a familiar workflow for QGIS, and then talk through translating that workflow into SQL.

Buildings must be uniquely identified with a primary key using both the parcel number and a unique building number.
Select only the apartment information from building fixtures.
Group fixture data by parcel ID while summing the number of apartments.
Now that there is only one record for each parcel with the sum of apartments, join the apartment information to parcels.

Main Page