PostGIS and Open Standards & Libraries
Sep-23
: In this lesson, we will gain familiarity with open standards, libraries, and databases for spatial analysis.
Software
- The powerful open-source relational database is PostgreSQL (this is installed on Middlebury’s server for you)
- You can connect to Middlebury’s database from your laptop using QGIS DB Manager Plugin, which automatically installs with QGIS.
- Optionally you may install the PGAdmin client as well, though we should be able to do everything with DB Manager.
- The spatial extension for PostgreSQL is PostGIS
- A small command-line program translates OpenStreetMap data into a PostGIS database: OSM2PGSQL (optional to install on your laptop– the plan is to use OSM2PGSQL to populate a database for you)
- It is possible to install and connect to a PostGIS server on your own computer for personal local use. This takes a bit of work and is not required for the course. See https://postgis.net/install/
Resources
Standards and libraries
Example of Relational Database Application
- Example of a large public relational database, property in Harris County, Texas parcel viewer
- Behind the scenes, each property’s attributes are drawn from multiple tables, where each .txt file is a single table
Characteristics & Benefits of Relational Databases
- Each table must have a primary key to uniquely identify every row. This may be a single field/column or a combination of them.
- Foreign keys help connect tables through joins.
- For example, an owners table will contain a property ID number as a foreign key so that the owner information can be connected to the parcel of land through a join.
- The relationship between two tables is referred to as cardinality, and may be:
- One to One: one record in table A correlates to exactly one record in table B
- One to Many: one record in table A correlates to one or more records in table B
- 1 .. 0 to N: another way of expressing that one record in table A correlates to zero, one, or many records in table B.
- The advantage of dividing data out into multiple tables with defined relations between them is in saving data storage space. Without this division, you would need one extremely wide table with many null values, e.g. it would require enough columns for information on every building in the parcel with the most buildings, but all of those columns would be null for a vacant lot!
- The advantage of using a database server like PostgreSQL is that the server can be installed in the “cloud” or in a high-powered computing center. Your data and all of the analysis for your queries exists at the server, so you can do GIS with a much simpler and cheaper laptop or PC.
- You need a client to connect to the server. For this, we can use DB Manager in QGIS or we can use an application called pgAdmin: https://www.pgadmin.org/
- With the right libraries, you can even connect to PostgreSQL databases with R / RStudio as the client!
- Finally, relational databases support indexing both attributes and geometries, vastly improving the efficiency of analyzing big data.
Example analysis
So how can you connect, e.g. information about the number of apartments on a parcel of land, when each building may have multiple apartments and each parcel may have multiple buildings? Try drawing a familiar workflow for QGIS, and then talk through translating that workflow into SQL.
- Buildings must be uniquely identified with a primary key using both the parcel number and a unique building number.
- Select only the apartment information from building fixtures.
- Group fixture data by parcel ID while summing the number of apartments.
- Now that there is only one record for each parcel with the sum of apartments, join the apartment information to parcels.
Main Page