Open Source GIScience

Joseph Holler's Open Source GIScience Resources at Middlebury College

Comparing Choropleth Maps

: In this lesson, we will learn methods to quantitatively assess Choropleth map agreement and accuracy.

Friday, April 23

Prior to Friday’s class, you should have:

Important Announcements

Research Skills

Comparing Choropleth Maps

Today, we’ll investigate the question of how to compare choropleth maps for agreement between the maps or for assessment of accuracy.

Let’s connect back to concepts in the Longley et al chapter on Uncertainty as well as E Tate’s (2013) Uncertainty Analysis for a Social Vulnerability Index

What about ordinal data, which is every choropleth map?

What is an API?

Application Program Interface! Examples:

If an API is useful for data science, it’s common to find an R package allowing you to connect to that API

Additionally, if a data course is useful for data science, it’s common to find an R package specific to loading that data source. For example:

Sample Code for Comparing Discrete Choropleth Maps

or_fig4 = # load original figure 4 data
  read_sf(here("data", "derived", "public", "georeferencing.gpkg"), 
          layer="ta_resilience") %>% 
  # load ta_resilience layer from georeferencing geopackage
  st_drop_geometry() %>%
  # remove the geometry data because two geometries cannot be joined
  select(c(ID_2,resilience)) %>%  
  # select only the ID_2 and resilience columns
  na.omit()
  # remove records with null values

rp_fig4 = ta_2010 %>% # prepare our reproduction of figure 4 data
  select(c(ID_2,capacity_2010)) %>%  
  # select only the ID_2 and resilience columns
  # note: geometry columns are 'sticky' -- only way to remove is st_drop_geometry()
  na.omit()  %>%
  # remove records with null values
  mutate(rp_res = case_when(
  capacity_2010 <= ta_brks[2] ~ 1,
  capacity_2010 <= ta_brks[3] ~ 2,
  capacity_2010 <= ta_brks[4] ~ 3,
  capacity_2010 >  ta_brks[4] ~ 4
))
# code the capacity scores as integers, as we see them classified on the map. 
#ta_brks was the result of a Jenks classification, as noted on Malcomb et al's maps

fig4compare = inner_join(rp_fig4,or_fig4,by="ID_2") %>%  
  #inner join on field ID_2 keeps only matching records
  filter(rp_res>0 & rp_res<5 & resilience > 0 & resilience < 5)
  # keep only records with valid resilience scores

table(fig4compare$resilience,fig4compare$rp_res)
# crosstabulation with frequencies

cor.test(fig4compare$resilience,fig4compare$rp_res,method="spearman")
# Spearman's Rho correlation test

fig4compare = mutate(fig4compare, difference = rp_res - resilience) 
# Calculate difference between the maps so that you can create a difference map

Sample Code for Comparing Continuous Raster Maps

orfig5vect = 
  read_sf(here("data", "derived", "public", "georeferencing.gpkg"), 
          layer="vulnerability_grid")
# load original figure 5 data

orfig5rast = st_rasterize(orfig5vect["bmean"], template=ta_final)
# convert mean of blue values into a raster using ta_final as a reference for raster
# extent, cell size, CRS, etc.

orfig5rast = orfig5rast %>% 
  mutate(or = 1-
           (bmean - min(orfig5rast[[1]], na.rm= TRUE)) /
           (max(orfig5rast[[1]], na.rm= TRUE) -
            min(orfig5rast[[1]], na.rm= TRUE)
        )
)  # or is Re-scaled from 0 to 1 with (value - min)/(max - min)
# it is also inverted, because higher blue values are less red


ta_final = ta_final %>% 
  mutate(rp =
           (capacity_2010 - min(ta_final[[1]], na.rm= TRUE)) /
           (max(ta_final[[1]], na.rm= TRUE) -
            min(ta_final[[1]], na.rm= TRUE)
        )
)  # rp is Re-scaled from 0 to 1 with (value - min)/(max - min)

fig5comp = c( select(ta_final,"rp"), select(orfig5rast,"or"))
# combine the original (or) fig 5 and reproduced (rp) fig 5

fig5comp = fig5comp %>% mutate( diff = rp - or )
# calculate difference between the original and reproduction,
# for purposes of mapping

fig5comppts = st_as_sf(fig5comp)
# convert raster to vector points to simplify plotting and correlation testing

plot(fig5comppts$or, fig5comppts$rp, xlab="Original Study", ylab="Reproduction")
# create scatterplot of original results and reproduction results

cor.test(fig5comppts$or, fig5comppts$rp, method="spearman")
# Spearman's Rho correlation test

# Hint for mapping raster results: refer to the diff raster attribute
# in the fig5comp stars object like this: fig5comp["diff"]

Finalizing the Reproduction Study

The code blocks above will help you to integrate your georeferenced results from QGIS with your final results from the study in R. Of course, you may have to modify the file names, layer names, and column names to match the data as you created it in QGIS and R.

To finalize the study, make the following figures:

You can save all of these figures as .png files into your repository’s results folder using R code. One great thing about doing cartography in R is that you can repurpose code from one map to the next!

Main Page