Working With Geometries¶

In the course of working with legislative redistricting data, it is inevitable that we will have to work with files that contain geometries. Most often, these geometries come in the form of shapefiles, which, while nice in theory, can be a bit of a pain to work with in practice. For now, we will focus on the basics of working with geometries, but the interested reader is encouraged to explore our partnered library, maup, which is specifically designed to help fix tricky geometry problems. Specifically, in the event that you are working with a shapefile and run in to an error of the flavour:

UserWarning: Found overlaps among the given polygons.
Indices of overlaps: {(887, 892), (893, 915), (892, 914), (887, 893)}

or

UserWarning: Found islands (degree-0 nodes). Indices of islands: {2552, 3107}
"Found islands (degree-0 nodes). Indices of islands: {}".format(islands)

then you should consider consulting the maup documentation to see if it can help you out.

Loading and Running a Plan¶

Download MN File

For this example, we will make use of a Minnesota GeoJSON file that contains the geometries of the state’s precincts (you will need to unzip the above folder to get to the file – it’s a bit large). We will follow a similar workflow to what we already covered in the ReCom Section, but with an eye towards some of the conveniences afforded by GeographicPartition objects. As always, we’ll start with the imports:

import matplotlib.pyplot as plt
from gerrychain import (Partition, Graph, MarkovChain,
                        updaters, constraints, accept,
                        GeographicPartition)
from gerrychain.proposals import recom
from gerrychain.tree import bipartition_tree
from gerrychain.constraints import contiguous
from functools import partial
import pandas

# Set the random seed so that the results are reproducible!
import random
random.seed(42)

And now we load the graph from the GeoJSON file

graph = Graph.from_file("MN_precincts.geojson")

as well as define our updaters and initial partition

my_updaters = {
    "population": updaters.Tally("TOTPOP", alias="population"),
    "cut_edges": updaters.cut_edges,
    "perimeter": updaters.perimeter,
    "area": updaters.Tally("area", alias="area"),
}

initial_partition = GeographicPartition(
    graph,
    assignment="CONGDIST",
    updaters=my_updaters
)

The observant reader will notice that we have added two new updaters, perimeter, and area, 1 and we are now using the GeographicPartition class instead of the Partition class. The GeographicPartition class is a subclass of the Partition class that allows us the capability of working with geometries throughout our Markov chain, and the perimeter and area updaters are examples of such a geometric updater that was previously unavailable to us. These updaters are necessary for monitoring things like geometric compactness and area via metrics such as the Polsby-Popper test. 2

And now it is time for one of the first conveniences of the GeographicPartition class: we can plot our map and see the initial partition!

initial_partition.plot()

Of course, this isn’t very pretty, so let’s pass it some additional arguments to things a bit nicer:

fig, ax = plt.subplots(figsize=(8,8))
ax.set_yticks([])
ax.set_xticks([])
ax.set_title("Initial Partition in MN")
initial_partition.plot(ax=ax, cmap='tab20c')

../_images/MN_initial_partition_pretty.png

Under the hood, the plot method is using the geodataframe.plot method from geopandas to plot the geometries, and all of this is built on top of matplotlib, so most of the standard methods for modifying a matplotlib plot will work here as well.

Now that we have our initial partition, we can run a Markov chain on it just as we have previously:

ideal_population = sum(initial_partition["population"].values()) / len(initial_partition)


proposal = partial(
    recom,
    pop_col="TOTPOP",
    pop_target=ideal_population,
    epsilon=0.01,
    node_repeats=2,
)

recom_chain = MarkovChain(
    proposal=proposal,
    constraints=[contiguous],
    accept=accept.always_accept,
    initial_state=initial_partition,
    total_steps=20,
)

And the next bit of code will make a fun little widget that will allow us to watch the chain work!

%matplotlib inline
import matplotlib_inline.backend_inline
matplotlib_inline.backend_inline.set_matplotlib_formats('png')

import pandas as pd

import matplotlib.cm as mcm
import matplotlib.pyplot as plt
import networkx as nx
from PIL import Image
import io
import ipywidgets as widgets
from IPython.display import display, clear_output

frames = []
district_data = []

for i, partition in enumerate(recom_chain):
    for district_name in partition.perimeter.keys():
        population = partition.population[district_name]
        perimeter = partition.perimeter[district_name]
        area = partition.area[district_name]
        district_data.append((i, district_name, population, perimeter, area))

    buffer = io.BytesIO()
    fig, ax = plt.subplots(figsize=(10,10))
    partition.plot(ax=ax, cmap='tab20')
    ax.set_xticks([])
    ax.set_yticks([])
    plt.savefig(buffer, format='png', bbox_inches='tight')
    buffer.seek(0)
    image = Image.open(buffer)
    frames.append(image)
    plt.close(fig)

df = pd.DataFrame(
    district_data,
    columns=[
        'step',
        'district_name',
        'population',
        'perimeter',
        'area'
    ]
)

def show_frame(idx):
    clear_output(wait=True)
    display(frames[idx])

slider = widgets.IntSlider(value=0, min=0, max=len(frames)-1, step=1, description='Frame:')
slider.layout.width = '500px'
widgets.interactive(show_frame, idx=slider)

which should look something like this:

and our dataframe has collected all of the data we were interested in:

df.head(5)

	district_name	population	perimeter	area
0	8	662998.0	1.804646e+06	7.807545e+10
1	6	662979.0	6.616450e+05	7.864760e+09
2	5	662985.0	1.133867e+05	3.678103e+08
3	3	662994.0	2.625007e+05	1.509425e+09
4	7	662997.0	2.288428e+06	9.165192e+10

1: The perimeter and area attributes are actually not present in the MN_precincts.geojson file, but the GeographicPartition class will calculate them at instantiation time using the geometries provided in the file.
2: The Poslby-Popper test is a part of gerrychain’s metrics submodule as well.