Getting started with a chain

We'll start by showing a block of code that runs a simple chain, and then we'll break down what each section is doing. If you want to run this code, you can copy and paste it into a file (for example, main.jl) at the root directory of the repository. Then, you can run it by navigating to the repository and running julia main.jl from the command line.

Code

using GerryChain

SHAPEFILE_PATH = "./PA_VTD.json"
POPULATION_COL = "TOT_POP"
ASSIGNMENT_COL = "538GOP_PL"

# Initialize graph and partition
graph = BaseGraph(SHAPEFILE_PATH, POPULATION_COL)
partition = Partition(graph, ASSIGNMENT_COL)

# Define parameters of chain (number of steps and population constraint)
pop_constraint = PopulationConstraint(graph, partition, 0.02)
num_steps = 10

# Initialize Election of interest
election = Election("SEN10", ["SEN10D", "SEN10R"], partition.num_dists)
# Define election-related metrics and scores
election_metrics = [
    vote_count("count_d", election, "SEN10D"),
    efficiency_gap("efficiency_gap", election, "SEN10D"),
    seats_won("seats_won", election, "SEN10D"),
    mean_median("mean_median", election, "SEN10D")
]
scores = [
    DistrictAggregate("presd", "PRES12D"),
    ElectionTracker(election, election_metrics)
]

# Run the chain
println("Running 10-step ReCom chain...")
chain_data = recom_chain(graph, partition, pop_constraint, num_steps, scores)

# Get values of all scores at the 10th state of the chain
score_dict = get_scores_at_step(chain_data, 10)
# Get all vote counts for each state of the chain
vote_counts_arr = get_score_values(chain_data, "count_d")

Section-by-section overview

Reading in the graph and partition

graph = BaseGraph(SHAPEFILE_PATH, POPULATION_COL)
partition = Partition(graph, ASSIGNMENT_COL)

We read in the graph from a shapefile, a file that contains information about each precinct in our area of interest: which precincts are adjacent to each other, the vote totals of each precinct in various elections, demographic characteristics of each precinct, and so on. All of this information is stored in the graph object, so you can think of the graph as holding the "ground truth" about the precincts in an area. When constructing the graph, we pass in the name of the column in the shapefile that contains the population of each precinct.

The partition object, on the other hand, stores a many-to-one mapping of the nodes in our graph to districts. "Partition" is just a fancy way of saying "assignment of places to districts"; in other words, a districting plan! The Markov chain has to start somewhere, so the initial partition is the districting plan at the start of the chain. To build it, we pass in the graph and the name of the column that contains the initial assignment of precincts to districts. Together, these tell the partition (a) how many people are in each district, since the graph already knows each precinct's population, and (b) what the initial plan is.
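To make the split between graph and partition concrete, here is a minimal plain-Julia sketch that does not use GerryChain. It assumes made-up data: six nodes with hypothetical populations (node_pop) and a hypothetical assignment vector standing in for the assignment column. It shows that a partition maps each node to exactly one district, and that district populations come from combining the graph's node populations with the partition.

```julia
# Hypothetical toy data (not from PA_VTD.json): six precincts, nodes 1..6.
node_pop   = [120, 80, 100, 95, 105, 100]   # per-node population (the "graph" side)
assignment = [1, 1, 2, 2, 3, 3]             # assignment[i] = district of node i (the "partition")

# Many-to-one: each node maps to one district, each district holds many nodes.
num_dists = maximum(assignment)
district_nodes = [findall(==(d), assignment) for d in 1:num_dists]

# District populations need both pieces: node populations plus the assignment.
district_pop = [sum(node_pop[district_nodes[d]]) for d in 1:num_dists]

println(district_nodes)   # node indices grouped by district
println(district_pop)     # total population of each district
```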

Define number of steps and population constraint

pop_constraint = PopulationConstraint(graph, partition, 0.02)
num_steps = 10

We will (eventually) pass the population constraint to our chain to ensure that the plans it generates do not contain districts with unacceptable levels of population imbalance. Here, we allow each district's population to deviate by at most 2% from the ideal district population, which is the total population divided by the number of districts. We also declare that the chain will take 10 steps. Since each step generates a new plan, the chain is composed of a total of 11 plans: 1 initial plan and 10 generated plans.
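The arithmetic behind a 2% tolerance can be sketched in plain Julia. The helpers population_bounds and within_bounds are hypothetical and are not part of GerryChain; GerryChain's own PopulationConstraint may compute or round its bounds differently. In this made-up example, a total population of 600 split into 3 districts gives an ideal district population of 200, so the allowed range is roughly 196 to 204.

```julia
# Hypothetical helper (not GerryChain's PopulationConstraint): the range of
# district populations allowed by a relative tolerance around the ideal.
function population_bounds(total_pop::Real, num_dists::Integer, tol::Real)
    ideal = total_pop / num_dists          # total population / number of districts
    return ideal * (1 - tol), ideal * (1 + tol)
end

# A plan is acceptable only if every district falls inside the bounds.
within_bounds(district_pops, lo, hi) = all(p -> lo <= p <= hi, district_pops)

lo, hi = population_bounds(600, 3, 0.02)   # ideal district population is 200
bad_plan_ok  = within_bounds([200, 195, 205], lo, hi)   # 195 and 205 are more than 2% off
good_plan_ok = within_bounds([199, 201, 200], lo, hi)

num_steps = 10
num_plans = num_steps + 1                  # initial plan + one new plan per step
```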

Initialize Election of interest

election = Election("SEN10", ["SEN10D", "SEN10R"], partition.num_dists)

Often, a key part of an analysis that uses GerryChain is measuring election outcomes across different districting plans. We first define an Election object, to which we pass a name, the columns in our shapefile that contain each party's vote counts, and the number of districts in the plan.
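As a rough illustration of what an Election keeps track of, the sketch below sums each party's votes within each district and computes vote shares. It is plain Julia, not GerryChain's implementation: the per-node vote numbers and the tally function are hypothetical, and only the column names SEN10D and SEN10R come from the example above.

```julia
# Hypothetical per-node vote counts, keyed by the vote columns used above.
votes = Dict("SEN10D" => [60, 30, 55, 45, 20, 35],
             "SEN10R" => [40, 40, 45, 50, 70, 65])
assignment = [1, 1, 2, 2, 3, 3]   # hypothetical plan: node i -> district
num_dists = 3

# Sum each party's votes over the nodes in each district.
function tally(votes, assignment, num_dists)
    counts = Dict(p => zeros(Int, num_dists) for p in keys(votes))
    for (p, node_votes) in votes, (node, d) in enumerate(assignment)
        counts[p][d] += node_votes[node]
    end
    return counts
end

counts = tally(votes, assignment, num_dists)

# Democratic two-party vote share in each district.
d_share = counts["SEN10D"] ./ (counts["SEN10D"] .+ counts["SEN10R"])
```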

Define scores

election_metrics = [
    vote_count("count_d", election, "SEN10D"),
    efficiency_gap("efficiency_gap", election, "SEN10D"),
    seats_won("seats_won", election, "SEN10D"),
    mean_median("mean_median", election, "SEN10D")
]
scores = [
    DistrictAggregate("presd", "PRES12D"),
    ElectionTracker(election, election_metrics)
]

Every time our chain generates a new plan, we might want to run some functions that measure important aspects of that plan. What would election outcomes be under this plan? What are the total populations of White or Black individuals in each district? What is the mean-median score of this plan? These evaluative metrics are what we call "scores." Broadly, there are two types of scores: DistrictScores and PlanScores. DistrictScores return a value for each district in the plan (for example, the difference between the White and Black populations in each district). PlanScores return one value for the entire plan (such as the number of cut edges). The DistrictAggregate score defined above is a special type of DistrictScore that simply sums a node attribute over each district; here it sums the PRES12D column, the Democratic votes in the 2012 presidential election.
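The difference between the two kinds of scores can be sketched in plain Julia. The function district_aggregate mimics what DistrictAggregate is described as doing (summing a node attribute over each district), and num_cut_edges is a plan-level score. Both functions, the path graph, and the vote numbers are hypothetical; none of this calls GerryChain.

```julia
# Hypothetical toy graph: a path 1-2-3-4-5-6, with a made-up "PRES12D"-style column.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
assignment = [1, 1, 2, 2, 3, 3]
pres12d = [50, 25, 40, 35, 10, 30]

# DistrictScore-style: one value per district (sum of a node attribute).
district_aggregate(values, assignment, k) =
    [sum(values[i] for i in eachindex(values) if assignment[i] == d) for d in 1:k]

# PlanScore-style: one value for the whole plan
# (cut edges are edges whose endpoints lie in different districts).
num_cut_edges(edges, assignment) = count(e -> assignment[e[1]] != assignment[e[2]], edges)

presd = district_aggregate(pres12d, assignment, 3)
cut = num_cut_edges(edges, assignment)
```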

In this code block, note that we start by defining a list of partisan metrics (election_metrics), including the Democratic vote count, the efficiency gap, the seats won by a particular party, and the mean-median score, which then gets passed to the ElectionTracker. An ElectionTracker is a special type of score called a CompositeScore, which runs a set of related score functions together. In this case, grouping the partisan metrics into the ElectionTracker helps the chain run more efficiently: once a new plan is generated, the tracker re-calculates the vote counts in each district once, and then all of the partisan metric scores are run on the updated vote counts (rather than the vote counts being re-calculated for every partisan metric). If no partisan metrics are passed to the ElectionTracker, then it simply keeps track of the vote counts and vote shares for each party in each district.
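The efficiency argument can be illustrated with a plain-Julia sketch that computes district vote counts once and feeds them to several metrics. These functions are hypothetical simplifications, not GerryChain's seats_won, efficiency_gap, or mean_median; sign conventions for the efficiency gap and the mean-median score vary, and the ones used here are assumptions stated in the comments. The vote counts are made up, and the sketch uses Julia's Statistics standard library.

```julia
using Statistics: mean, median

# District-level vote counts, computed once (made-up numbers)...
d_votes = [90, 100, 55]
r_votes = [80, 95, 135]

# ...then reused by every metric below.
shares = d_votes ./ (d_votes .+ r_votes)   # Democratic share per district

# Districts where the Democratic candidate has more votes (ties go to R here).
seats_won(d, r) = count(d .> r)

# Assumed convention: median minus mean of the party's district vote shares.
mean_median(s) = median(s) - mean(s)

# Wasted-votes efficiency gap. The winner wastes votes beyond half the district
# total; the loser wastes all its votes. Assumed sign: positive means more
# Democratic votes were wasted.
function efficiency_gap(d, r)
    wasted_d = wasted_r = 0.0
    for (dv, rv) in zip(d, r)
        half = (dv + rv) / 2
        if dv > rv
            wasted_d += dv - half
            wasted_r += rv
        else
            wasted_r += rv - half
            wasted_d += dv
        end
    end
    return (wasted_d - wasted_r) / (sum(d) + sum(r))
end

seats = seats_won(d_votes, r_votes)
mm = mean_median(shares)
eg = efficiency_gap(d_votes, r_votes)
```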

Running the chain

println("Running 10-step ReCom chain...")
chain_data = recom_chain(graph, partition, pop_constraint, num_steps, scores)

Now that we've defined our graph, initial plan, population constraint, number of steps, and score functions, we can run our chain! We pass these arguments to recom_chain to start a Markov chain that uses the ReCom proposal method. (If you want to use the Flip proposal method instead, you can swap out recom_chain for flip_chain fairly easily, although you would have to make a few minor tweaks.)
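For intuition about a single ReCom step, here is a toy sketch on a small grid graph: it merges two adjacent districts, draws a random spanning tree of the merged region, and cuts a tree edge whose removal leaves two pieces within the population bounds. This is not GerryChain's implementation. The grid, every function name, picking the district pair through a random cut edge, and the spanning-tree method (Kruskal's algorithm on shuffled edges, which does not sample trees uniformly) are all assumptions made for illustration.

```julia
using Random

# Hypothetical toy graph: an n-by-n grid stored as an adjacency list.
function grid_graph(n)
    idx(r, c) = (r - 1) * n + c
    adj = [Int[] for _ in 1:n*n]
    for r in 1:n, c in 1:n
        if c < n   # edge to the right neighbor
            push!(adj[idx(r, c)], idx(r, c + 1)); push!(adj[idx(r, c + 1)], idx(r, c))
        end
        if r < n   # edge to the neighbor below
            push!(adj[idx(r, c)], idx(r + 1, c)); push!(adj[idx(r + 1, c)], idx(r, c))
        end
    end
    return adj
end

# Union-find root lookup with path halving (used by Kruskal's algorithm).
function find_root!(parent, x)
    while parent[x] != x
        parent[x] = parent[parent[x]]
        x = parent[x]
    end
    return x
end

# Random spanning tree of the subgraph induced by `nodes` (Kruskal on shuffled edges).
function random_spanning_tree(adj, nodes, rng)
    inset = Set(nodes)
    edges = [(u, v) for u in nodes for v in adj[u] if u < v && v in inset]
    shuffle!(rng, edges)
    parent = Dict(u => u for u in nodes)
    tree = Dict(u => Int[] for u in nodes)
    for (u, v) in edges
        ru, rv = find_root!(parent, u), find_root!(parent, v)
        if ru != rv
            parent[ru] = rv
            push!(tree[u], v); push!(tree[v], u)
        end
    end
    return tree
end

# Find a tree edge whose removal leaves both pieces with population in [lo, hi];
# return the node set on the child side, or `nothing` if no such edge exists.
function balanced_cut(tree, root, popn, lo, hi)
    order, par, stack = Int[], Dict(root => 0), [root]
    while !isempty(stack)                    # depth-first traversal from the root
        u = pop!(stack); push!(order, u)
        for v in tree[u]
            if v != par[u]
                par[v] = u
                push!(stack, v)
            end
        end
    end
    sub = Dict(u => popn[u] for u in order)  # subtree populations, built leaves-first
    for u in reverse(order)
        par[u] != 0 && (sub[par[u]] += sub[u])
    end
    total = sub[root]
    for u in order
        u == root && continue
        if lo <= sub[u] <= hi && lo <= total - sub[u] <= hi
            side, stack = Set{Int}(), [u]    # collect the subtree below u
            while !isempty(stack)
                w = pop!(stack); push!(side, w)
                for x in tree[w]
                    x != par[w] && push!(stack, x)
                end
            end
            return side
        end
    end
    return nothing
end

# One toy ReCom step; returns false (plan unchanged) if no balanced split is found.
function recom_step!(assignment, adj, popn, lo, hi, rng; max_tries = 1000)
    cut_edges = [(u, v) for u in eachindex(adj) for v in adj[u]
                 if u < v && assignment[u] != assignment[v]]
    u, v = rand(rng, cut_edges)              # random pair of adjacent districts
    a, b = assignment[u], assignment[v]
    nodes = [w for w in eachindex(assignment) if assignment[w] == a || assignment[w] == b]
    for _ in 1:max_tries
        side = balanced_cut(random_spanning_tree(adj, nodes, rng), first(nodes), popn, lo, hi)
        side === nothing && continue         # no balanced edge: draw a new tree
        for w in nodes
            assignment[w] = w in side ? a : b
        end
        return true
    end
    return false
end

# Usage: a 4x4 grid, one person per node, left half vs. right half.
n = 4
adj = grid_graph(n)
popn = ones(Int, n * n)
assignment = [c <= 2 ? 1 : 2 for r in 1:n for c in 1:n]
ok = recom_step!(assignment, adj, popn, 7, 9, MersenneTwister(1))
```

Because each new district is one side of a cut spanning tree, both districts are contiguous by construction; when a tree has no balanced edge, this sketch simply draws another tree, up to max_tries times.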

The recom_chain method returns a ChainScoreData object which we can then query to get the value of any score(s) at a specific step of the chain (using get_scores_at_step) or all values of a particular score throughout the entire chain (using get_score_values).
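The two query patterns can be mimicked with a small stand-in for ChainScoreData. ToyChainData, scores_at_step, and score_values are hypothetical names, and storing the initial plan's scores as step 0 is an assumption that may not match how get_scores_at_step indexes steps. The score values are made up.

```julia
# Hypothetical stand-in for a chain's score record: one Dict per state,
# with the initial plan's scores stored first (assumed to be "step 0").
struct ToyChainData
    step_values::Vector{Dict{String, Any}}
end

# All scores at one state of the chain (like get_scores_at_step).
scores_at_step(data::ToyChainData, step::Integer) = data.step_values[step + 1]

# One score's values across every state (like get_score_values).
score_values(data::ToyChainData, name::String) = [v[name] for v in data.step_values]

# 11 states: the initial plan plus 10 generated plans (made-up values).
data = ToyChainData([Dict{String, Any}("count_d" => [90 + s, 100 - s, 55], "seats_won" => 2)
                     for s in 0:10])

final_scores = scores_at_step(data, 10)
all_counts = score_values(data, "count_d")
```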

Retrieving values of scores

# Get values of all scores at the 10th state of the chain
score_dict = get_scores_at_step(chain_data, 10)
# Get all vote counts for each state of the chain
vote_counts_arr = get_score_values(chain_data, "count_d")

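The chain_data object returned by recom_chain can be queried to get the values of scores, as shown above. Scores are looked up by the names given to them when they were defined; for example, "count_d" is the name we gave the vote_count metric in election_metrics.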