API Reference¶
Table of Contents
Adjacency graphs¶
-
class
gerrychain.graph.graph.
Graph
(incoming_graph_data=None, **attr)[source]¶ Represents a graph to be partitioned, extending the
networkx.Graph
.This class includes additional class methods for constructing graphs from shapefiles, and for saving and loading graphs in JSON format.
Initialize a graph with edges, name, or graph attributes.
- incoming_graph_datainput graph (optional, default: None)
Data to initialize graph. If None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a 2D NumPy array, a SciPy sparse matrix, or a PyGraphviz graph.
- attrkeyword arguments, optional (default= no attributes)
Attributes to add to graph as key=value pairs.
convert
>>> G = nx.Graph() # or DiGraph, MultiGraph, MultiDiGraph, etc >>> G = nx.Graph(name="my graph") >>> e = [(1, 2), (2, 3), (3, 4)] # list of edges >>> G = nx.Graph(e)
Arbitrary graph attribute pairs (key=value) may be assigned
>>> G = nx.Graph(e, day="Friday") >>> G.graph {'day': 'Friday'}
-
add_data
(df: <Mock name='mock.DataFrame' id='139842338441776'>, columns: Optional[Iterable[str]] = None) → None[source]¶ Add columns of a DataFrame to a graph as node attributes by matching the DataFrame’s index to node ids.
- Parameters
df (
pandas.DataFrame
) – Dataframe containing given columns.columns (Optional[Iterable[str]], optional) – List of dataframe column names to add. Default is None.
- Returns
None
-
classmethod
from_file
(filename: str, adjacency: str = 'rook', cols_to_add: Optional[List[str]] = None, reproject: bool = False, ignore_errors: bool = False) → gerrychain.graph.graph.Graph[source]¶ Create a
Graph
from a shapefile (or GeoPackage, or GeoJSON, or any other library thatgeopandas
can read. Seefrom_geodataframe()
for more details.- Parameters
filename (str) – Path to the shapefile / GeoPackage / GeoJSON / etc.
adjacency (str, optional) – The adjacency type to use (“rook” or “queen”). Default is “rook”
cols_to_add (Optional[List[str]], optional) – The names of the columns that you want to add to the graph as node attributes. Default is None.
reproject (bool, optional) – Whether to reproject to a UTM projection before creating the graph. Default is False.
ignore_errors (bool, optional) – Whether to ignore all invalid geometries and try to continue creating the graph. Default is False.
- Returns
The Graph object of the geometries from filename.
- Return type
Warning
This method requires the optional
geopandas
dependency. So please installgerrychain
with thegeo
extra via the command:pip install gerrychain[geo]
or install
geopandas
separately.
-
classmethod
from_geodataframe
(dataframe: <Mock name='mock.DataFrame' id='139842338441776'>, adjacency: str = 'rook', cols_to_add: Optional[List[str]] = None, reproject: bool = False, ignore_errors: bool = False, crs_override: Union[str, int, None] = None) → gerrychain.graph.graph.Graph[source]¶ Creates the adjacency
Graph
of geometries described by dataframe. The areas of the polygons are included as node attributes (with key area). The shared perimeter of neighboring polygons are included as edge attributes (with key shared_perim). Nodes corresponding to polygons on the boundary of the union of all the geometries (e.g., the state, if your dataframe describes VTDs) have a boundary_node attribute (set to True) and a boundary_perim attribute with the length of this “exterior” boundary.By default, areas and lengths are computed in a UTM projection suitable for the geometries. This prevents the bizarro area and perimeter values that show up when you accidentally do computations in Longitude-Latitude coordinates. If the user specifies reproject=False, then the areas and lengths will be computed in the GeoDataFrame’s current coordinate reference system. This option is for users who have a preferred CRS they would like to use.
- Parameters
dataframe (
geopandas.GeoDataFrame
) – The GeoDateFrame to convertadjacency (str, optional) – The adjacency type to use (“rook” or “queen”). Default is “rook”.
cols_to_add (Optional[List[str]], optional) – The names of the columns that you want to add to the graph as node attributes. Default is None.
reproject (bool, optional) – Whether to reproject to a UTM projection before creating the graph. Default is
False
.ignore_errors (bool, optional) – Whether to ignore all invalid geometries and attept to create the graph anyway. Default is
False
.crs_override (Optional[Union[str,int]], optional) – Value to override the CRS of the GeoDataFrame. Default is None.
- Returns
The adjacency graph of the geometries from dataframe.
- Return type
-
classmethod
from_json
(json_file: str) → gerrychain.graph.graph.Graph[source]¶ Load a graph from a JSON file in the NetworkX json_graph format.
- Parameters
json_file (str) – Path to JSON file.
- Returns
The loaded graph as an instance of this class.
- Return type
-
classmethod
from_networkx
(graph: networkx.classes.graph.Graph) → gerrychain.graph.graph.Graph[source]¶ Create a Graph instance from a networkx.Graph object.
- Parameters
graph (networkx.Graph) – The networkx graph to be converted.
- Returns
The converted graph as an instance of this class.
- Return type
-
property
islands
¶ - Returns
The set of degree-0 nodes.
- Return type
Set
-
issue_warnings
() → None[source]¶ - Returns
None
- Raises
UserWarning if the graph has any red flags (right now, only islands).
-
join
(dataframe: <Mock name='mock.DataFrame' id='139842338441776'>, columns: Optional[List[str]] = None, left_index: Optional[str] = None, right_index: Optional[str] = None) → None[source]¶ Add data from a dataframe to the graph, matching nodes to rows when the node’s left_index attribute equals the row’s right_index value.
- Parameters
dataframe (
pandas.DataFrame
) – DataFrame.- Columns
The columns whose data you wish to add to the graph. If not provided, all columns are added. Default is None.
- Left_index
The node attribute used to match nodes to rows. If not provided, node IDs are used. Default is None.
- Right_index
The DataFrame column name to use to match rows to nodes. If not provided, the DataFrame’s index is used. Default is None.
- Returns
None
-
lookup
(node: Any, field: Any) → Any[source]¶ Lookup a node/field attribute.
- Parameters
node (Any) – Node to look up.
field (Any) – Field to look up.
- Returns
The value of the attribute field at node.
- Return type
Any
-
to_json
(json_file: str, *, include_geometries_as_geojson: bool = False) → None[source]¶ Save a graph to a JSON file in the NetworkX json_graph format.
- Parameters
json_file (str) – Path to target JSON file.
include_geometry_as_geojson (bool) – Whether to include any
shapely
geometry objects encountered in the graph’s node attributes as GeoJSON. The default (False
) behavior is to remove all geometry objects because they are not serializable. Including the GeoJSON will result in a much larger JSON file.
- Returns
None
Partitions¶
-
class
gerrychain.partition.
Partition
(graph=None, assignment=None, updaters=None, parent=None, flips=None, use_default_updaters=True)[source]¶ Bases:
object
Partition represents a partition of the nodes of the graph. It will perform the first layer of computations at each step in the Markov chain - basic aggregations and calculations that we want to optimize.
- Variables
graph – The underlying graph.
assignment – Maps node IDs to district IDs.
parts – Maps district IDs to the set of nodes in that district.
subgraphs – Maps district IDs to the induced subgraph of that district.
- Parameters
graph (
Graph
) – Underlying graph.assignment (
Assignment
) – Dictionary assigning nodes to districts.updaters – Dictionary of functions to track data about the partition. The keys are stored as attributes on the partition class, which the functions compute.
use_default_updaters – If False, do not include default updaters.
-
crosses_parts
(edge: Tuple) → bool[source]¶ - Parameters
edge (Tuple) – tuple of node IDs
- Returns
True if the edge crosses from one part of the partition to another
- Return type
bool
-
flip
(flips: Dict) → gerrychain.partition.partition.Partition[source]¶ Returns the new partition obtained by performing the given flips on this partition.
-
classmethod
from_districtr_file
(graph: gerrychain.graph.graph.Graph, districtr_file: str, updaters: Optional[Dict[str, Callable]] = None) → gerrychain.partition.partition.Partition[source]¶ Create a Partition from a districting plan created with Districtr, a free and open-source web app created by MGGG for drawing districts.
The provided
graph
should be created from the same shapefile as the Districtr module used to draw the districting plan. These shapefiles may be found in a repository in the mggg-states GitHub organization, or by request from MGGG.- Parameters
graph (
Graph
) – The graph to create the Partition fromdistrictr_file (str) – the path to the
.json
file exported from Districtrupdaters (Optional[Dict[str, Callable]], optional) – dictionary of updaters
- Returns
The partition created from the Districtr file
- Return type
-
classmethod
from_random_assignment
(graph: gerrychain.graph.graph.Graph, n_parts: int, epsilon: float, pop_col: str, updaters: Optional[Dict[str, Callable]] = None, use_default_updaters: bool = True, flips: Optional[Dict] = None, method: Callable = <function recursive_tree_part>) → gerrychain.partition.partition.Partition[source]¶ Create a Partition with a random assignment of nodes to districts.
- Parameters
graph (
Graph
) – The graph to create the Partition from.n_parts (int) – The number of districts to divide the nodes into.
epsilon (float population. Should be in [0,1]) – The maximum relative population deviation from the ideal
pop_col (str) – The column of the graph’s node data that holds the population data.
updaters (Optional[Dict[str, Callable]], optional) – Dictionary of updaters
use_default_updaters (bool, optional) – If False, do not include default updaters.
flips (Optional[Dict], optional) – Dictionary assigning nodes of the graph to their new districts.
method (Callable, optional) – The function to use to partition the graph into
n_parts
. Defaults torecursive_tree_part()
.
- Returns
The partition created with a random assignment
- Return type
-
plot
(geometries=None, **kwargs)[source]¶ Plot the partition, using the provided geometries.
- Parameters
geometries (geopandas.GeoDataFrame or geopandas.GeoSeries) – A
geopandas.GeoDataFrame
orgeopandas.GeoSeries
holding the geometries to use for plotting. ItsIndex
should match the node labels of the partition’s underlyingGraph
.**kwargs – Additional arguments to pass to
geopandas.GeoDataFrame.plot()
to adjust the plot.
- Returns
The matplotlib axes object. Which plots the Partition.
- Return type
matplotlib.axes.Axes
-
class
gerrychain.partition.
GeographicPartition
(graph=None, assignment=None, updaters=None, parent=None, flips=None, use_default_updaters=True)[source]¶ Bases:
gerrychain.partition.partition.Partition
A
Partition
with areas, perimeters, and boundary information included. These additional data allow you to compute compactness scores like Polsby-Popper.- Parameters
graph – Underlying graph.
assignment – Dictionary assigning nodes to districts.
updaters – Dictionary of functions to track data about the partition. The keys are stored as attributes on the partition class, which the functions compute.
use_default_updaters – If False, do not include default updaters.
Markov chains¶
-
class
gerrychain.
MarkovChain
(proposal: Callable, constraints: Union[Iterable[Callable], gerrychain.constraints.validity.Validator, Iterable[gerrychain.constraints.bounds.Bounds], Callable], accept: Callable, initial_state: gerrychain.partition.partition.Partition, total_steps: int)[source]¶ MarkovChain is a class that creates an iterator for iterating over the states of a Markov chain run in a gerrymandering analysis context.
It allows for the generation of a sequence of partitions (states) of a political districting plan, where each partition represents a possible state in the Markov chain.
Example usage:
chain = MarkovChain(proposal, constraints, accept, initial_state, total_steps) for state in chain: # Do whatever you want - print output, compute scores, ...
- Parameters
proposal (Callable) – Function proposing the next state from the current state.
constraints (Union[Iterable[Callable], Validator, Iterable[Bounds], Callable]) – A function with signature
Partition -> bool
determining whether the proposed next state is valid (passes all binary constraints). Usually this is aValidator
class instance.accept (Callable) – Function accepting or rejecting the proposed state. In the most basic use case, this always returns
True
. But if the user wanted to use a Metropolis-Hastings acceptance rule, this is where you would implement it.initial_state (Partition) – Initial
gerrychain.partition.Partition
class.total_steps (int) – Number of steps to run.
- Returns
None
- Raises
ValueError – If the initial_state is not valid according to the constraints.
-
property
constraints
¶ Read_only alias for the is_valid property. Returns the constraints of the Markov chain.
- Returns
The constraints of the Markov chain.
- Return type
String
Proposals¶
-
gerrychain.proposals.
recom
(partition: gerrychain.partition.partition.Partition, pop_col: str, pop_target: Union[int, float], epsilon: float, node_repeats: int = 1, region_surcharge: Optional[Dict] = None, method: Callable = <function bipartition_tree>) → gerrychain.partition.partition.Partition[source]¶ ReCom (short for ReCombination) is a Markov Chain Monte Carlo (MCMC) algorithm used for redistricting. At each step of the algorithm, a pair of adjacent districts is selected at random and merged into a single district. The region is then split into two new districts by generating a spanning tree using the Kruskal/Karger algorithm and cutting an edge at random. The edge is checked to ensure that it separates the region into two new districts that are population balanced, and, if not, a new edge is selected at random and the process is repeated.
Example usage:
from functools import partial from gerrychain import MarkovChain from gerrychain.proposals import recom # ...define constraints, accept, partition, total_steps here... # Ideal population: pop_target = sum(partition["population"].values()) / len(partition) proposal = partial( recom, pop_col="POP10", pop_target=pop_target, epsilon=.05, node_repeats=10 ) chain = MarkovChain(proposal, constraints, accept, partition, total_steps)
- Parameters
partition (Partition) – The initial partition.
pop_col (str) – The name of the population column.
pop_target (Union[int,float]) – The target population for each district.
epsilon (float) – The epsilon value for population deviation as a percentage of the target population.
node_repeats (int, optional) – The number of times to repeat the bipartitioning step. Default is 1.
region_surcharge (Optional[Dict], optional) – The surcharge dictionary for the graph used for region-aware partitioning of the grid. Default is None.
method (Callable, optional) – The method used for bipartitioning the tree. Default is
bipartition_tree()
.
- Returns
The new partition resulting from the ReCom algorithm.
- Return type
-
gerrychain.proposals.
reversible_recom
(partition: gerrychain.partition.partition.Partition, pop_col: str, pop_target: Union[int, float], epsilon: float, balance_edge_fn: Callable = <function find_balanced_edge_cuts_memoization>, M: int = 1, repeat_until_valid: bool = False, choice: Callable = <bound method Random.choice of <random.Random object>>) → gerrychain.partition.partition.Partition[source]¶ Reversible ReCom algorithm for redistricting.
This function performs the reversible ReCom algorithm, which is a Markov Chain Monte Carlo (MCMC) algorithm used for redistricting. For more information, see the paper “Spanning Tree Methods for Sampling Graph Partitions” by Cannon, et al. (2022) at https://arxiv.org/abs/2210.01401
- Parameters
partition (Partition) – The initial partition.
pop_col (str) – The name of the population column.
pop_target (Union[int,float]) – The target population for each district.
epsilon (float) – The epsilon value for population deviation as a percentage of the target population.
balance_edge_fn (Callable, optional) – The balance edge function. Default is find_balanced_edge_cuts_memoization.
M (int, optional) – The maximum number of balance edges. Default is 1.
repeat_until_valid (bool, optional) – Flag indicating whether to repeat until a valid partition is found. Default is False.
choice (Callable, optional) – The choice function for selecting a random element. Default is random.choice.
- Returns
The new partition resulting from the reversible ReCom algorithm.
- Return type
-
gerrychain.proposals.
spectral_recom
(partition: gerrychain.partition.partition.Partition, weight_type: Optional[str] = None, lap_type: str = 'normalized') → gerrychain.partition.partition.Partition[source]¶ Spectral ReCom proposal.
Uses spectral clustering to bipartition a subgraph of the original graph formed by merging the nodes corresponding to two adjacent districts.
Example usage:
from functools import partial from gerrychain import MarkovChain from gerrychain.proposals import recom # ...define constraints, accept, partition, total_steps here... proposal = partial( spectral_recom, weight_type=None, lap_type="normalized" ) chain = MarkovChain(proposal, constraints, accept, partition, total_steps)
- Parameters
partition (Partition) – The initial partition.
weight_type (Optional[str], optional) – The type of weight to be used in the Laplacian. Default is None.
lap_type (str, optional) – The type of Laplacian to be used. Default is “normalized”.
- Returns
The new partition resulting from the spectral ReCom algorithm.
- Return type
-
gerrychain.proposals.
propose_chunk_flip
(partition: gerrychain.partition.partition.Partition) → gerrychain.partition.partition.Partition[source]¶ Chooses a random boundary node and proposes to flip it and all of its neighbors
Binary constraints¶
The gerrychain.constraints
module provides a collection of constraint
functions and helper classes for the validation step in GerryChain.
Helper classes |
|
---|---|
Collection of constraints |
|
Bounds on numeric constraints |
|
Upper bounds on numeric constraints |
|
Lower bounds on numeric constraints |
|
Automatic upper bounds on numeric constraints |
|
Automatic lower bounds on numeric constraints |
|
Percentage bounds for numeric constraints |
Binary constraint functions |
|
---|---|
|
Lower bounded L1-reciprocal Polsby-Popper |
|
Lower bounded L(-1)-reciprocal Polsby-Popper |
Districts are contiguous (with NetworkX methods) |
|
Districts are contiguous (optimized for |
|
No districts may be completely consumed |
Each new step proposed to the chain is passed off to the “validator” functions here to determine whether or not the step is valid. If it is invalid (breaks contiguity, for instance), then the step is immediately rejected.
A validator should take in a Partition
instance,
and should return whether or not the instance is valid according to their
rules. Many top-level functions following this signature in this module are
examples of this.
-
class
gerrychain.constraints.
LowerBound
(func: Callable, bound: float)[source]¶ Wrapper for numeric-validators to enforce lower limits.
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within a set lower limit, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function. Should return a comparable value.
bounds (float) – Comparable lower bound.
-
class
gerrychain.constraints.
SelfConfiguringLowerBound
(func: Callable, epsilon: float = 0.05)[source]¶ Wrapper for numeric-validators to enforce automatic lower limits.
When instantiated, the initial lower bound is set as the initial value of the numeric-validator minus some configurable ε.
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within a set lower limit, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function.
epsilon (float, optional) – Initial population deviation allowable by the validator as a percentage of the ideal population. Defaults to 0.05.
-
class
gerrychain.constraints.
SelfConfiguringUpperBound
(func: Callable)[source]¶ Wrapper for numeric-validators to enforce automatic upper limits.
When instantiated, the initial upper bound is set as the initial value of the numeric-validator.
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within a set upper limit, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function.
-
class
gerrychain.constraints.
UpperBound
(func: Callable, bound: float)[source]¶ Wrapper for numeric-validators to enforce upper limits.
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within a set upper limit, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function. Should return a comparable value.
bounds (float) – Comparable upper bound.
-
class
gerrychain.constraints.
WithinPercentRangeOfBounds
(func: Callable, percent: float)[source]¶ Wrapper for numeric-validators to enforce upper and lower limits determined by a percentage of the initial value.
When instantiated, the initial upper and lower bounds are set as the initial value of the numeric-validator times (1 ± percent).
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within the desired percentage range of the initial value, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function.
percent (float) – Percentage of the initial value to use as the bounds.
- Returns
None
Warning
The percentage is assumed to be in the range [0.0, 100.0].
-
gerrychain.constraints.
L1_polsby_popper
(partition: gerrychain.partition.partition.Partition) → float[source]¶ Returns the \(L^1\) norm of the Polsby-Popper scores for the given partition
- Parameters
partition (Partition) – Partition representing a districting plan
- Returns
\(L^1\) norm of the reciprocal Polsby-Popper scores
- Return type
float
-
gerrychain.constraints.
L1_reciprocal_polsby_popper
(partition: gerrychain.partition.partition.Partition) → float[source]¶ Returns the \(L^1\) norm of the reciprocal Polsby-Popper scores for the given partition
- Parameters
partition (Partition) – Partition representing a districting plan
- Returns
\(L^1\) norm of the reciprocal Polsby-Popper scores
- Return type
float
-
gerrychain.constraints.
L2_polsby_popper
(partition: gerrychain.partition.partition.Partition) → float[source]¶ Returns the \(L^2\) norm of the Polsby-Popper scores for the given partition.
- Parameters
partition (Partition) – Partition representing a districting plan
- Returns
\(L^2\) norm of the Polsby-Popper scores
- Return type
float
-
gerrychain.constraints.
L_minus_1_polsby_popper
(partition)[source]¶ Returns the \(L^{-1}\) norm of the Polsby-Popper scores for the given partition.
- Parameters
partition (Partition) – Partition representing a districting plan
- Returns
\(L^{-1}\) norm of the Polsby-Popper scores
- Return type
float
-
gerrychain.constraints.
contiguous
(partition: gerrychain.partition.partition.Partition) → bool[source]¶ Check if the parts of a partition are connected using
networkx.is_connected()
.
-
gerrychain.constraints.
contiguous_bfs
(partition: gerrychain.partition.partition.Partition) → bool[source]¶ Checks that a given partition’s parts are connected as graphs using a simple breadth-first search.
- Parameters
partition (Partition) – Instance of Partition
- Returns
Whether the parts of this partition are connected
- Return type
bool
-
gerrychain.constraints.
single_flip_contiguous
(partition: gerrychain.partition.partition.Partition) → bool[source]¶ Check if swapping the given node from its old assignment disconnects the old assignment class.
- Parameters
- Returns
whether the partition is contiguous
- Return type
bool
We assume that removed_node belonged to an assignment class that formed a connected subgraph. To see if its removal left the subgraph connected, we check that the neighbors of the removed node are still connected through the changed graph.
-
class
gerrychain.constraints.
Validator
(constraints: List[Callable])[source]¶ A single callable for checking that a partition passes a collection of constraints. Intended to be passed as the
is_valid
parameter when instantiatingMarkovChain
.This class is meant to be called as a function after instantiation; its return is
True
if all validators pass, andFalse
if any one fails.Example usage:
is_valid = Validator([constraint1, constraint2, constraint3]) chain = MarkovChain(proposal, is_valid, accept, initial_state, total_steps)
- Variables
constraints – List of validator functions that will check partitions.
- Parameters
constraints (List[Callable]) – List of validator functions that will check partitions.
-
gerrychain.constraints.
deviation_from_ideal
(partition: gerrychain.partition.partition.Partition, attribute: str = 'population') → Dict[int, float][source]¶ Computes the deviation of the given
attribute
from exact equality among parts of the partition. Usuallyattribute
is the population, and this function is used to compute how far a districting plan is from exact population equality.By “deviation” we mean
(actual_value - ideal)/ideal
(not the absolute value).
-
gerrychain.constraints.
districts_within_tolerance
(partition: gerrychain.partition.partition.Partition, attribute_name: str = 'population', percentage: float = 0.1) → bool[source]¶ Check if all districts are within a certain percentage of the “smallest” district, as defined by the given attribute.
- Parameters
partition (Partition) – Partition class instance
attrName (str, optional) – String that is the name of an updater in partition. Default is
"population"
.percentage (float, optional) – What percent (as a number between 0 and 1) difference is allowed. Default is 0.1.
- Returns
Whether the districts are within specified tolerance
- Return type
bool
-
gerrychain.constraints.
no_vanishing_districts
(partition: gerrychain.partition.partition.Partition) → bool[source]¶ Require that no districts be completely consumed.
- Parameters
partition (Partition) – Partition to check.
- Returns
Whether no districts are completely consumed.
- Return type
bool
-
gerrychain.constraints.
refuse_new_splits
(partition_county_field: str) → Callable[[gerrychain.partition.partition.Partition], bool][source]¶ Refuse all proposals that split a county that was previous unsplit.
- Parameters
partition_county_field (str) – Name of field for county information generated by
county_splits()
.- Returns
Function that returns
True
if the proposal does not split any new counties.- Return type
Callable[[Partition], bool]
-
gerrychain.constraints.
within_percent_of_ideal_population
(initial_partition: gerrychain.partition.partition.Partition, percent: float = 0.01, pop_key: str = 'population') → gerrychain.constraints.bounds.Bounds[source]¶ Require that all districts are within a certain percent of “ideal” (i.e., uniform) population.
Ideal population is defined as “total population / number of districts.”
- Parameters
- Returns
A
Bounds
constraint on the population attribute identified bypop_key
.- Return type
-
class
gerrychain.constraints.
Bounds
(func: Callable, bounds: Tuple[float, float])[source]¶ Wrapper for numeric-validators to enforce upper and lower limits.
This class is meant to be called as a function after instantiation; its return is
True
if the numeric validator is within set limits, andFalse
otherwise.- Parameters
func (Callable) – Numeric validator function. Should return an iterable of values.
bounds (Tuple[float, float]) – Tuple of (lower, upper) numeric bounds.
Updaters¶
-
gerrychain.updaters.
flows_from_changes
[source]¶ - Parameters
- Returns
A dictionary mapping each node that changed assignment between the previous and current partitions to a dictionary of the form {‘in’: <set of nodes that flowed in>, ‘out’: <set of nodes that flowed out>}.
- Return type
Dict
-
gerrychain.updaters.
county_splits
(partition_name: str, county_field_name: str) → Callable[source]¶ Update that allows for the tracking of county splits.
- Parameters
partition_name (str) – Name that the
Partition
instance will store.county_field_name (str) – Name of county ID field on the graph.
- Returns
The tracked data is a dictionary keyed on the county ID. The stored values are tuples of the form (split, nodes, seen). split is a
CountySplit
enum, nodes is a list of node IDs, and seen is a list of assignment IDs that are contained in the county.- Return type
Callable
-
gerrychain.updaters.
cut_edges
(partition)[source]¶ - Parameters
partition (
Partition
) – A partition of a Graph- Returns
The set of edges that are cut by the given partition.
- Return type
Set[Tuple]
-
gerrychain.updaters.
cut_edges_by_part
(partition, previous: Set[Tuple], new_edges: Set[Tuple], old_edges: Set[Tuple]) → Set[Tuple][source]¶ Updater function that responds to the flow of edges between different partitions.
- Parameters
partition (
Partition
) – A partition of a Graphprevious (Set[Tuple]) – The previous set of edges for a fixed part of the given partition.
new_edges (Set[Tuple]) – The set of edges that have flowed into the given part of the partition.
old_edges (Set[Tuple]) – The set of cut edges in the previous partition.
- Returns
The new set of cut edges for the newly generated partition.
- Return type
Set
-
class
gerrychain.updaters.
Tally
(fields: Union[str, List[str]], alias: Optional[str] = None, dtype: Type = <class 'int'>)[source]¶ An updater for keeping a tally of one or more node attributes.
- Variables
fields – The list of node attributes that you want to tally. Or just a single attribute name as a string.
alias – The aliased name of this Tally (meaning, the key corresponding to this Tally in the Partition’s updaters dictionary)
dtype – The type (int, float, etc.) that you want the tally to have
- Parameters
fields (Union[str, List[str]]) – The list of node attributes that you want to tally. Or a just a single attribute name as a string.
alias (Optional[str], optional) – The aliased name of this Tally (meaning, the key corresponding to this Tally in the Partition’s updaters dictionary). Default is None.
dtype (Any, optional) – The type (int, float, etc.) that you want the tally to have. Default is int.
- Returns
None
-
class
gerrychain.updaters.
DataTally
(data: Union[Dict, <Mock name='mock.Series' id='139842338441968'>, str], alias: str)[source]¶ An updater for tallying numerical data that is not necessarily stored as node attributes
- Variables
data – A Dict or Series indexed by the graph’s nodes, or the string key for a node attribute containing the Tally’s data.
alias – The name of the tally in the Partition’s updaters dictionary
- Parameters
data (Union[Dict, pandas.Series, str]) – A Dict or Series indexed by the graph’s nodes, or the string key for a node attribute containing the Tally’s data.
alias (str) – The name of the tally in the Partition’s updaters dictionary
- Returns
None
-
gerrychain.updaters.
boundary_nodes
(partition, alias: str = 'boundary_nodes') → Set[source]¶ - Parameters
partition (
Partition
) – A partition of a Graphalias (str, optional) – The name of the attribute that the boundary nodes are stored under. Default is ‘boundary_nodes’.
- Returns
The set of nodes in the partition that are on the boundary.
- Return type
Set
-
gerrychain.updaters.
flips
(partition) → Dict[source]¶ - Parameters
partition (
Partition
) – A partition of a Graph- Returns
The flips that were made to get from the parent partition to the given partition.
- Return type
Dict
-
gerrychain.updaters.
perimeter
(partition) → Dict[int, float][source]¶ - Parameters
partition (
Partition
) – A partition of a Graph- Returns
A dictionary mapping each part of a partition to its perimeter.
- Return type
Dict[int, float]
-
gerrychain.updaters.
exterior_boundaries
(partition, previous: Set, inflow: Set, outflow: Set) → Dict[source]¶ Updater function that responds to the flow of nodes between different partitions.
- Parameters
partition (
Partition
) – A partition of a Graphprevious (Set) – The previous set of exterior boundary nodes for a fixed part of the given partition.
inflow (Set) – The set of nodes that have flowed into the given part of the partition.
outflow (Set) – The set of nodes that have flowed out of the given part of the partition.
- Returns
A dict mapping each part of the partition to the new exterior boundary of that part.
- Return type
Dict
-
gerrychain.updaters.
interior_boundaries
(partition, previous: Set, new_edges: Set, old_edges: Set) → Dict[source]¶ Updater function that responds to the flow of nodes between different partitions.
- Parameters
partition (
Partition
) – A partition of a Graphprevious (Set) – The previous set of exterior boundary nodes for a fixed part of the given partition.
new_edges (Set) – The set of edges that have flowed into the given part of the partition.
old_edges (Set) – The set of edges that have flowed out of the given part of the partition.
- Returns
A dict mapping each part of the partition to the new interior boundary of that part.
- Return type
Dict
-
gerrychain.updaters.
exterior_boundaries_as_a_set
(partition, previous: Set, inflow: Set, outflow: Set) → Set[source]¶ Updater function that responds to the flow of nodes between different partitions.
- Parameters
partition (
Partition
) – A partition of a Graphprevious (Set) – The previous set of exterior boundary nodes for a fixed part of the given partition.
inflow (Set) – The set of nodes that have flowed into the given part of the partition.
outflow (Set) – The set of nodes that have flowed out of the given part of the partition.
- Returns
The new set of exterior boundary nodes for the given part of the partition.
- Return type
Set
-
class
gerrychain.updaters.
CountySplit
[source]¶ Enum to track county splits in a partition.
- Variables
NOT_SPLIT – The county is not split.
NEW_SPLIT – The county is split in the current partition.
OLD_SPLIT – The county is split in the parent partition.
-
gerrychain.updaters.
compute_edge_flows
(partition) → Dict[source]¶ - Parameters
partition (
Partition
) – A partition of a Graph- Returns
A flow dictionary containing the flow from the parent of this partition to this partition. This dictionary is of the form {part: {‘in’: <set of edges that flowed in>, ‘out’: <set of edges that flowed out>}}.
- Return type
Dict
-
class
gerrychain.updaters.
Election
(name: str, parties_to_columns: Union[Dict, List], alias: Optional[str] = None)[source]¶ Represents the data of one election, with races conducted in each part of the partition.
As we vary the districting plan, we can use the same node-level vote totals to tabulate hypothetical elections. To do this manually with tallies, we would have to maintain tallies for each party, as well as the total number of votes, and then compute the electoral results and percentages from scratch every time. To make this simpler, this class provides an
ElectionUpdater
to manage these tallies. The updater returns anElectionResults
class giving a convenient view of the election results, with methods likewins()
orpercent()
for common queries the user might make on election results.Example usage:
# Assuming your nodes have attributes "2008_D", "2008_R" # with (for example) 2008 senate election vote totals election = Election( "2008 Senate", {"Democratic": "2008_D", "Republican": "2008_R"}, alias="2008_Sen" ) # Assuming you already have a graph and assignment: partition = Partition( graph, assignment, updaters={"2008_Sen": election} ) # The updater returns an ElectionResults instance, which # we can use (for example) to see how many seats a given # party would win in this partition using this election's # vote distribution: partition["2008_Sen"].wins("Republican")
- Variables
name – The name of the election. (e.g. “2008 Presidential”)
parties – A list of the names of the parties in the election.
columns – A list of the columns in the graph’s node data that hold the vote totals for each party.
parties_to_columns – A dictionary mapping party names to the columns in the graph’s node data that hold the vote totals for that party.
tallies – A dictionary mapping party names to
DataTally
objects that manage the vote totals for that party.updater – An
ElectionUpdater
object that manages the tallies and returns anElectionResults
object.alias – The name that the election is registered under in the partition’s dictionary of updaters.
- Parameters
name (str) – The name of the election. (e.g. “2008 Presidential”)
parties_to_columns (Union[Dict, List]) – A dictionary matching party names to their data columns, either as actual columns (list-like, indexed by nodes) or as string keys for the node attributes that hold the party’s vote totals. Or, a list of strings which will serve as both the party names and the node attribute keys.
alias (Optional[str], optional) – Alias that the election is registered under in the Partition’s dictionary of updaters.
-
gerrychain.updaters.
num_spanning_trees
(partition) → Dict[int, int][source]¶ - Returns
The number of spanning trees in each part (district) of a partition.
- Return type
Dict[int, int]
-
gerrychain.updaters.
tally_region_splits
(reg_attr_lst)[source]¶ A naive updater for tallying the number of times a region attribute is split. for each region attribute in reg_attr_lst.
- Parameters
reg_attr_lst (List[str]) – A list of region names to tally splits for.
- Returns
A function that takes a partition and returns a dictionary which maps the region name to the number of times that it is split in a a particular partition.
- Return type
Callable
Elections¶
-
class
gerrychain.updaters.election.
Election
(name: str, parties_to_columns: Union[Dict, List], alias: Optional[str] = None)[source]¶ Represents the data of one election, with races conducted in each part of the partition.
As we vary the districting plan, we can use the same node-level vote totals to tabulate hypothetical elections. To do this manually with tallies, we would have to maintain tallies for each party, as well as the total number of votes, and then compute the electoral results and percentages from scratch every time. To make this simpler, this class provides an
ElectionUpdater
to manage these tallies. The updater returns anElectionResults
class giving a convenient view of the election results, with methods likewins()
orpercent()
for common queries the user might make on election results.Example usage:
# Assuming your nodes have attributes "2008_D", "2008_R" # with (for example) 2008 senate election vote totals election = Election( "2008 Senate", {"Democratic": "2008_D", "Republican": "2008_R"}, alias="2008_Sen" ) # Assuming you already have a graph and assignment: partition = Partition( graph, assignment, updaters={"2008_Sen": election} ) # The updater returns an ElectionResults instance, which # we can use (for example) to see how many seats a given # party would win in this partition using this election's # vote distribution: partition["2008_Sen"].wins("Republican")
- Variables
name – The name of the election. (e.g. “2008 Presidential”)
parties – A list of the names of the parties in the election.
columns – A list of the columns in the graph’s node data that hold the vote totals for each party.
parties_to_columns – A dictionary mapping party names to the columns in the graph’s node data that hold the vote totals for that party.
tallies – A dictionary mapping party names to
DataTally
objects that manage the vote totals for that party.updater – An
ElectionUpdater
object that manages the tallies and returns anElectionResults
object.alias – The name that the election is registered under in the partition’s dictionary of updaters.
- Parameters
name (str) – The name of the election. (e.g. “2008 Presidential”)
parties_to_columns (Union[Dict, List]) – A dictionary matching party names to their data columns, either as actual columns (list-like, indexed by nodes) or as string keys for the node attributes that hold the party’s vote totals. Or, a list of strings which will serve as both the party names and the node attribute keys.
alias (Optional[str], optional) – Alias that the election is registered under in the Partition’s dictionary of updaters.
-
class
gerrychain.updaters.election.
ElectionResults
(election: gerrychain.updaters.election.Election, counts: Dict[str, Dict[int, float]], regions: List[int])[source]¶ Represents the results of an election. Provides helpful methods to answer common questions you might have about an election (Who won? How many seats?, etc.).
- Variables
election – The
Election
object that these results are associated with.totals_for_party – A dictionary mapping party names to the total number of votes that party received in each part of the partition.
regions – A list of regions that we would like the results for.
totals – A dictionary mapping each part of the partition to the total number of votes cast in that part.
percents_for_party – A dictionary mapping party names to the percentage of votes that party received in each part of the partition.
Note
The variable “regions” is generally called “parts” in other sections of the codebase, but we have changed it here to avoid confusion with the parameter “party” that often appears within the class.
- Parameters
- Counts
A dictionary mapping party names to the total number of votes that party received in each part of the partition.
- Returns
None
-
count
(party: str, region: Optional[str] = None) → int[source]¶ - Parameters
party (str) – Party ID.
region (Optional[int], optional) – ID of the part of the partition whose votes we want to tally.
- Returns
The total number of votes that
party
received in a given region (part of the partition). Ifregion
is omitted, returns the overall vote total ofparty
.- Return type
int
-
counts
(party: str) → Tuple[source]¶ - Parameters
party (str) – Party ID
- Returns
tuple of the total votes cast for
party
in each part of the partition- Return type
Tuple
-
efficiency_gap
() → float[source]¶ Computes the efficiency gap for this ElectionResults object.
See:
efficiency_gap()
- Returns
The efficiency gap for this election.
- Return type
float
-
mean_median
() → float[source]¶ Computes the mean-median score for this ElectionResults object.
See:
mean_median()
- Returns
The mean-median score for this election.
- Return type
float
-
mean_thirdian
() → float[source]¶ Computes the mean-thirdian score for this ElectionResults object.
See:
mean_thirdian()
- Returns
The mean-thirdian score for this election.
- Return type
float
-
partisan_bias
() → float[source]¶ Computes the partisan bias for this ElectionResults object.
See:
partisan_bias()
- Returns
The partisan bias for this election.
- Return type
float
-
partisan_gini
() → float[source]¶ Computes the Gini score for this ElectionResults object.
See:
partisan_gini()
- Returns
The partisan Gini score for this election.
- Return type
float
-
percent
(party: str, region: Optional[int] = None) → float[source]¶ - Parameters
party (str) – Party ID.
region (Optional[int], optional) – ID of the part of the partition whose votes we want to tally.
- Returns
The percentage of the vote that
party
received in a given region (part of the partition). Ifregion
is omitted, returns the overall vote share ofparty
.- Return type
float
-
percents
(party: str) → Tuple[source]¶ - Parameters
party (str) – Party ID
- Returns
The tuple of the percentage of votes that
party
received in each part of the partition- Return type
Tuple
-
seats
(party: str) → int[source]¶ - Parameters
party (str) – Party name
- Returns
The number of seats that
party
won.- Return type
int
-
total_votes
() → int[source]¶ - Returns
The total number of votes cast in the election.
- Return type
int
-
votes
(party: str) → Tuple[source]¶ An alias for
counts()
.- Parameters
party (str) – Party ID
- Returns
tuple of the total votes cast for
party
in each part of the partition- Return type
Tuple
-
class
gerrychain.updaters.election.
ElectionUpdater
(election: gerrychain.updaters.election.Election)[source]¶ The updater for computing the election results in each part of the partition after each step in the Markov chain. The actual results are returned to the user as an
ElectionResults
instance.-
get_previous_values
(partition) → Dict[str, Dict[int, float]][source]¶ - Parameters
partition (
Partition
) – The partition whose parent we want to obtain the previous vote totals from.- Returns
A dictionary mapping party names to the vote totals that party received in each part of the parent of the current partition.
- Return type
Dict[str, Dict[int, float]]
-
-
gerrychain.updaters.election.
format_part_results
(percents_for_party: Dict[str, Dict[int, float]], part: int) → str[source]¶ - Parameters
percents_for_party (Dict[str, Dict[int, float]]) – A dictionary mapping party names to a dict containing the percentage of votes that party received in each part of the partition.
part (int) – The part of the partition whose results we want to format.
- Returns
A formatted string containing the results for the given part of the partition.
- Return type
str
-
gerrychain.updaters.election.
get_percents
(counts: Dict, totals: Dict) → Dict[source]¶ - Parameters
counts (Dict) – A dictionary mapping each part in a partition to the count of the number of votes that a party received in that part.
totals (Dict) – A dictionary mapping each part in a partition to the total number of votes cast in that part.
- Returns
A dictionary mapping each part in a partition to the percentage
- Return type
Dict
Grids¶
To make it easier to play around with GerryChain, we have provided a Grid
class representing a partition of a grid graph. This is especially useful if you want to start experimenting
but do not yet have a clean set of data and geometries to build your graph from.
-
class
gerrychain.grid.
Grid
(dimensions: Optional[Tuple[int, int]] = None, with_diagonals: bool = False, assignment: Optional[Dict] = None, updaters: Optional[Dict[str, Callable]] = None, parent: Optional[Grid] = None, flips: Optional[Dict[Tuple[int, int], int]] = None)[source]¶ The
Grid
class represents a grid partitioned into districts. It is useful for running little experiments with GerryChain without needing to do any data processing or cleaning to get started.Example usage:
grid = Grid((10,10))
The nodes of
grid.graph
are labelled by tuples(i,j)
, for0 <= i <= 10
and0 <= j <= 10
. Each node has anarea
of 1 and each edge hasshared_perim
1.If the updaters are not specified, the default updaters are used, which are as follows:
default_updaters = { "cut_edges": cut_edges, "population": Tally("population"), "perimeter": perimeter, "exterior_boundaries": exterior_boundaries, "interior_boundaries": interior_boundaries, "boundary_nodes": boundary_nodes, "area": Tally("area", alias="area"), "polsby_popper": polsby_popper, "cut_edges_by_part": cut_edges_by_part, }
- Parameters
dimensions (Tuple[int, int], optional) – The grid dimensions (rows, columns), defaults to None.
with_diagonals (bool, optional) – If True, includes diagonal connections, defaults to False.
assignment (Dict, optional) – Node-to-district assignments, defaults to None.
updaters (Dict[str, Callable], optional) – Custom updater functions, defaults to None.
parent (Grid, optional) – Parent Grid object for inheritance, defaults to None.
flips (Dict[Tuple[int, int], int], optional) – Node flips for partition changes, defaults to None.
- Raises
Exception – If neither dimensions nor parent is provided.
Spanning tree methods¶
The recom()
proposal function operates on spanning trees of the
adjacency graph in order to generate new contiguous districting plans with balanced population.
The gerrychain.tree
submodule exposes some helpful functions for partitioning graphs
using spanning trees methods. These may be used to implement proposal functions or to generate
initial plans for running Markov chains, as described in MGGG’s 2018 Virginia House of Delegates
report.
This module provides tools and algorithms for manipulating and analyzing graphs, particularly focused on partitioning graphs based on population data. It leverages the NetworkX library to handle graph structures and implements various algorithms for graph partitioning and tree traversal.
Key functionalities include:
Predecessor and successor functions for graph traversal using breadth-first search.
Implementation of random and uniform spanning trees for graph partitioning.
The PopulatedGraph class, which represents a graph with additional population data, and methods for assessing and modifying this data.
Functions for finding balanced edge cuts in a populated graph, either through contraction or memoization techniques.
A suite of functions (bipartition_tree, recursive_tree_part, get_seed_chunks, etc.) for partitioning graphs into balanced subsets based on population targets and tolerances.
Utility functions like get_max_prime_factor_less_than and recursive_seed_part_inner to assist in complex partitioning tasks.
Dependencies:
networkx: Used for graph data structure and algorithms.
random: Provides random number generation for probabilistic approaches.
typing: Used for type hints.
Last Updated: 25 April 2024
-
exception
gerrychain.tree.
BipartitionWarning
[source]¶ Generally raised when it is proving difficult to find a balanced cut.
-
class
gerrychain.tree.
Cut
¶ Represents a cut in a graph.
Create new instance of Cut(edge, weight, subset)
-
edge
¶ The edge where the cut is made. Defaults to None.
-
subset
¶ The (frozen) subset of nodes on one side of the cut. Defaults to None.
-
weight
¶ The weight assigned to the edge (if any). Defaults to None.
-
-
class
gerrychain.tree.
PopulatedGraph
(graph: networkx.classes.graph.Graph, populations: Dict, ideal_pop: Union[float, int], epsilon: float)[source]¶ A class representing a graph with population information.
- Variables
graph – The underlying graph structure.
subsets – A dictionary mapping nodes to their subsets.
population – A dictionary mapping nodes to their populations.
tot_pop – The total population of the graph.
ideal_pop – The ideal population for each district.
epsilon – The tolerance for population deviation from the ideal population within each district.
- Parameters
graph (nx.Graph) – The underlying graph structure.
populations (Dict) – A dictionary mapping nodes to their populations.
ideal_pop (Union[float, int]) – The ideal population for each district.
epsilon (float) – The tolerance for population deviation as a percentage of the ideal population within each district.
-
has_ideal_population
(node, one_sided_cut: bool = False) → bool[source]¶ Checks if a node has an ideal population within the graph up to epsilon.
- Parameters
node (Any) – The node to check.
one_sided_cut (bool, optional) – Whether or not we are cutting off a single district. When set to False, we check if the node we are cutting and the remaining graph are both within epsilon of the ideal population. When set to True, we only check if the node we are cutting is within epsilon of the ideal population. Defaults to False.
- Returns
True if the node has an ideal population within the graph up to epsilon.
- Return type
bool
-
exception
gerrychain.tree.
PopulationBalanceError
[source]¶ Raised when the population of a district is outside the acceptable epsilon range.
-
exception
gerrychain.tree.
ReselectException
[source]¶ Raised when the tree-splitting algorithm is unable to find a balanced cut after some maximum number of attempts, but the user has allowed the algorithm to reselect the pair of districts from parent graph to try and recombine.
-
gerrychain.tree.
bipartition_tree
(graph: networkx.classes.graph.Graph, pop_col: str, pop_target: Union[int, float], epsilon: float, node_repeats: int = 1, spanning_tree: Optional[networkx.classes.graph.Graph] = None, spanning_tree_fn: Callable = <function random_spanning_tree>, region_surcharge: Optional[Dict] = None, balance_edge_fn: Callable = <function find_balanced_edge_cuts_memoization>, one_sided_cut: bool = False, choice: Callable = <bound method Random.choice of <random.Random object>>, max_attempts: Optional[int] = 100000, warn_attempts: int = 1000, allow_pair_reselection: bool = False, cut_choice: Callable = <function _region_preferred_max_weight_choice>) → Set[source]¶ This function finds a balanced 2 partition of a graph by drawing a spanning tree and finding an edge to cut that leaves at most an epsilon imbalance between the populations of the parts. If a root fails, new roots are tried until node_repeats in which case a new tree is drawn.
Builds up a connected subgraph with a connected complement whose population is
epsilon * pop_target
away frompop_target
.- Parameters
graph (nx.Graph) – The graph to partition.
pop_col (str) – The node attribute holding the population of each node.
pop_target (Union[int, float]) – The target population for the returned subset of nodes.
epsilon (float) – The allowable deviation from
pop_target
(as a percentage ofpop_target
) for the subgraph’s population.node_repeats (int, optional) – A parameter for the algorithm: how many different choices of root to use before drawing a new spanning tree. Defaults to 1.
spanning_tree (Optional[nx.Graph], optional) – The spanning tree for the algorithm to use (used when the algorithm chooses a new root and for testing).
spanning_tree_fn (Callable, optional) – The random spanning tree algorithm to use if a spanning tree is not provided. Defaults to
random_spanning_tree()
.region_surcharge (Optional[Dict], optional) – A dictionary of surcharges for the spanning tree algorithm. Defaults to None.
balance_edge_fn (Callable, optional) – The function to find balanced edge cuts. Defaults to
find_balanced_edge_cuts_memoization()
.one_sided_cut (bool, optional) – Passed to the
balance_edge_fn
. Determines whether or not we are cutting off a single district when partitioning the tree. When set to False, we check if the node we are cutting and the remaining graph are both within epsilon of the ideal population. When set to True, we only check if the node we are cutting is within epsilon of the ideal population. Defaults to False.choice (Callable, optional) – The function to make a random choice of root node for the population tree. Passed to
balance_edge_fn
. Can be substituted for testing. Defaults torandom.random()
.max_attempts (Optional[int], optional) – The maximum number of attempts that should be made to bipartition. Defaults to 10000.
warn_attempts (int, optional) – The number of attempts after which a warning is issued if a balanced cut cannot be found. Defaults to 1000.
allow_pair_reselection (bool, optional) – Whether we would like to return an error to the calling function to ask it to reselect the pair of nodes to try and recombine. Defaults to False.
cut_choice (Callable, optional) – The function used to select the cut edge from the list of possible balanced cuts. Defaults to
_region_preferred_max_weight_choice()
.
- Returns
A subset of nodes of
graph
(whose induced subgraph is connected). The other part of the partition is the complement of this subset.- Return type
Set
- Raises
BipartitionWarning – If a possible cut cannot be found after 1000 attempts.
RuntimeError – If a possible cut cannot be found after the maximum number of attempts given by
max_attempts
.
-
gerrychain.tree.
bipartition_tree_random
(graph: networkx.classes.graph.Graph, pop_col: str, pop_target: Union[int, float], epsilon: float, node_repeats: int = 1, repeat_until_valid: bool = True, spanning_tree: Optional[networkx.classes.graph.Graph] = None, spanning_tree_fn: Callable = <function random_spanning_tree>, balance_edge_fn: Callable = <function find_balanced_edge_cuts_memoization>, one_sided_cut: bool = False, choice: Callable = <bound method Random.choice of <random.Random object>>, max_attempts: Optional[int] = 100000) → Optional[Set[Any]][source]¶ This is like
bipartition_tree()
except it chooses a random balanced cut, rather than the first cut it finds.This function finds a balanced 2 partition of a graph by drawing a spanning tree and finding an edge to cut that leaves at most an epsilon imbalance between the populations of the parts. If a root fails, new roots are tried until node_repeats in which case a new tree is drawn.
Builds up a connected subgraph with a connected complement whose population is
epsilon * pop_target
away frompop_target
.- Parameters
graph (nx.Graph) – The graph to partition.
pop_col (str) – The node attribute holding the population of each node.
pop_target (Union[int, float]) – The target population for the returned subset of nodes.
epsilon (float) – The allowable deviation from
pop_target
(as a percentage ofpop_target
) for the subgraph’s population.node_repeats (int) – A parameter for the algorithm: how many different choices of root to use before drawing a new spanning tree. Defaults to 1.
repeat_until_valid (bool, optional) – Determines whether to keep drawing spanning trees until a tree with a balanced cut is found. If True, a set of nodes will always be returned; if False, None will be returned if a valid spanning tree is not found on the first try. Defaults to True.
spanning_tree (Optional[nx.Graph], optional) – The spanning tree for the algorithm to use (used when the algorithm chooses a new root and for testing). Defaults to None.
spanning_tree_fn (Callable, optional) – The random spanning tree algorithm to use if a spanning tree is not provided. Defaults to
random_spanning_tree()
.balance_edge_fn (Callable, optional) – The algorithm used to find balanced cut edges. Defaults to
find_balanced_edge_cuts_memoization()
.one_sided_cut (bool, optional) – Passed to the
balance_edge_fn
. Determines whether or not we are cutting off a single district when partitioning the tree. When set to False, we check if the node we are cutting and the remaining graph are both within epsilon of the ideal population. When set to True, we only check if the node we are cutting is within epsilon of the ideal population. Defaults to False.choice (Callable, optional) – The random choice function. Can be substituted for testing. Defaults to
random.choice()
.max_attempts (Optional[int], optional) – The max number of attempts that should be made to bipartition. Defaults to None.
- Returns
A subset of nodes of
graph
(whose induced subgraph is connected) or None if a valid spanning tree is not found.- Return type
Union[Set[Any], None]
-
gerrychain.tree.
epsilon_tree_bipartition
(graph: networkx.classes.graph.Graph, parts: Sequence, pop_target: Union[float, int], pop_col: str, epsilon: float, node_repeats: int = 1, method: Callable = functools.partial(<function bipartition_tree>, max_attempts=10000)) → Dict[source]¶ Uses
bipartition_tree()
to partition a tree into two parts of populationpop_target
(withinepsilon
).- param graph
The graph to partition into two :math:`
- arepsilon`-balanced parts.
- type graph
nx.Graph
- param parts
Iterable of part (district) labels (like
[0,1,2]
orrange(4)
).- type parts
Sequence
- param pop_target
Target population for each part of the partition.
- type pop_target
Union[float, int]
- param pop_col
Node attribute key holding population data.
- type pop_col
str
- param epsilon
How far (as a percentage of
pop_target
) frompop_target
the parts of the partition can be.- type epsilon
float
- param node_repeats
Parameter for
bipartition_tree()
to use. Defaults to 1.- type node_repeats
int, optional
- param method
The partition method to use. Defaults to partial(bipartition_tree, max_attempts=10000).
- type method
Callable, optional
- returns
New assignments for the nodes of
graph
.- rtype
dict
-
gerrychain.tree.
find_balanced_edge_cuts_contraction
(h: gerrychain.tree.PopulatedGraph, one_sided_cut: bool = False, choice: Callable = <bound method Random.choice of <random.Random object>>) → List[gerrychain.tree.Cut][source]¶ Find balanced edge cuts using contraction.
- Parameters
h (PopulatedGraph) – The populated graph.
one_sided_cut (bool, optional) – Whether or not we are cutting off a single district. When set to False, we check if the node we are cutting and the remaining graph are both within epsilon of the ideal population. When set to True, we only check if the node we are cutting is within epsilon of the ideal population. Defaults to False.
choice (Callable, optional) – The function used to make random choices.
- Returns
A list of balanced edge cuts.
- Return type
List[Cut]
-
gerrychain.tree.
find_balanced_edge_cuts_memoization
(h: gerrychain.tree.PopulatedGraph, one_sided_cut: bool = False, choice: Callable = <bound method Random.choice of <random.Random object>>) → List[gerrychain.tree.Cut][source]¶ Find balanced edge cuts using memoization.
This function takes a PopulatedGraph object and a choice function as input and returns a list of balanced edge cuts. A balanced edge cut is defined as a cut that divides the graph into two subsets, such that the population of each subset is close to the ideal population defined by the PopulatedGraph object.
- Parameters
h (PopulatedGraph) – The PopulatedGraph object representing the graph.
one_sided_cut (bool, optional) – Whether or not we are cutting off a single district. When set to False, we check if the node we are cutting and the remaining graph are both within epsilon of the ideal population. When set to True, we only check if the node we are cutting is within epsilon of the ideal population. Defaults to False.
choice (Callable, optional) – The choice function used to select the root node.
- Returns
A list of balanced edge cuts.
- Return type
List[Cut]
-
gerrychain.tree.
get_max_prime_factor_less_than
(n: int, ceil: int) → Optional[int][source]¶ Helper function for recursive_seed_part_inner. Returns the largest prime factor of
n
less thanceil
, or None if all are greater than ceil.- Parameters
n (int) – The number to find the largest prime factor for.
ceil (int) – The upper limit for the largest prime factor.
- Returns
The largest prime factor of
n
less thanceil
, or None if all are greater than ceil.- Return type
Optional[int]
-
gerrychain.tree.
get_seed_chunks
(graph: networkx.classes.graph.Graph, num_chunks: int, num_dists: int, pop_target: Union[int, float], pop_col: str, epsilon: float, node_repeats: int = 1, method: Callable = functools.partial(<function bipartition_tree_random>, max_attempts=10000)) → List[List[int]][source]¶ Helper function for recursive_seed_part. Partitions the graph into
num_chunks
chunks, balanced within new_epsilon <=epsilon
of a balanced target population.- Parameters
graph (nx.Graph) – The graph
num_chunks (int) – The number of chunks to partition the graph into
num_dists (int) – The number of districts
pop_target (Union[int, float]) – The target population of the districts (not of the chunks)
pop_col (str) – Node attribute key holding population data
epsilon (float) – How far (as a percentage of
pop_target
) frompop_target
the parts of the partition can benode_repeats (int, optional) – Parameter for
bipartition_tree_random()
to use. Defaults to 1.method (Callable, optional) – The method to use for bipartitioning the graph. Defaults to
bipartition_tree_random()
- Returns
New assignments for the nodes of
graph
.- Return type
List[List[int]]
-
gerrychain.tree.
random_spanning_tree
(graph: networkx.classes.graph.Graph, region_surcharge: Optional[Dict] = None) → networkx.classes.graph.Graph[source]¶ Builds a spanning tree chosen by Kruskal’s method using random weights.
- Parameters
graph (nx.Graph) – The input graph to build the spanning tree from. Should be a Networkx Graph.
region_surcharge (Optional[Dict], optional) – Dictionary of surcharges to add to the random weights used in region-aware variants.
- Returns
The maximal spanning tree represented as a Networkx Graph.
- Return type
nx.Graph
-
gerrychain.tree.
recursive_seed_part
(graph: networkx.classes.graph.Graph, parts: Sequence, pop_target: Union[float, int], pop_col: str, epsilon: float, method: Callable = functools.partial(<function bipartition_tree>, max_attempts=10000), node_repeats: int = 1, n: Optional[int] = None, ceil: Optional[int] = None) → Dict[source]¶ Returns a partition with
num_dists
districts balanced withinepsilon
ofpop_target
by recursively splitting graph using recursive_seed_part_inner.- Parameters
graph (nx.Graph) – The graph
parts (Sequence) – Iterable of part labels (like
[0,1,2]
orrange(4)
pop_target (Union[float, int]) – Target population for each part of the partition
pop_col (str) – Node attribute key holding population data
epsilon (float) – How far (as a percentage of
pop_target
) frompop_target
the parts of the partition can bemethod (Callable, optional) – Function used to find balanced partitions at the 2-district level Defaults to
bipartition_tree()
node_repeats (int, optional) – Parameter for
bipartition_tree()
to use. Defaults to 1.n (Optional[int], optional) – Either a positive integer (greater than 1) or None. If n is a positive integer, this function will recursively create a seed plan by either biting off districts from graph or dividing graph into n chunks and recursing into each of these. If n is None, this function prime factors ``num_dists``=n_1*n_2*…*n_k (n_1 > n_2 > … n_k) and recursively partitions graph into n_1 chunks. Defaults to None.
ceil (Optional[int], optional) – Either a positive integer (at least 2) or None. Relevant only if n is None. If
ceil
is a positive integer then finds the largest factor ofnum_dists
less than or equal toceil
, and recursively splits graph into that number of chunks, or bites off a district if that number is 1. Defaults to None.
- Returns
New assignments for the nodes of
graph
.- Return type
dict
-
gerrychain.tree.
recursive_seed_part_inner
(graph: networkx.classes.graph.Graph, num_dists: int, pop_target: Union[float, int], pop_col: str, epsilon: float, method: Callable = functools.partial(<function bipartition_tree>, max_attempts=10000), node_repeats: int = 1, n: Optional[int] = None, ceil: Optional[int] = None) → List[Set][source]¶ Inner function for recursive_seed_part. Returns a partition with
num_dists
districts balanced withinepsilon
ofpop_target
. Splits graph into num_chunks chunks, and then recursively splits each chunk intonum_dists
/num_chunks chunks. The number num_chunks of chunks is chosen based onn
andceil
as follows:If
n
is None, andceil
is None, num_chunks is the largest prime factor ofnum_dists
.If
n
is None andceil
is an integer at least 2, then num_chunks is the largest prime factor ofnum_dists
that is less thanceil
If
n
is a positive integer, num_chunks equals n.
Finally, if the number of chunks as chosen above does not divide
num_dists
, then this function bites off a single district from the graph and recursively partitions the remaining graph intonum_dists - 1
districts.- Parameters
graph (nx.Graph) – The underlying graph structure.
num_dists (int) – number of districts to partition the graph into
pop_target (Union[float, int]) – Target population for each part of the partition
pop_col (str) – Node attribute key holding population data
epsilon (float) – How far (as a percentage of
pop_target
) frompop_target
the parts of the partition can bemethod (Callable, optional) – Function used to find balanced partitions at the 2-district level. Defaults to
bipartition_tree()
node_repeats (int, optional) – Parameter for
bipartition_tree()
to use. Defaults to 1.n (Optional[int], optional) – Either a positive integer (greater than 1) or None. If n is a positive integer, this function will recursively create a seed plan by either biting off districts from graph or dividing graph into n chunks and recursing into each of these. If n is None, this function prime factors ``num_dists``=n_1*n_2*…*n_k (n_1 > n_2 > … n_k) and recursively partitions graph into n_1 chunks. Defaults to None.
ceil (Optional[int], optional) – Either a positive integer (at least 2) or None. Relevant only if n is None. If
ceil
is a positive integer then finds the largest factor ofnum_dists
less than or equal toceil
, and recursively splits graph into that number of chunks, or bites off a district if that number is 1. Defaults to None.
- Returns
New assignments for the nodes of
graph
.- Return type
List of sets, each set is a district
-
gerrychain.tree.
recursive_tree_part
(graph: networkx.classes.graph.Graph, parts: Sequence, pop_target: Union[float, int], pop_col: str, epsilon: float, node_repeats: int = 1, method: Callable = functools.partial(<function bipartition_tree>, max_attempts=10000)) → Dict[source]¶ Uses
bipartition_tree()
recursively to partition a tree intolen(parts)
parts of populationpop_target
(withinepsilon
). Can be used to generate initial seed plans or to implement ReCom-like “merge walk” proposals.- param graph
The graph to partition into
len(parts)
:math:`
- arepsilon`-balanced parts.
- type graph
nx.Graph
- param parts
Iterable of part (district) labels (like
[0,1,2]
orrange(4)
).- type parts
Sequence
- param pop_target
Target population for each part of the partition.
- type pop_target
Union[float, int]
- param pop_col
Node attribute key holding population data.
- type pop_col
str
- param epsilon
How far (as a percentage of
pop_target
) frompop_target
the parts of the partition can be.- type epsilon
float
- param node_repeats
Parameter for
bipartition_tree()
to use. Defaluts to 1.- type node_repeats
int, optional
- param method
The partition method to use. Defaults to partial(bipartition_tree, max_attempts=10000).
- type method
Callable, optional
- returns
New assignments for the nodes of
graph
.- rtype
dict
-
gerrychain.tree.
uniform_spanning_tree
(graph: networkx.classes.graph.Graph, choice: Callable = <bound method Random.choice of <random.Random object>>) → networkx.classes.graph.Graph[source]¶ Builds a spanning tree chosen uniformly from the space of all spanning trees of the graph. Uses Wilson’s algorithm.
- Parameters
graph (nx.Graph) – Networkx Graph
choice (Callable, optional) –
random.choice()
. Defaults torandom.choice()
.
- Returns
A spanning tree of the graph chosen uniformly at random.
- Return type
nx.Graph
Metrics¶
-
gerrychain.metrics.
mean_median
(election_results) → float[source]¶ Computes the Mean-Median score for the given ElectionResults. A positive value indicates an advantage for the first party listed in the Election’s parties_to_columns dictionary.
- Parameters
election_results (ElectionResults) – An ElectionResults object
- Returns
The Mean-Median score for the given ElectionResults
- Return type
float
-
gerrychain.metrics.
partisan_bias
(election_results) → float[source]¶ Computes the partisan bias for the given ElectionResults. The partisan bias is defined as the number of districts with above-mean vote share by the first party divided by the total number of districts, minus 1/2.
- Parameters
election_results (ElectionResults) – An ElectionResults object
- Returns
The partisan bias for the given ElectionResults
- Return type
float
-
gerrychain.metrics.
partisan_gini
(election_results) → float[source]¶ Computes the partisan Gini score for the given ElectionResults. The partisan Gini score is defined as the area between the seats-votes curve and its reflection about (.5, .5).
For more information on the computation, see Definition 1 in: https://arxiv.org/pdf/2008.06930.pdf
- Parameters
election_results (ElectionResults) – An ElectionResults object
- Returns
The partisan Gini score for the given ElectionResults
- Return type
float
-
gerrychain.metrics.
efficiency_gap
(election_results) → float[source]¶ Computes the efficiency gap for the given ElectionResults. A positive value indicates an advantage for the first party listed in the Election’s parties_to_columns dictionary.
- Parameters
election_results (ElectionResults) – An ElectionResults object
- Returns
The efficiency gap for the given ElectionResults
- Return type
float
-
gerrychain.metrics.
polsby_popper
(partition) → Dict[int, float][source]¶ Computes Polsby-Popper compactness scores for each district in the partition.
- Parameters
partition (Partition) – The partition to compute scores for
- Returns
A dictionary mapping each district ID to its Polsby-Popper score
- Return type
Dict[int, float]
-
gerrychain.metrics.
wasted_votes
(party1_votes: int, party2_votes: int) → Tuple[int, int][source]¶ Computes the wasted votes for each party in the given race. :param party1_votes: the number of votes party1 received in the race :type party1_votes: int :param party2_votes: the number of votes party2 received in the race :type party2_votes: int
- Returns
a tuple of the wasted votes for each party
- Return type
Tuple[int, int]
Diversity stats¶
Simple tooling to collect diversity stats on chain runs
-
class
gerrychain.meta.diversity.
DiversityStats
(unique_plans: int, unique_districts: int, steps_taken: int)[source]¶ Lightweight stats object that reports the diversity of a given chain.
- Variables
unique_plans – The number of unique plans seen so far.
unique_districts – The number of unique districts seen so far.
steps_taken – The number of steps taken so far.
Example usage:
DiversityStats(unique_plans=44162, unique_districts=82992, steps_taken=100000)
-
gerrychain.meta.diversity.
collect_diversity_stats
(chain: Iterable[gerrychain.partition.partition.Partition]) → Iterable[Tuple[gerrychain.partition.partition.Partition, gerrychain.meta.diversity.DiversityStats]][source]¶ Report the diversity of the chain being run, live, as a drop-in wrapper. Requires the cut_edges updater on each Partition object. Plans/districts are considered distinct if they are not isomorphic. That is, relabled plans and districts are considered non-unique and counted as duplicate.
Example usage:
for partition, stats in collect_diversity_stats( Replay( graph, "sample-run.chain" ) ): print(stats) # normal chain stuff here
- Parameters
chain (Iterable[Partition]) – A chain object to collect stats on.
- Returns
An iterable of (partition, DiversityStat).
- Return type
Iterable[Tuple[Partition, DiversityStats]]