Scores
Generally, the point of running the chain is comparing the properties of some existing plan to the properties of the ensemble of plans generated by the chain. Thus, we need a way to record information about the plans generated during the progression of the chain. This is the point of Score
s: each score has a scoring function that is evaluated on the current districting plan at each step of the chain. The resulting values are saved in a dictionary, which is itself added to a growing ChainScoreData
object at each step of the chain. The ChainScoreData
returned by recom_chain()
and flip_chain()
is like a "history" of the scores at each step of the chain. (See get_scores_at_step() for information about how to retrieve the values of any subset of scores at a particular step in the chain.)
Score types
All Scores have the type AbstractScore
. There are four categories of scores (note that DistrictAggregate
is really a sub-category of DistrictScore
):
Score Type | Fields | Description |
---|---|---|
DistrictAggregate | name (String), key (String) | A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district. |
DistrictScore | name (String), score_fn (Function) | A DistrictScore takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::int) |
PlanScore | name (String), score_fn (Function) | A PlanScore takes a user-supplied function that returns some quantity of interest given a BaseGraph and corresponding Partition object. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, partition::Partition) |
CompositeScore | name (String), scores (Array{S, 1} where S<:AbstractScore) | A CompositeScore is just a group of scores that are run in sequence. CompositeScores are especially useful when the score functions depend upon/modify some shared state. The Election object is implemented as a CompositeScore . |
So, when should you use which score? Here's a general breakdown:
DistrictAggregate
Use this score when you just want to sum an attribute over all nodes in a district.
GerryChain.DistrictAggregate
— TypeDistrictAggregate(name::String,
key::String)
A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district.
For example, if you wanted to count the number of Black people in each district at every step of the chain, you could use a DistrictAggregate
score to do so. If the attribute in the nodes of the graph was called "BLACK" and you wanted to call your score "Black_Pop", you would initialize the score as
julia> DistrictAggregate("Black_Pop", "BLACK")
DistrictAggregate("Black_Pop", "BLACK")
If you wanted to name your score the same as the node attribute, you could also use
GerryChain.DistrictAggregate
— MethodDistrictAggregate(key::String)
Initializes a DistrictAggregate score where the name and key are the same.
In this case, you would instantiate your score as
julia> DistrictAggregate("BLACK")
DistrictAggregate("BLACK", "BLACK")
DistrictScore
This score works best when you're interested in tracking a statistic for each district for all plans in the chain. For example, you might want to know the Polsby-Popper score for each district at every step of the chain.
GerryChain.DistrictScore
— TypeDistrictScore(name::Union{String, Missing}
score_fn::Function)
A DistrictScore
takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn
should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::int)
The example below illustrates a simple example of setting up a DistrictScore
. Perhaps somehow you find yourself interested in computing the number of edges within a district.
julia> function num_edges_within_dist(graph::BaseGraph,
district_nodes::BitSet,
district::Int)
""" Computes the number of edges within each district of the plan.
Arguments:
graph: The underlying BaseGraph
district_nodes: A Set of nodes in district `district`. Each node
is represented as an Int.
district: Int representing the district
"""
all_edges = []
for node in district_nodes, neighbor in graph.neighbors[node]
if neighbor in district_nodes
edge = graph.adj_matrix[node, neighbor]
push!(all_edges, edge)
end
end
return length(unique(all_edges))
end
num_edges_within_dist (generic function with 1 method)
julia> # instantiate this score
DistrictScore("edges_within_dist", num_edges_within_dist)
DistrictScore("edges_within_dist", Main.ex-env.num_edges_within_dist)
PlanScore
This type of score is suited to statistics that are evaluated on an entire plan. For example, the number of districts won by a party for a given plan would be a PlanScore
.
GerryChain.PlanScore
— TypePlanScore(name::Union{String, Missing},
score_fn::Function)
A PlanScore
takes a user-supplied function that returns some quantity of interest given a BaseGraph
and corresponding Partition
object.
The signature of score_fn
should be as follows: score_fn(graph::BaseGraph, partition::Partition)
Below we illustrate a few examples on how to construct a PlanScore
:
julia> function num_max_blocks(graph::BaseGraph,
partition::Partition)
""" Returns the maximum number of blocks in each district.
This function assumes that the nodes in `graph` are blocks.
"""
return maximum([length(nodes) for nodes in partition.dist_nodes])
end
num_max_blocks (generic function with 1 method)
julia> function higher_num_cut_edges_than_parent(graph::BaseGraph,
partition::Partition)
""" Returns True if `partition` has a higher number of cut edges than the
previous plan, False otherwise.
"""
if partition.parent isa Partition & partition.num_cut_edges > partition.parent.num_cut_edges
return True
end
return False
end
higher_num_cut_edges_than_parent (generic function with 1 method)
julia> # how to formulate these scores
PlanScore("num_max_blocks", num_max_blocks)
PlanScore("num_max_blocks", Main.ex-env.num_max_blocks)
julia> PlanScore("higher_num_cut_edges_than_parent", higher_num_cut_edges_than_parent)
PlanScore("higher_num_cut_edges_than_parent", Main.ex-env.higher_num_cut_edges_than_parent)
CompositeScore
This might be the most difficult type of score to wrap one's mind around. CompositeScore
s are best when you have a series of scores with some shared state or rely on the same computation. They allow you to "group" scores together. For example, we use CompositeScore
s for Election
s, since almost all election-related scores rely on vote counts and vote shares.
GerryChain.CompositeScore
— TypeCompositeScore(name::String,
scores::Array{S,1}) where {S<:AbstractScore} # should be other AbstractScores
A CompositeScore
is just a group of scores that are run in sequence. CompositeScore
s are especially useful when the score functions depend upon/modify some shared state.
Built-In Score Functions
Currently only one score function comes built-in with GerryChain
, and it is the number of cut-edges in the plan.
GerryChain.num_cut_edges
— Functionnum_cut_edges(name::String)::PlanScore
Returns a PlanScore
that tracks the number of cut edges in a particular plan.
A little more detail on score functions
Why do we differentiate between the different types of scores? The answer boils down to efficiency. Recall that each plan in the chain (whether it is uses ReCom proposals or Flip proposals) only differs from the previous plan by 2 districts. This means we can save space / eliminate redundant computation by only re-calculating district-level scores on the districts that were changed. However, for plan-level scores, we always have to re-run the score function on the entire partition.
Other Score Functions
GerryChain.coerce_aggregated_attributes!
— Methodcoerce_aggregated_attributes!(graph::BaseGraph,
scores::Array{S, 1}) where {S<:AbstractScore}
Coerces DistrictAggregate attributes in scores
to be Floats if they are Strings.
GerryChain.eval_score_on_district
— Methodeval_score_on_district(graph::BaseGraph,
partition::Partition,
score::DistrictAggregate,
district::Int)::Number
Evaluates a DistrictAggregate
score on the nodes in a particular district
.
GerryChain.eval_score_on_district
— Methodeval_score_on_district(graph::BaseGraph,
partition::Partition,
score::DistrictScore,
district::Int)
Evaluates a user-supplied DistrictScore
function on the nodes in a particular district
.
GerryChain.eval_score_on_partition
— Methodeval_score_on_partition(graph::BaseGraph,
partition::Partition,
composite::CompositeScore)
Evaluates the user-supplied functions in the CompositeScore
scores array on each of the districts in the partition.
Returns an Dict of the form:
{
# District-level scores
d\_score₁.name : [a₁, a₂, ..., aᵢ]
...
d\_scoreᵤ.name : [b₁, b₂, ..., bᵢ]
# Partition-level scores
p\_score₁.name : c,
...
p\_scoreᵥ.name : d,
}
GerryChain.eval_score_on_partition
— Methodeval_score_on_partition(graph::BaseGraph,
partition::Partition,
score::PlanScore)
Evaluates a user-supplied PlanScore
function on the entire partition.
GerryChain.eval_score_on_partition
— MethodEvaluates a user-supplied DistrictScore
function or DistrictAggregate
score on all districts in an entire plan.
Returns an array of the form [a₁, a₂, ..., aᵢ], where i
is the number of districts in the plan.
GerryChain.score_initial_partition
— Methodscore_initial_partition(graph::BaseGraph,
partition::Partition,
scores::Array{S, 1}) where {S<:AbstractScore}
Returns a dictionary of scores for the initial partition. The dictionary has the following form (where n
is the number of districts, u
is the number of district-level scores, v
is the number of partition-level scores, and i
is the number of composite scores):
{
# District-level scores
d\_score₁.name : [a₁, a₂, ..., aᵤ]
...
d\_scoreᵤ.name : [b₁, b₂, ..., bᵤ]
# Partition-level scores
p\_score₁.name : c,
...
p\_scoreᵥ.name : d,
# Composite scores
c\_score₁.name :
{
c\_score₁.scores[1].name : ...
...
}
...
c\_scoreᵢ.name :
{
...
}
}
GerryChain.score_partition_from_proposal
— Methodscore_partition_from_proposal(graph::BaseGraph,
partition::Partition,
proposal::AbstractProposal,
scores::Array{S, 1}) where {S<:AbstractScore}
Returns a Dictionary of
- (a) updated district-level scores for districts that were altered
after proposal
was accepted,
- (b) partition-level scores, and
(c) composite scores, that may be comprised of scores from (a) or (b).
For example, suppose district 4's new White population is 43 and the new Sen2010_Dem population is 62, district 8's new White population is 22 and new Sen2010_Dem population is 66. The Δ scores would look like:
{
"num_cut_edges" : partition.num\_cut\_edges,
"dists" : (5, 8),
"White" : [43, 22],
"Sen2010_Dem" : [62, 66],
}