Data visualization

After running a chain, you will often want to visualize summary statistics across the plans in the chain (e.g., a compactness score or party vote share by district). The GerryChainJulia library provides functions, built on PyPlot.jl, that generate some common types of graphs.

score_boxplot()

Boxplots can be helpful for visualizing the "typical" range of values for a statistic of interest across all the plans in the chain. For example, users interested in knowing whether "cracking and packing" has occurred might overlay the percentage of minority voters in each district of an enacted plan on a set of boxplots representing the typical range of that percentage for plans generated by the chain. Users can generate boxplots for any score of interest. If the score is a district-level score (e.g., percentage of white voters in each district), the figure shows multiple boxplots, one per district, each representing the range of scores for that district. If the score is a plan-level score, the figure shows a single boxplot.

GerryChain.score_boxplot - Function
score_boxplot(score_values::Array{S, 2};
              sort_by_score::Bool=true,
              label::String="GerryChain",
              comparison_scores::Array=[],
              ax::Union{Nothing, PyPlot.PyObject}=nothing) where {S<:Number}

Produces a graph with multiple matplotlib box plots for the values of scores throughout the chain. Intended for use with district-level scores (DistrictAggregate, DistrictScore).

Arguments:

  • score_values : A 2-dimensional array of score values with dimension (n x d), where n is the number of states in the chain and d is the number of districts
  • sort_by_score : Whether districts should be ordered by the median of their score values.
  • label : Legend key for the GerryChain boxplots. Only shown if there are scores from other plans passed in as reference points.
  • comparison_scores : A list of tuples to pass in if the user would like to compare the per-district scores of particular plans with the GerryChain results on the same graph. The list should have the structure [(l₁, scores₁), ... , (lₓ, scoresₓ)], where lᵢ is a label that will appear on the legend and scoresᵢ is an array of length d, where d is the number of districts. Each element of the list should be of type Tuple{String, Array{S, 1}}. Example: [ (name₁, [v₁, v₂, ... , vᵤ]), ... , (nameₓ, [w₁, w₂, ... , wᵤ]) ], where there are x comparison plans and u = d districts.
  • ax : A PyPlot (matplotlib) Axis object

Returns a matplotlib Axis object with the boxplot.

score_boxplot(score_values::Array{S, 1};
              label::String="GerryChain",
              comparison_scores::Array=[],
              ax::Union{Nothing, PyPlot.PyObject}=nothing) where {S<:Number}

Produces a single matplotlib box plot for the values of scores throughout the chain. Intended for use with plan-level scores.

Arguments:

  • score_values : A 1-dimensional array of score values of length n, where n is the number of states in the chain.
  • label : Legend key for the GerryChain boxplots. Only shown if there are scores from other plans passed in as reference points.
  • comparison_scores : A list of Tuples that is passed in if the user would like to compare the score of a particular plan with the GerryChain boxplot on the same graph. The list of tuples should have the structure [(l₁, score₁), ... , (lᵤ, scoreᵤ)], where lᵢ is a label that will appear on the legend and scoreᵢ is the value of the plan-wide score for the comparison plan.
  • ax : A PyPlot (matplotlib) Axis object

Returns a matplotlib Axis object with the boxplot.

score_boxplot(chain_data::ChainScoreData,
              score_name::String; kwargs...)

Creates a graph with boxplot(s) of the values of scores throughout the chain.

Arguments:

  • chain_data : ChainScoreData object that contains the values of scores at every step of the chain
  • score_name : name of the score (i.e., the name field of an AbstractScore)
  • kwargs : Optional arguments, including label, comparison_scores, and sort_by_score (the latter should only be passed for district-level scores).

Returns a matplotlib Axis object with the boxplot.


Usage

# run chain
chain_data = recom_chain(...)

# graph results and compare to enacted plan!
# the length of `plan1_dshare` and `plan2_dshare` should be equal to the total # of districts
plan1_dshare = [...]
plan2_dshare = [...]
score_boxplot(chain_data, "dem_vote_share", comparison_scores=[("plan1", plan1_dshare), ("plan2", plan2_dshare)])
# without comparison scores:
# score_boxplot(chain_data, "dem_vote_share")

# if you want to edit anything about the default plot, you can simply use plt
plt.ylabel("Democratic vote share")
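
To make the `sort_by_score` and `comparison_scores` arguments more concrete, here is a minimal sketch in plain Julia (standard library only) of the ordering described for district-level boxplots: districts are ranked by the median of their column in the n x d score matrix. The matrix values, the helper name `median_order`, and the "enacted" comparison plan are all made up for illustration, and this is not the library's implementation.

```julia
using Statistics

# Hypothetical score matrix: 4 chain states (rows) x 3 districts (columns).
score_values = [0.62 0.41 0.55;
                0.60 0.44 0.52;
                0.65 0.40 0.57;
                0.61 0.43 0.54]

# Hypothetical helper: column indices ordered by ascending column median.
median_order(scores::Array{<:Number, 2}) = sortperm(vec(median(scores, dims=1)))

order = median_order(score_values)
sorted_scores = score_values[:, order]   # columns rearranged by median

# A comparison plan supplies exactly one value per district.
comparison_scores = [("enacted", [0.58, 0.47, 0.50])]
for (label, scores) in comparison_scores
    @assert length(scores) == size(score_values, 2) "$label needs one score per district"
end

# In this sketch, the same permutation keeps the comparison plan aligned
# with the reordered columns.
enacted_sorted = comparison_scores[1][2][order]
```

Checking the length of each comparison array against the number of columns before plotting is a cheap way to catch a plan that was tabulated with a different number of districts.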

Example of generated plot

[Figure: boxplot]

score_histogram()

This function allows users to easily create histograms of a plan-level score of interest (e.g., the number of cut edges, the number of seats won by a particular party, etc.). Unlike the score_boxplot function, this function cannot be used on district-level scores. Similar to score_boxplot, you can also pass in "comparison scores" to visualize where a particular value of a score lies in relation to the histogram of values observed over the course of the chain.

GerryChain.score_histogram - Function
score_histogram(score_values::Array{S, 1};
                comparison_scores::Array=[],
                bins::Union{Nothing, Int, Vector}=nothing,
                range::Union{Nothing, Tuple}=nothing,
                density::Bool=false,
                rwidth::Union{Nothing, T}=nothing,
                ax::Union{Nothing, PyPlot.PyObject}=nothing) where {S<:Number, T<:Number}

Creates a graph with histogram of the values of a score throughout the chain. Only applicable for scores of type PlanScore.

Arguments:

  • score_values : A 1-dimensional array of score values of length n, where n is the number of states in the chain.
  • comparison_scores : A list of tuples to pass in if the user would like to compare the score of a particular plan with the GerryChain histogram on the same figure. The list of tuples should have the structure [(l₁, score₁), ... , (lᵤ, scoreᵤ)], where lᵢ is a label that will appear on the legend and scoreᵢ is the value of the plan-wide score for the comparison plan.
  • bins, range, density, rwidth : Optional histogram arguments that are passed into matplotlib.
  • ax : A PyPlot (matplotlib) Axis object

Returns a matplotlib Axis object with the histogram.

score_histogram(chain_data::ChainScoreData,
                score_name::String; kwargs...)

Creates a graph with histogram of the values of a score throughout the chain. Only applicable for scores of type PlanScore.

Arguments:

  • chain_data : ChainScoreData object that contains the values of scores at every step of the chain
  • score_name : name of the score (i.e., the name field of an AbstractScore)
  • kwargs : Optional arguments, including comparison_scores and other matplotlib arguments.

Returns a matplotlib Axis object with the histogram.
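
As a rough illustration of what an integer `bins` value and `density=true` mean in a matplotlib-style histogram, the following standard-library sketch splits a vector of plan-level scores into equal-width bins between its minimum and maximum and counts the values in each. The helper `equal_width_counts` and the `cut_edges` values are hypothetical; this is a sketch of the general binning convention, not code from the library.

```julia
# Hypothetical helper: count values in `nbins` equal-width bins spanning
# [minimum, maximum]. The last bin includes its right edge, as in
# matplotlib/NumPy histograms. Assumes the values are not all identical.
function equal_width_counts(values::Vector{<:Real}, nbins::Int)
    lo, hi = extrema(values)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in values
        # Clamp so that v == hi falls into the last bin.
        idx = min(floor(Int, (v - lo) / width) + 1, nbins)
        counts[idx] += 1
    end
    return counts, width
end

# Made-up cut-edge counts for 10 plans.
cut_edges = [18, 19, 19, 21, 22, 22, 23, 24, 24, 24]
counts, width = equal_width_counts(cut_edges, 3)

# With density scaling, bar heights are rescaled so the bar areas sum to 1.
density = counts ./ (length(cut_edges) * width)
```

With only three bins, most of the detail in the scores is lost, which is why a small `bins` value is usually paired with a wide `rwidth` purely for presentation.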


Usage

# run chain
chain_data = recom_chain(...)

# graph results and compare to enacted plan!
# (here the enacted plan is assumed to have 21 cut edges)
score_histogram(chain_data, "cut_edges", comparison_scores=[ ("enacted", 21) ])
# a few matplotlib histogram arguments can also be passed through:
# score_histogram(chain_data, "cut_edges", comparison_scores=[ ("enacted", 21) ], bins=3, rwidth=1)
# without any comparison scores:
# score_histogram(chain_data, "cut_edges")

# if you want to edit anything about the default plot, you can simply use plt
plt.xlabel("Number of cut edges")
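
Alongside the figure, it can help to put a number on where a comparison value sits relative to the chain. A minimal sketch, assuming the plan-level scores have already been extracted into a plain vector (the `cut_edges` values and the `enacted` value here are made up, not output from a real chain):

```julia
# Made-up cut-edge counts for 10 plans and a hypothetical enacted plan.
cut_edges = [18, 19, 19, 21, 22, 22, 23, 24, 24, 24]
enacted = 21

# Share of chain plans whose score is at or below the enacted plan's score.
share_at_or_below = count(<=(enacted), cut_edges) / length(cut_edges)
```

A share near 0 or 1 means the comparison plan falls in a tail of the distribution that the histogram shows.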

Example of generated histogram

[Figure: histogram]