Ballot Generators: Generating PreferenceProfiles
¶
We have already seen the use of a PreferenceProfile
generator (the Impartial Culture Model) in the Plotting and Ballot Graph tutorials. Now, let's dive into the rest that are included in votekit
. This tutorial will cover
- Impartial Culture
- Impartial Anonymous Culture
- name-Plackett Luce
- name-Bradley Terry
- name-Cumulative
import votekit.ballot_generator as bg
from votekit.pref_interval import PreferenceInterval
The two simplest to use are the Impartial Culture and Impartial Anonymous Culture. For \(m\) candidates and \(n\) voters, the Impartial Culture model generates PreferenceProfiles
uniformly at random out of the \((m!)^n\) possible profiles. Remember, a PreferenceProfile
is a tuple of length \(n\) that stores a linear ranking \(m\) in each slot.
The Impartial Anonymous Culture model works a little bit differently. When it generates ballots, it chooses a candidate support vector uniformly at random from among all possible support vectors, and then generates ballots according to that vector.
candidates = ["A", "B", "C"]
number_of_ballots = 50
#Impartial Culture
ic = bg.ImpartialCulture(candidates = candidates)
ic_profile = ic.generate_profile(number_of_ballots)
#Impartial Anonymous Culture
iac = bg.ImpartialAnonymousCulture(candidates = candidates)
iac_profile = iac.generate_profile(number_of_ballots)
The 1-D Spatial model assigns each candidate a random point on the real line according to the standard normal distribution. It then does the same for each voter, and then a voter ranks candidates by their distance from the voter.
one_d = bg.OneDimSpatial(candidates = candidates)
one_d_profile = one_d.generate_profile(number_of_ballots)
Ballots Generated Using Intervals¶
The following generative models all depend on preference intervals.
The name-Plackett-Luce, name-Bradley-Terry, and name-Cumulative models all use the interval \([0,1]\). To use these models, we need a bit more information than just the candidates. Suppose for now that there is one type of voter (or bloc \(Q\)) in the state (these models can be generalized to more than one bloc, but we will start simple for now). We record the proportion of voters in this bloc in a dictionary.
Name-PL and Name-BT¶
In the upcoming election, there are three candidates, \(A\), \(B\), and \(C\). In general, the bloc \(Q\) prefers \(A\) 1/2 of the time, \(B\) 1/3 of the time, and \(C\) 1/6 of the time. We can visualize this as the line segment \([0,1]\), with the segment \([0,1/2]\) labeled \(A\), \([1/2, 5/6]\) labeled \(B\), and \([5/6,1]\) labeled \(C\). Note the slight abuse of notation in using the same name for the candidates and their intervals. We store this information in a PreferenceInterval
object.
candidates = ["A", "B", "C"]
number_of_ballots = 50
bloc_voter_prop = {"Q":1}
pref_intervals_by_bloc = {"Q" : {"Q":
PreferenceInterval({"A": 1/2, "B": 1/3, "C": 1/6})
}
}
For each voter, the name-Plackett-Luce (PL) model samples from the list of candidates without replacement according to the distribution defined by the preference intervals. The first candidate it samples is in first place, then second, etc. Visualizing this as the line segment, the PL model uniformly at random selects a point in \([0,1]\). Whichever candidate's interval that point lies in is listed first in the ballot. It then removes that candidate's preference interval from \([0,1]\), rescales so the segment has length 1 again, and then samples a second candidate. Repeat until all candidates have been sampled. We will discuss the cohesion_parameters
argument later.
# Plackett-Luce
pl = bg.name_PlackettLuce(pref_intervals_by_bloc=pref_intervals_by_bloc,
bloc_voter_prop=bloc_voter_prop,
candidates=candidates,
cohesion_parameters={"Q":{"Q":1}})
pl_profile = pl.generate_profile(number_of_ballots)
print(pl_profile)
Ballots Weight
(A, B, C) 21
(B, A, C) 12
(C, A, B) 7
(A, C, B) 4
(C, B, A) 3
(B, C, A) 3
The name-Bradley-Terry (BT) model also fundamentally relies on these preference intervals. The probability that BT samples the ballot \((A>B>C)\) is proportional to the the product of the pairwise probabilities \((A>B), (A>C),\) and \((B>C)\). Using our preference intervals, the probability that \(A>B\) is \(\frac{A}{A+B}\); out of a line segment of length \(A+B\), this is the probability that a uniform random point lies in the \(A\) portion. The other probabilities are computed similarly.
# Bradley-Terry
bt = bg.name_BradleyTerry(pref_intervals_by_bloc=pref_intervals_by_bloc,
bloc_voter_prop=bloc_voter_prop,
candidates=candidates,
cohesion_parameters = {"Q":{"Q":1}})
bt_profile = bt.generate_profile(number_of_ballots)
print(bt_profile)
Ballots Weight
(A, B, C) 22
(B, A, C) 11
(A, C, B) 10
(B, C, A) 5
(C, A, B) 1
(C, B, A) 1
We can do a more complicated example of PL and BT. Consider an election where there are 2 blocs of voters, \(Q\) and \(R\). There are two candidates from the \(Q\) bloc, and two from the \(R\) bloc. The \(R\) block is more insular, and expresses no interest in any of the \(Q\) candidates, while the \(Q\) bloc does have some preference for \(R\)'s candidates. We express this using cohesion_parameters
, which stores the preference of each slate for the other slate's candidates.
candidates = ["Q1", "Q2", "R1", "R2"]
number_of_ballots = 50
bloc_voter_prop = {"Q": 0.7, "R": 0.3}
pref_intervals_by_bloc = {
"Q": {"Q":PreferenceInterval({"Q1": 0.4, "Q2": 0.3}),
"R":PreferenceInterval({"R1": 0.2, "R2": 0.1})},
"R": {"Q":PreferenceInterval({"Q1": 0.3, "Q2": 0.7}),
"R":PreferenceInterval({"R1": 0.4, "R2": 0.6})}
}
cohesion_parameters = {"Q": {"Q": .8, "R":.2},
"R": {"R":1, "Q":0}}
pl = bg.name_PlackettLuce(pref_intervals_by_bloc=pref_intervals_by_bloc,
bloc_voter_prop=bloc_voter_prop,
candidates=candidates,
cohesion_parameters=cohesion_parameters)
pl_profile = pl.generate_profile(number_of_ballots)
print("Number of ballots:", pl_profile.num_ballots())
print(pl_profile)
Number of ballots: 50
PreferenceProfile too long, only showing 15 out of 15 rows.
Ballots Weight
(Q1, Q2, R1, R2) 11
(R2, R1, frozenset({'Q1', 'Q2'}) (Tie)) 11
(R1, Q2, Q1, R2) 6
(Q2, Q1, R1, R2) 5
(R1, R2, frozenset({'Q1', 'Q2'}) (Tie)) 4
(Q1, R1, Q2, R2) 2
(Q2, R1, Q1, R2) 2
(Q1, Q2, R2, R1) 2
(Q1, R1, R2, Q2) 1
(Q2, R2, Q1, R1) 1
(Q2, R2, R1, Q1) 1
(R1, R2, Q2, Q1) 1
(Q2, R1, R2, Q1) 1
(Q1, R2, Q2, R1) 1
(R2, Q2, Q1, R1) 1
Notice that for the first time we have ties on the ballots! The notation {'Q1', 'Q2'} (Tie)
means that these two candidates are tied for third place.
# Bradley-Terry
bt = bg.name_BradleyTerry(pref_intervals_by_bloc=pref_intervals_by_bloc,
bloc_voter_prop=bloc_voter_prop,
candidates=candidates,
cohesion_parameters=cohesion_parameters)
bt_profile = bt.generate_profile(number_of_ballots)
print("Number of ballots:", bt_profile.num_ballots())
print(bt_profile)
Number of ballots: 50
Ballots Weight
(Q1, Q2, R1, R2) 9
(R1, R2, frozenset({'Q1', 'Q2'}) (Tie)) 9
(Q1, R1, Q2, R2) 7
(Q2, Q1, R1, R2) 6
(R2, R1, frozenset({'Q1', 'Q2'}) (Tie)) 6
(Q2, Q1, R2, R1) 3
(Q1, Q2, R2, R1) 3
(R1, Q1, Q2, R2) 3
(Q2, R1, Q1, R2) 3
(R1, Q2, Q1, R2) 1
Name-Cumulative¶
Cumulative voting is a method in which voters are allowed to put candidates on the ballot with multiplicity.
candidates = ["Q1", "Q2", "R1", "R2"]
number_of_ballots = 50
bloc_voter_prop = {"Q": 0.7, "R": 0.3}
pref_intervals_by_bloc = {
"Q": {"Q":PreferenceInterval({"Q1": 0.4, "Q2": 0.3}),
"R":PreferenceInterval({"R1": 0.2, "R2": 0.1})},
"R": {"Q":PreferenceInterval({"Q1": 0.3, "Q2": 0.7}),
"R":PreferenceInterval({"R1": 0.4, "R2": 0.6})}
}
cohesion_parameters = {"Q": {"Q": .8, "R":.2},
"R": {"R":1, "Q":0}}
num_votes_per_ballot = 3
We will also take this chance to introduce the by_bloc
parameter to the generate_profile
method, which when set to True
returns a tuple. The first entry is a dictionary, which records the ballots cast by each bloc. The second entry is the full profile, i.e. what you would get if you just ran generate_profile
with by_bloc=False
.
c = bg.name_Cumulative(pref_intervals_by_bloc=pref_intervals_by_bloc,
bloc_voter_prop=bloc_voter_prop,
candidates=candidates,
cohesion_parameters=cohesion_parameters,
num_votes=num_votes_per_ballot)
c_profile_dict, agg_profile = c.generate_profile(number_of_ballots=100, by_bloc=True)
c_profile_dict["Q"]
PreferenceProfile too long, only showing 15 out of 31 rows.
Ballots Weight
(Q1, Q1, Q1) 6
(Q1, Q1, R1) 6
(Q1, Q2, Q1) 6
(Q2, Q2, Q1) 4
(Q1, R1, Q1) 4
(Q1, Q1, Q2) 4
(Q1, Q2, R1) 3
(Q2, Q2, Q2) 3
(Q2, Q1, Q1) 3
(R2, Q2, Q2) 3
(R1, Q1, Q2) 2
(Q1, Q2, Q2) 2
(Q2, Q2, R1) 2
(Q2, Q1, R1) 2
(R2, Q2, Q1) 2
c_profile_dict["R"]
Ballots Weight
(R2, R2, R2) 12
(R1, R1, R2) 4
(R1, R2, R1) 3
(R1, R1, R1) 3
(R2, R1, R2) 3
(R2, R1, R1) 2
(R2, R2, R1) 2
(R1, R2, R2) 1
agg_profile
PreferenceProfile too long, only showing 15 out of 39 rows.
Ballots Weight
(R2, R2, R2) 12
(Q1, Q1, R1) 6
(Q1, Q2, Q1) 6
(Q1, Q1, Q1) 6
(Q1, R1, Q1) 4
(Q2, Q2, Q1) 4
(Q1, Q1, Q2) 4
(R1, R1, R2) 4
(R2, Q2, Q2) 3
(Q2, Q2, Q2) 3
(R1, R2, R1) 3
(Q2, Q1, Q1) 3
(R1, R1, R1) 3
(Q1, Q2, R1) 3
(R2, R1, R2) 3
Observe the multiplicity of candidates, as well as the fact that no voter in the R
bloc cast a vote for Q
candidates. To make the Ballot
object as flexible as possible over different methods of election, we have implemented cumulative voting ballots as follows. The ranking on the ballot holds no meaning; all that matters is the multiplicity. That is, the ballot (R1, R1, R2) is the same as (R2, R1, R1). The PreferenceProfile
object does not know that and thus displays them as different ballots, but our cumulative election class will handle tallying results for you.
We will discuss the slate models, as well as AC and CS in a later tutorial.