STAR voting is simpler than IRV
Before you begin: Please try this brief experiment in which you rank and rate 22 of the candidates for Democratic presidential nominee for 2020.
What They Are
Instant Runoff Voting (IRV) is a ranked voting method in which voters can rank the candidates in order, 1–2–3 etc. There are a multitude of ranked voting methods, but IRV has become so pervasive that advocates have taken to calling it by the more generic and marketable moniker “Ranked Choice Voting”, or just RCV. IRV has experienced a resurgence in recent years, but it was actually invented in the late 1800s and adopted for the Australian House of Representatives in 1918. In the early 1900s, around two dozen U.S. cities, including Cincinnati and New York, used IRV, but nearly all of them repealed it.
STAR voting is a rated (scored) rather than ranked voting method, created by voting methods researchers in October 2014. So instead of ranking the candidates 1–2–3 etc., STAR voting lets voters score the candidates on a 0–5 scale. Here’s a STAR voting ballot side-by-side with a 6-choice limited IRV ballot.
With a longer list of candidates, you either have a very wide RCV ballot like this…
Or you restrict voters to only supporting a limited number of candidates, like this….
There are robust arguments that STAR voting is superior to the antiquated IRV method, but here I want to focus purely on simplicity and transparency.
Process: How They Work
IRV only counts your highest ranked candidate at a time, and attempts to find a majority winner among the remaining candidates. So it starts by looking at everyone’s first choice. If any candidate has a majority of the votes, that candidate is elected. If not, then the candidate with the fewest votes is eliminated from consideration. We then repeat; so if your favorite candidate was eliminated, your vote now transfers to your second favorite. Sometimes this can go on for many rounds, such as we saw in the IRV mayoral race in Oakland, CA in November 2010.
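The round-by-round process described above can be sketched in a few lines of Python. This is a minimal illustration, not any jurisdiction's official procedure: the function name `irv_winner`, the ballot format (a list of candidate names, most preferred first), and the alphabetical tie-break for last place are all assumptions made for the example, and it does no validation of the ballots.

```python
def irv_winner(ballots):
    """Return (winner, rounds) for a list of ranked ballots.

    Each ballot is a list of candidate names, most preferred first.
    Ties are broken alphabetically, which is an arbitrary assumption
    for this sketch; real election rules define their own tie-breaks.
    """
    remaining = {c for ballot in ballots for c in ballot}
    rounds = []
    while remaining:
        # Count each ballot for its highest-ranked candidate still in the race.
        tally = {c: 0 for c in remaining}
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break  # only the top remaining choice counts this round
        rounds.append(tally)
        active = sum(tally.values())  # ballots that still rank someone
        leader = max(tally, key=lambda c: (tally[c], c))
        # A majority of the ballots still active ends the count.
        if 2 * tally[leader] > active or len(remaining) == 1:
            return leader, rounds
        # Otherwise eliminate the candidate with the fewest votes and repeat.
        loser = min(tally, key=lambda c: (tally[c], c))
        remaining.remove(loser)
    return None, rounds


# Hypothetical nine-voter election.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner, rounds = irv_winner(ballots)
print(winner, rounds)
```

In this made-up example nobody has a majority in the first round (A 4, B 3, C 2), so C is eliminated and those two ballots transfer to B, who then wins 5 to 4.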
By contrast, tabulation in STAR voting takes just two steps. First, add up the scores to find the two highest-scoring candidates overall. Second, elect whichever of those two is preferred by a majority of voters. Here’s a graphical depiction.
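The two steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `star_winner`, the ballot format (a dict mapping candidate to a 0–5 score, with unscored candidates treated as 0), and the tie-breaks (alphabetical for the finalists, higher score total in a tied runoff) are choices made for the example rather than an official rule set.

```python
def star_winner(ballots, max_score=5):
    """Tabulate STAR ballots given as dicts of candidate -> score (0..max_score)."""
    # Basic error-checking of the input votes.
    for ballot in ballots:
        for score in ballot.values():
            if not isinstance(score, int) or not 0 <= score <= max_score:
                raise ValueError(f"invalid score: {score!r}")
    candidates = sorted({c for ballot in ballots for c in ballot})
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")

    # Step 1 (scoring round): total the scores and take the top two.
    totals = {c: sum(b.get(c, 0) for b in ballots) for c in candidates}
    first, second = sorted(candidates, key=lambda c: (-totals[c], c))[:2]

    # Step 2 (automatic runoff): each ballot counts for the finalist it scored higher.
    for_first = sum(1 for b in ballots if b.get(first, 0) > b.get(second, 0))
    for_second = sum(1 for b in ballots if b.get(second, 0) > b.get(first, 0))
    no_preference = len(ballots) - for_first - for_second

    # Assumption: a tied runoff goes to the finalist with the higher score total.
    winner = first if for_first >= for_second else second
    return {
        "totals": totals,
        "finalists": (first, second),
        "runoff": {first: for_first, second: for_second},
        "no_preference": no_preference,
        "winner": winner,
    }


# Hypothetical five-voter election.
ballots = [{"A": 5, "B": 4, "C": 0}] * 3 + [{"A": 0, "B": 3, "C": 5}] * 2
print(star_winner(ballots))
```

In this made-up example B has the highest score total (18 to A's 15), but A wins the runoff 3 to 2, so the output shows both steps doing real work.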
By contrast, here’s an IRV flow chart from the Ranked Choice Voting Resource Center.
Computer Science
The relative simplicity of STAR voting over IRV can be measured objectively using the idea behind Kolmogorov complexity, which gauges how complex something is by the length of the shortest program that produces it. Write a computer program to tabulate STAR voting ballots and another to tabulate IRV ballots (preferably with error-checking of the input votes). For essentially any reasonable programmer, the STAR voting program will be shorter and will run faster. Program length of this kind is a standard objective measure that scientists use to assess “simplicity”.
Exit Poll
After a few days of intense online research in the summer of 2006, I came to support a score ballot. But I wanted to gauge how well the average voter would handle it. So while visiting a friend in Beaumont, Texas, I decided to visit the local polling site during the November election and conduct a small exit poll. I actually used a large 0–10 scale instead of the simple 0–5 scale used in STAR Voting. I also included an explicit “no opinion” option, which was treated like a zero, but allowed us to have some extra insights into voter psychology, since we were acting as researchers after all. Here’s the exact ballot I used.
All participants intuitively understood the rating scale, and showed no evidence of confusion. Multiple subsequent experiments showed the same thing. For instance, the Harvey Milk Democratic Club adopted a score ballot for their endorsement elections. A score ballot was also used for the hotly contested 2015 straw poll at the Republican Liberty Caucus. Time and time again, participants had no problems.
Spoiled Ballots
One of the best measures of the simplicity or complexity of a voting method is the rate of spoiled ballots: how often voters mess up when attempting to cast their vote. Analysis by Warren Smith, a Princeton math PhD whose work was prominently featured in William Poundstone’s 2008 book Gaming the Vote, shows that score ballots actually reduce the rate of spoilage relative to the choose-one status quo. By contrast, ranked ballots increase ballot spoilage rates.
Speed
The 2008 book Gaming the Vote by William Poundstone covers another measure of simplicity where a ratings ballot excels: the speed at which voters cast their votes. He discusses the web site HotOrNot.com.
The main concern you hear is that it is difficult. Score voting is “unnecessarily complicated,” wrote a poster on the Tacoma, Washington, News-Tribune electronic bulletin board, calling it “something only math geniuses at MIT would dream up for a populace in which many can’t balance a checkbook.” This is Steven Brams’s main reservation regarding range voting. Many voters don’t know who the vice president is, Brams told Smith. Maybe there is something ridiculous about asking such people to supply numerical scores. It is worth noting that what sold Hot or Not creators James Hong and Jim Young on [casting score ballots] was its speed. They wanted site visitors to rate photos as fast as possible. That way the site would collect many votes for each picture, giving the scores credibility — “a wisdom of crowds thing,” Hong said. “We found that anything that made it harder to vote was a bad thing.” They considered having visitors pick their favorite of two on-screen photos. A photo would win points for each time it was preferred over another, random photo. This would loosely simulate a Borda count. (In a true Borda count, a candidate wins a point every time a voter ranks her above a rival. No Hot or Not voter could rank all the millions of pictures on the site, of course. The aggregate effect of random visitors ranking random pairs would be similar.) However, when shown two photos that happen to be of roughly equal attractiveness, “people will look at the pictures and not know,” Hong said. “They have a harder time deciding.” Hong and Young also considered a simple “hot” or “not” vote on a single picture. This would be an approval vote. There it was “average” Joes and Janes who slowed things down. People would have to ponder whether to click “hot” or “not.” Score ballots were faster. It seemed to require less thought.
There have been complaints that Hot or Not is silly, superficial, and sexist. No one thinks it’s hard to understand. The ratings on YouTube, Amazon, and all the other sites do not seem to bother people, either. Olympic judging provides another demonstration that scoring is easier than ranking. The numbers that judges hold up on cards (in cartoons, anyway) are range votes. The judges give preliminary scores because scoring can be done on the fly. With rankings, you have to keep adjusting your numbers so that there’s only one athlete or candidate of each rank. This can be devilishly complicated when there are many competitors. I suspect that what’s really behind the perception that scoring is “hard” is number phobia. A check mark is simple. Numbers are difficult. The bigger the numbers involved, the more difficult it seems.
This makes sense when thinking of voting in algorithmic terms. Converting preferences into a ranked list is essentially a bubble sort, whose complexity is O(n²), whereas scoring is a simple two-pass process (collect the min and max, then normalize) that runs in O(n).
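As a rough sketch of that contrast, here is how a voter's private "utilities" for each candidate might be turned into a 0–5 score ballot versus a ranked ballot. The function names, the utility numbers, and the rounding rule are all hypothetical, chosen only for illustration. Note also that Python's built-in `sorted` runs in O(n log n), so it stands in for the ranking step without being a literal bubble sort.

```python
def scores_from_utilities(utilities, max_score=5):
    """Two linear passes: find the min and max, then normalize onto 0..max_score."""
    lo = min(utilities.values())  # pass 1: least-liked candidate
    hi = max(utilities.values())  # (and most-liked candidate)
    if hi == lo:
        # Assumption: a voter who is indifferent gives everyone the top score.
        return {c: max_score for c in utilities}
    # Pass 2: rescale each utility independently of the others.
    return {c: round(max_score * (u - lo) / (hi - lo)) for c, u in utilities.items()}


def ranking_from_utilities(utilities):
    """Ranking requires comparing candidates against each other, i.e. a sort."""
    return sorted(utilities, key=lambda c: -utilities[c])


# Two hypothetical voters with the same order of preference
# but very different feelings about the middle candidate.
me = {"X": 10.0, "Y": 8.0, "Z": 0.0}
you = {"X": 10.0, "Y": 2.0, "Z": 0.0}

print(ranking_from_utilities(me), ranking_from_utilities(you))
print(scores_from_utilities(me), scores_from_utilities(you))
```

Both voters produce the same ranking (X, Y, Z), while their score ballots differ: Y gets a 4 from the first voter and a 1 from the second.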
Expressiveness
One of the great things about ratings over rankings is that they convey intensity of preference. You and I may both prefer X over Y over Z, and thus have the same order of preference. But suppose I like Y almost as much as X, while you dislike Y almost as much as Z. Ratings are simply more flexible than rankings.
A short article by leading survey software provider SurveyMonkey argues that ratings are generally more user friendly, specifically noting their greater expressive ability.
…there is a drawback to using a ranking question. Although ranking different kinds of cake gives you a relative sense of whether one is liked more than the other — you don’t know how much more. Someone may like chocolate cake only slightly more than carrot cake, but thinks cheesecake is disgusting. A ranking question will not let you know the strength of preferences — only the order.
Transparency
Of course, all that expressive ability is only as good as the ability of the voting method to surface it. As you can see in the STAR voting process graphic above, the final STAR voting results can be summarized in two simple rows.
The score totals show Carmen at 624,057 points and Ben at 509,742 points. The runoff has Carmen winning with a 58.9% majority. Compare that to the Oakland mayoral results table we just saw! If we include the non-finalists, we can get a comprehensive picture that looks like this.
On the left, we see all the “candidates” ordered by how many total points they got. Chocolate Chip Cookies and Vanilla Ice Cream are the finalists. On the right, we see that voters preferred Chocolate Chip Cookies to Ice Cream by a 20–15 margin, with 10 voters expressing no preference between those two options. This is as verbose as STAR voting results ever get!
Precinct Summability
Another consequence of the sum-based “additive” nature of STAR voting is that it is precinct summable. This means that when used in electoral precincts, each precinct need only report a summary total of the ballots. This consists of a score total for each candidate, plus a matrix of head-to-head majorities for each pair of candidates. IRV, by contrast, cannot be counted in precincts, so the complete ballots (or all the data on them) must be transported to a central location, creating delays and chain of custody risks. That is why, if you’ve gone to the San Francisco government website to see the results of their IRV elections, you may have seen this message for some time after the election:
Due to the requirement that all ballots must be centrally tallied in City Hall and not at the polling places, the Department of Elections has not set a date for releasing any preliminary results using the ranked-choice voting method.
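To make the idea of precinct summability concrete, here is a small Python sketch in which each precinct reports only score totals plus a head-to-head preference matrix, and the central count simply adds those summaries together. The function names and data layout are assumptions for illustration, as are the tie-breaks (alphabetical for finalists, higher score total in a tied runoff).

```python
def precinct_summary(ballots, candidates):
    """Reduce a precinct's score ballots to totals and a pairwise matrix."""
    totals = {c: 0 for c in candidates}
    # prefers[(a, b)] = number of ballots scoring a strictly higher than b
    prefers = {(a, b): 0 for a in candidates for b in candidates if a != b}
    for ballot in ballots:
        for a in candidates:
            totals[a] += ballot.get(a, 0)
            for b in candidates:
                if a != b and ballot.get(a, 0) > ballot.get(b, 0):
                    prefers[(a, b)] += 1
    return totals, prefers


def merge_summaries(summaries):
    """Add precinct summaries element by element; no ballots are needed."""
    totals, prefers = {}, {}
    for precinct_totals, precinct_prefers in summaries:
        for c, v in precinct_totals.items():
            totals[c] = totals.get(c, 0) + v
        for pair, v in precinct_prefers.items():
            prefers[pair] = prefers.get(pair, 0) + v
    return totals, prefers


def winner_from_summary(totals, prefers):
    """Pick the top two by total score, then consult the pairwise matrix."""
    first, second = sorted(totals, key=lambda c: (-totals[c], c))[:2]
    return first if prefers[(first, second)] >= prefers[(second, first)] else second


# Two hypothetical precincts.
candidates = ["A", "B", "C"]
precinct_1 = [{"A": 5, "B": 4, "C": 0}] * 3
precinct_2 = [{"A": 0, "B": 3, "C": 5}] * 2

merged = merge_summaries([
    precinct_summary(precinct_1, candidates),
    precinct_summary(precinct_2, candidates),
])
print(winner_from_summary(*merged))
```

The merged summary is identical to what you would get by summarizing all the ballots in one place, which is the property that IRV's round-by-round transfers lack.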
Visibility and Viability
Even when IRV elects a majority winner in the first round, the results aren’t transparent. This can massively distort the visibility and thus perceived viability of candidates such as third parties, independents, minorities, youth, women, etc. See these recent results from a multi-method online polling experiment which included plurality voting (the status quo), IRV/RCV, STAR voting (first- and second-round preferences), and approval voting (like plurality voting except that you can vote for as many candidates as you want to).
Just look at how similar the “RCV First Choice Preferences” are to the plurality voting results above! Warren takes a first-round majority, and we thus never even see the 2nd or 3rd choices and so on. IRV proponents ironically tout this failure to count later preferences as a virtue, invoking the later-no-harm criterion. It should go without saying that this author considers later-no-harm to be a flaw rather than a virtue.
Conclusion
While there are plenty of reasons to believe STAR voting produces better, more democratic outcomes, I believe the radically improved simplicity of STAR voting is a major practical advantage over IRV.