Every Dollar Counts, Every Vote Counts

I. The Vote

About one-fifteenth of how Congress votes tracks with who paid for their campaign. That's the number. I built a model, fed it 36 million campaign contributions and 2.4 million congressional votes, and asked it to decompose every vote into its explanatory parts: ideology, party loyalty, bill content, district demographics, committee assignments, and money.¹ The money part came back at 6.64%. Which is, before you do anything with that number, genuinely complicated, because the other ninety-three percent is exactly what you'd want it to be. A Republican from rural Alabama votes like a Republican from rural Alabama. A Democrat from coastal California votes like a Democrat from coastal California. Party loyalty and ideology explain the bulk of it. Congress is not a vending machine. The model, for the most part, is bored.

But every so often a vote comes back interesting.

Representative Zachary Nunn, a Republican from Iowa, is a combat aviator turned congressman who won his seat in 2022 by seven-tenths of a percentage point and sits on the House Financial Services Committee, where he serves as vice chairman of the Subcommittee on National Security, Illicit Finance, and International Financial Institutions. He is, in other words, precisely the kind of member whose committee assignment places him at the intersection of legislative authority and industry interest that campaign finance researchers have been trying to measure for decades. In the 119th Congress, Nunn defected from his party on multiple votes related to the Financial Services and General Government appropriations bill (the bill that funds, among other things, the agencies his committee oversees). The model flagged these votes with an 11.6% money attribution, about 1.75 times the 6.64% average. 43.6% of the prediction came from bill content. 33.3% from party loyalty. 11.5% from ideology. The model thinks a combination of the bill's substance and Nunn's financial profile explains the defection better than the factors that normally dominate congressional voting.

Now. I want to be careful here, because (and I'm going to say this several times in this piece, enough times that you'll get tired of hearing it, which is fine because the alternative is that I say it once and you forget and then send me an email) correlation is not causation. The model cannot tell you that Nunn defected because of his donor profile any more than it can tell you that his donors gave him money because they knew he was already inclined to break from his party on appropriations. Both explanations produce identical data signatures. The model sees the correlation, flags it, quantifies it to two decimal places, and then moves on with the supreme indifference of an algorithm that does not understand what money is or why people want it. That is a limitation of gradient-boosted trees and also, as it turns out, of the entire field of campaign finance research going back roughly to the invention of campaigns.²

Nunn is one vote. The model has 2.4 million of them. Here are a few where the financial signal was loudest:

Highest Financial Signal on Party Defections

Votes where a member broke from their party and the model's money attribution was in the top decile. Dashed line shows the 6.64% average across all votes.

[Chart: money attribution (% of prediction)]

Zachary Nunn (R-IA), Financial Services Approps: 11.6%
Glenn Thompson (R-PA), Consolidated Approps 2026: 11%
Randy Fine (R-FL), Consolidated Approps 2026: 10.7%
Average across all votes: 6.64%

II. The Number

Here is the headline finding: across all votes cast by all 539 current members of Congress, the model's mean money attribution is 6.64%. Financial features (donor industry concentration, PAC ratios, in-state vs. out-of-state money, dark money exposure, contribution timing, lobbying velocity, and about a hundred other variables that I will spare you the enumeration of) account for roughly a fifteenth of the model's prediction of how a member of Congress will vote on any given bill.

I want to sit with that for a second, because I think the instinct (my instinct, anyway, and maybe yours) is to round it down to "that's basically nothing." That instinct is wrong, or at least wrong-ish, in ways I think matter.

On the face of it: more than 93% of the model's prediction is attributed to things that have nothing to do with money. Ideology. Party loyalty. What the bill actually says. Whether the member is on a relevant committee. What their district looks like. Congress is not a vending machine where you insert a campaign contribution and a vote comes out. Most votes, the vast majority of votes, go the way they would go if campaign finance did not exist at all. The model is quite clear about this.

But 6.64% is not zero. And it comes from a model that predicts congressional votes correctly 97.4% of the time and that identifies party-line defections with 93.9% precision (meaning: when this model predicts that a member will break from their party, it is correct about ninety-four times out of a hundred, which is a different kind of authority than a blurry signal spread across everything). The model has 107 financial features to work with, and all 107 of them collectively still explain a measurable share of the prediction after you've already accounted for ideology, party, bill content, district, and institutional position. No single financial feature dominates. The signal is quiet and persistent.
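For anyone who wants the precision figure pinned down: precision is the share of flagged items that turn out to be correct. A minimal sketch follows, with made-up counts chosen only to land on 93.9%; the model's actual confusion matrix is not given in this piece.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of predicted defections that were actual defections."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no predicted defections to score")
    return true_positives / flagged


# Hypothetical counts, not the model's real confusion matrix.
example = precision(true_positives=939, false_positives=61)
print(f"{example:.1%}")
```

Note what precision leaves out: it says nothing about defections the model failed to flag. That would be recall, which is a separate number.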

Here is something I want to spend a moment on, because I think it's the most structurally revealing thing in the data: when I looked at the SHAP decomposition of individual votes (the technique the model uses to attribute its predictions to specific features), the top thirteen most influential features for any given vote are almost never financial. They're ideological scores, party loyalty metrics, bill similarity measures, procedural flags. The money features don't appear until you get deep into the long tail of smaller contributions to the prediction. But there are 107 of them in that long tail, and they all push in roughly the same direction, and collectively they add up to 6.64%. Money, it turns out, works like a gravitational field. No single financial variable moves a vote the way party loyalty or ideology does, but the cumulative weight of a hundred-odd financial features bends the trajectory of enough predictions, on enough votes, to show up in the aggregate. Which is, if you think about it, exactly what you'd expect from a form of influence that has survived two millennia of reform efforts: a diffuse, structural pressure that operates below the threshold of any individual variable's significance, too distributed to isolate and too persistent to ignore.³ The 6.64% clusters unevenly. It clusters around certain types of members, certain types of votes, and certain moments in a legislator's career. The clusters are where the stories are.
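One way to turn per-feature SHAP values into a category share like the 6.64% figure is to sum absolute contributions by category and divide by the total. That normalization is an assumption about how such a share could be computed, since the exact formula is not spelled out here, and every feature name and value below is invented for illustration.

```python
from collections import defaultdict


def category_shares(shap_values, category_of):
    """Return each category's share of total absolute SHAP attribution."""
    totals = defaultdict(float)
    for feature, value in shap_values.items():
        totals[category_of[feature]] += abs(value)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all attributions are zero")
    return {category: total / grand_total for category, total in totals.items()}


# Invented values for a single vote: three large non-financial features
# and seven small financial ones (hypothetical names throughout).
shap_values = {"ideology_score": 0.50, "party_loyalty": 0.30, "bill_similarity": -0.13}
category_of = {"ideology_score": "ideology", "party_loyalty": "party", "bill_similarity": "bill"}
for i in range(7):
    name = f"fin_feature_{i}"
    shap_values[name] = 0.01
    category_of[name] = "money"

shares = category_shares(shap_values, category_of)
top_three = sorted(shap_values, key=lambda f: abs(shap_values[f]), reverse=True)[:3]
print("largest features:", top_three)
print(f"money share: {shares['money']:.1%}")
```

In this toy case no financial feature makes the top three, yet the category as a whole still accounts for about seven percent of the attribution. That is the long-tail shape described above.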

III. The Tenure Curve

This is, I think, the most interesting thing in the data, and I'll confess that I didn't expect to find it.

When you break money attribution down by how long a member has served, a pattern emerges that I initially expected to look like a hill (freshmen low, peak mid-career, decline for seniors). Instead, it looks like a cliff with a long, gradual slope back up.⁴ Freshmen show the highest financial signal of any tenure group: 8.25% in the House, 7.74% in the Senate. The signal drops sharply after the first term, bottoms out around the third, and then slowly climbs back up without ever reaching the freshman peak again.
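Mechanically, a curve like this is a grouped average. A minimal sketch of that aggregation, assuming one record per member with a chamber, a term number, and a money-attribution share; the rows are invented placeholders, not values from the dataset.

```python
from collections import defaultdict
from statistics import mean


def tenure_curve(records):
    """Average money attribution for each (chamber, term) bucket."""
    buckets = defaultdict(list)
    for chamber, term, share in records:
        buckets[(chamber, term)].append(share)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Placeholder rows: (chamber, term number, money attribution in percent).
sample = [
    ("House", 1, 8.0), ("House", 1, 8.5),
    ("House", 3, 5.5), ("House", 3, 6.0),
    ("Senate", 1, 7.5), ("Senate", 1, 8.0),
]
curve = tenure_curve(sample)
for (chamber, term), avg in curve.items():
    print(f"{chamber} term {term}: {avg:.2f}%")
```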

The Tenure Curve

Average money attribution by term number. Both chambers peak with freshmen, then decline.

[Chart: one line each for the House and the Senate, with a reference line at the 6.64% overall average.]

Which raises a question the model can't answer but that I think the shape of the curve strongly implies: what if the money signal is mostly about legibility?

Think about it from the model's perspective (which is, I realize, a slightly unsettling invitation, like being asked to see things through the eyes of a surveillance camera that has read every FEC filing since 2017 but has never once been outside, but bear with me). A first-term member has no voting record. The model has no ideological track record to work with, no established pattern of party loyalty, no history of defections or conformity. The features that normally dominate (DW-NOMINATE scores, historical loyalty rates, defection streaks) are either absent or thin. In this vacuum, financial features carry more weight, less because money matters more for freshmen than because the model has less non-financial information to work with. As a member builds a record, ideology and party features absorb more of the prediction, and the financial signal's share shrinks. The money may still be there. The other features just got louder.

Or to put it differently: the tenure curve may be measuring how much the model needs financial features to predict a legislator's votes over time, rather than how much money actually influences them. For freshmen, the money is one of the few signals available. By the third term, the model has plenty of other information and the money adds less. By the ninth term, the member's voting record is so long and so ideologically coherent that the model barely needs to look at their donors at all.
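The arithmetic behind this reading is worth making concrete. If the financial features contribute a fixed absolute amount to each prediction while the non-financial features contribute more as a voting record accumulates, money's share falls even though nothing about the money has changed. A toy sketch, with every number invented and no attempt to fit the real curve:

```python
def money_share(money_contribution: float, other_contribution: float) -> float:
    """Money's share of the total attribution for one prediction."""
    return money_contribution / (money_contribution + other_contribution)


MONEY = 1.0  # hypothetical absolute contribution, held constant across terms

# Hypothetical non-financial contribution growing as the record lengthens.
other_by_term = {1: 11.0, 3: 17.0, 9: 24.0}

shares_by_term = {term: money_share(MONEY, other) for term, other in other_by_term.items()}
for term, share in shares_by_term.items():
    print(f"term {term}: {share:.2%}")
```

The share drops at every step purely because the denominator grew. A share-based decomposition cannot, on its own, distinguish "money matters less" from "everything else got louder."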

But there is another reading, and I don't want to dismiss it, because it is (I think) more interesting and also more unsettling: money may genuinely matter most when a member first arrives. Donor selection (the process by which campaign money flows toward candidates whose views align with donor interests before a single vote is cast) would produce exactly this pattern. The money selected for these candidates. The model detects that alignment most clearly before ideology and party features absorb it over successive terms. As the member's voting record grows, the alignment becomes baked into ideology scores, and the financial signal becomes redundant. The money kept mattering; it just became invisible to the decomposition. Which is, if you're inclined to worry about this sort of thing, a much more troubling finding than the hill I expected. A hill implies that members are gradually captured and then gradually freed. The actual shape implies that the alignment happened before the first vote was cast, and then simply disappeared into the model's other categories, like a river that doesn't stop flowing just because it's been reclassified as groundwater.⁵

The curve also has two different slopes depending on which side of the aisle you're standing on.

The Party Gap

Money attribution by term and party. Democrats show consistently higher financial signal at every tenure level.

[Chart: one line each for Democrats and Republicans, with a reference line at the 6.64% overall average.]

Both parties start high as freshmen and drop sharply after the first term. But Democrats show consistently higher financial signal at every tenure level: 8.70% vs. 7.75% for freshmen, 6.40% vs. 5.53% for third-termers, 7.22% vs. 6.03% for the longest-serving members. The gap is modest (0.88 percentage points overall, and between roughly 0.9 and 1.2 points at the tenure levels shown here) and persistent regardless of seniority, and I don't have a clean explanation for why. The donor ecosystems of the two parties are structurally different (Democratic candidates have shifted toward small-dollar fundraising and a more fragmented donor base; Republican candidates rely more heavily on corporate PACs and industry money), and the model may be picking up that structural difference rather than a behavioral one: a more fragmented donor ecosystem provides more individual financial features for the model to detect signal in, the way a photograph with more pixels contains more information even if it depicts the same scene. Or there may be a genuine behavioral difference. The gap is small enough that I'm cautious about building a structural explanation on top of it, and I'd encourage you to be cautious too.⁶

And then there are the freshmen at the extremes. The highest-yield first-termers in the 119th Congress (Hernández (D-PR) at 8.42%, Subramanyam (D-VA) at 7.73%) arrive with financial signal above the overall average from day one. The lowest-yield freshmen (Fetterman (D-PA) at 4.74%, Greene (R-GA) at 4.96%) show financial signal well below average. The range is narrower than you might expect: the gap between the most and least financially legible freshmen is less than four percentage points. Which is itself a finding, I think, because it suggests that the model sees money as a diffuse, structural feature of the political economy rather than something concentrated in a few captured individuals. Nobody is dramatically financially captured. Everybody is a little bit financially legible. The distribution is narrow and the variation is modest, which is consistent with a system where influence operates structurally rather than transactionally.

IV. Where Money Matters Most

The 6.64% average conceals some variation depending on what kind of vote you're looking at, but less than I expected, and the pattern of variation (or, more precisely, the pattern of its absence) tells its own story.

Amendment votes show the highest financial signal at 7.49%, followed by nominations at 6.43%, passage votes at 5.75%, and procedural votes at 5.28%. Amendments being highest makes a certain intuitive sense: they're the votes with the highest defection rate (6.78%), the most substantive policy stakes per individual vote, and the least party-line pressure. They're also the votes that attract the least media attention, the least public scrutiny, and the most room for individual members to maneuver without anyone particularly noticing. Which makes them, if you think about it, exactly the legislative terrain where a diffuse financial signal would have room to express itself: the small, specific, substantive votes that shape what a bill actually does, far from the theatrical floor votes where C-SPAN is rolling and party leadership is whipping furiously. The ones nobody's watching.⁷

The more interesting finding, though, is about what doesn't vary. You might assume that the closest votes (margins under 10%) would show the highest money attribution, on the theory that money matters most when the outcome is in doubt. The data says otherwise. Money attribution is essentially flat across all levels of contestedness: 5.89% for tight votes, 5.90% for moderate-margin votes, 5.86% for comfortable margins. There is no meaningful variation. Whatever the financial signal is measuring, it operates uniformly across the political landscape rather than concentrating on close calls. This, too, is consistent with the gravitational-field model: gravity is always there, pulling at the same rate, whether or not the other forces in play are evenly matched and whether or not its effect is visible in any individual interaction.

And then there's the committee effect, or rather, the absence of one. Members of the Finance Committee show 5.76% money attribution; members not on that committee show 5.95%. Health committee members: 5.80% vs. 5.91%. Energy: 5.78% vs. 5.93%. In every case, the gap is slightly negative (committee members show marginally less financial signal than non-members). These gaps are tiny (a fraction of a percentage point) and probably not meaningful in themselves, but they're consistently in the direction you would not predict, which I find interesting enough to mention and then not quite interesting enough to build a theory around. One possibility: committee members' votes on industry-relevant bills are so thoroughly predicted by their committee assignment and ideology that financial features become redundant. The model doesn't need to look at a Finance Committee member's bank donors to predict how they'll vote on banking regulation, because the committee seat already told it everything it needed to know. The money is there, but the committee assignment is doing the same explanatory work, so the financial features don't add anything the model can't already see.
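The gaps quoted above are easy to check by hand, but for completeness here is the subtraction, using only the figures from the preceding paragraph (committee members first, non-members second):

```python
# Money attribution in percent: (committee members, non-members),
# copied from the paragraph above.
committee_vs_rest = {
    "Finance": (5.76, 5.95),
    "Health": (5.80, 5.91),
    "Energy": (5.78, 5.93),
}

# Gap in percentage points; negative means members show less signal.
gaps = {
    name: round(members - others, 2)
    for name, (members, others) in committee_vs_rest.items()
}
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} points")
```

All three differences come out negative and under a fifth of a percentage point, which is the "tiny but consistently the wrong way" pattern described above.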

V. The Oldest Trick

Before I go any further into the data, I want to back up and acknowledge something that I think the data, on its own, cannot communicate, which is that none of this is new. The specific numbers are new. The model is new. The ability to decompose individual votes into attributive factors is (I think) genuinely novel. But the pattern (private money flowing toward political power, political power flowing toward favorable treatment of the money, and everybody involved maintaining a vocabulary designed to describe this as something other than what it obviously is) is very, very old. Old enough, in fact, that it is no surprise to find it baked into the data at a low but persistent level rather than spiking on individual votes. Structural features of political economies manifest as exactly this: a 6.64% baseline that never goes away.

The Roman Republic had a word for electoral bribery: ambitus. They also had a word for generosity toward voters: benignitas. Cicero himself drew the distinction. In practice, nobody could tell them apart, which was more or less the point. Roman elections ran on free food, entertainment, and hard cash; candidates walked the city with a nomenclator (essentially a human Rolodex) whispering voter names so they could greet everyone personally; and the Senate kept passing increasingly severe anti-bribery laws that somehow never worked. The Lex Baebia of 181 BC. The Lex Acilia Calpurnia of 67 BC. Cicero's own Lex Tullia of 63 BC. Each one raised the penalties. Each one was circumvented within a year. Eventually the anti-corruption laws themselves became weapons in the power struggles they were designed to prevent: Pompey shortened bribery trials to three hours and then used the expedited process to prosecute his political enemies. When his own father-in-law was accused under the same law, Plutarch relates that Pompey put on mourning clothes and summoned the jurors to his house. The charges were dropped.⁸

The American version of this pattern starts in 1896, when a Cleveland industrialist named Mark Hanna systematically assessed banks and corporations a percentage of their assets to fund William McKinley's presidential campaign against William Jennings Bryan. He didn't ask for donations. He assessed them. Like taxes, except the taxes went to electing a president who would be friendly to the people paying them. Hanna raised the equivalent of roughly $150 to $200 million in today's dollars. When asked about money and politics, he reportedly said there were two things that mattered: "The first is money, and I can't remember what the second one is."⁹ Before Hanna, corporate involvement in elections was ad hoc. After Hanna, it was systematic, professionalized, and scaled. Every subsequent era of campaign finance is essentially a variation on the infrastructure he built.

The mechanics have changed since 1896. Congress and the courts have opened and closed various loopholes (Buckley¹⁰, McCain-Feingold¹¹, Citizens United, SpeechNow). But the total outside spending in federal elections tells the structural story more clearly than any case law:

The Ratchet

Outside spending in federal elections, 1990–2020.

[Chart: outside spending over time, with markers for reforms and court decisions.]

Source: OpenSecrets

What the chart shows, and what I think the Romans would recognize instantly, is the ratchet. Each scandal produces a reform. Each reform creates a workaround. Each workaround becomes the new normal. The baseline only goes in one direction.

VI. What I Don't Know

I want to be upfront about what this data can and can't tell you, because I think there's a version of this post (a version I was tempted to write, if I'm being transparent about my own rhetorical impulses) that presents the 6.64% number as either a vindication of the system or an indictment of it, cherry-picks the most dramatic individual defections, and lets the reader walk away with their priors confirmed. That version would get more engagement. It would also be misleading.

Here is what the model actually tells you: financial features are statistically associated with vote predictions at a rate that a gradient-boosted tree finds meaningful, after controlling for everything else I could think of to control for. It says this association is strongest for freshmen and declines with tenure. It says the House shows slightly higher financial signal than the Senate (6.13% vs. 5.60%). It says amendments show the highest signal and procedural votes the lowest. It says that Democrats show higher financial signal than Republicans (6.34% vs. 5.46%), a gap that persists at every tenure level and that I do not have a clean explanation for.

There is also, I should note, a hole in the methodology that I haven't figured out how to close. The model measures how members vote. It does not measure whether members show up for the vote (a topic I write about in Who Shows Up). When I broke attendance down by vote type, a pattern emerged: House members skip contested votes (the close ones, the ones where the lobbying pressure is highest) at rates 4 to 5 percentage points higher than they skip easy votes. Senators show the opposite pattern; they show up more for contested votes than routine ones. What this means is that some House members with low financial signal in the model may not be clean so much as absent. They weren't in the room when the pressure was on, so the model never saw them tested. A legislator who votes 95% of the time and still shows low financial influence has demonstrated something. A legislator who votes 60% of the time and shows low financial influence might just be hiding.¹²

What the model does not tell you is why any of this is true. The causal mechanism remains, and I think will always remain, genuinely ambiguous. Does money change votes, or does money follow ideology? Is the tenure curve evidence of financial legibility fading as ideology features take over, or evidence of donor selection that becomes invisible as it bakes into voting records? When Nunn defects from his party on appropriations and the model says 11.6% of the prediction is financial, does that mean the donations caused the defection, or that Nunn was always inclined toward that position and the donations are a correlate of an inclination the model can't observe directly? The model shrugs. So do I.

But here is the thing that kept me coming back to this project, the thing that I think makes the 6.64% number worth knowing even in the absence of causal certainty: the data is all public. Every dollar, every vote, every ideology score. It's all public. And yet functionally, in terms of any normal person's ability to sit down and connect the money to the votes, it might as well be locked in a vault. The contributions live in one database, the votes in another, the industry classifications in a third,¹³ the lobbying filings in a fourth. The plumbing was the hard part.¹⁴
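To give a sense of what that plumbing involves, here is a toy sketch of the core join: contributions keyed by a campaign-finance candidate ID, votes keyed by a different legislator ID, and a crosswalk between the two. All IDs, field names, and amounts are hypothetical. The real sources each have their own schemas, and the project's actual pipeline is not shown in this piece.

```python
from collections import defaultdict

# Hypothetical crosswalk from a campaign-finance ID to a roll-call ID.
crosswalk = {"CAND001": "LEG_A", "CAND002": "LEG_B"}

# Hypothetical rows; the industry codes follow the letter-plus-four-digits format.
contributions = [
    {"candidate_id": "CAND001", "industry": "F2100", "amount": 500.0},
    {"candidate_id": "CAND001", "industry": "F2100", "amount": 250.0},
    {"candidate_id": "CAND002", "industry": "A1500", "amount": 100.0},
    {"candidate_id": "CAND999", "industry": "D1000", "amount": 75.0},  # no match
]
votes = [
    {"legislator_id": "LEG_A", "rollcall": 1, "vote": "yea"},
    {"legislator_id": "LEG_B", "rollcall": 1, "vote": "nay"},
]


def totals_by_legislator(contributions, crosswalk):
    """Sum contributions per legislator and industry, skipping unmatched IDs."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in contributions:
        legislator = crosswalk.get(row["candidate_id"])
        if legislator is None:
            continue  # a real pipeline would log these rather than drop them
        totals[legislator][row["industry"]] += row["amount"]
    return totals


def attach_money(votes, totals):
    """Return a copy of each vote row with that member's industry totals."""
    return [
        {**vote, "industry_totals": dict(totals.get(vote["legislator_id"], {}))}
        for vote in votes
    ]


joined = attach_money(votes, totals_by_legislator(contributions, crosswalk))
for row in joined:
    print(row)
```

The unmatched row is the interesting part: at real scale, most of the work is deciding what to do with records that fall through the crosswalk.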

Whether 6.64% is too much is a question I don't think a model can answer. It's a question about what kind of system you want to live in, and what level of financial influence over democratic decision-making you're willing to accept, and whether a structural pattern that Cicero would recognize and that Mark Hanna professionalized and that Citizens United turbocharged is something you consider a problem or a feature. The model just gives you the number. The interpretation is yours.

The project is called the Congressional Yield Index, which is a pun on "yield" as in bond yield (the return you get on an investment) and "yield" as in what a legislator does when they yield to pressure. I'm unreasonably pleased with this name. For the technically curious, here's the model card as of this writing: ↩
The technical term for this ambiguity in the political science literature is the "selection vs. influence" problem, and it has been the subject of approximately ten thousand published papers, none of which have resolved it. The best summary I've seen is from a researcher who said the field's consensus position is "probably both, in proportions that vary by member and by vote, in ways we cannot measure." Which is, if you think about it, a very expensive way of saying "we don't know." ↩
I realize this metaphor (money as gravity, operating diffusely across many features rather than through a single dramatic mechanism) might sound like I'm trying to make a small number sound bigger than it is. I'm not, or at least I don't think I am, though I acknowledge that the desire to make your own finding sound significant is one of the most reliable biases in all of data science and I am not immune to it. What I am trying to communicate is that the shape of the financial signal (distributed across 107 features, none individually dominant, collectively persistent) tells you something about the nature of what's being measured that the percentage alone does not. A signal concentrated in one or two features would suggest a simple transactional mechanism: donors give money, legislator changes vote. A signal distributed across a hundred features suggests something more like a structural alignment between a member's financial ecosystem and their legislative behavior that no single variable can capture and no single reform can address. ↩
The Senate numbers are noisier because you're working with smaller sample sizes (62 first-term senators vs. 380 first-term House members), so I'd weight the House curve more heavily. The shape, though, is the same in both chambers, which I find suggestive if not conclusive. ↩
There is a running theme in this piece where the model identifies a pattern and then declines to explain it. This is, I've come to believe, the fundamental experience of working with machine learning on social science questions: you get better and better at describing what is happening, and no better at all at understanding why. ↩
I want to note, with the kind of epistemic caution that I suspect reads as hedging but is in fact the actual state of my uncertainty, that the party gap is modest enough that I would not be surprised if it shifted or reversed with different model specifications, different test periods, or different feature engineering choices. 0.88 percentage points is real in the statistical sense (the model sees it consistently) but fragile in the interpretive sense (I would not bet very much money on its direction being stable across contexts, which is itself a somewhat ironic position to hold in a piece about money and stability). ↩
The highest individual money attribution in the entire dataset is 13.61%, belonging to Representative Jonathan L. Jackson of Illinois on the SPEED Act. Jackson is the son of the Reverend Jesse Jackson, a former financial analyst at Drexel Burnham Lambert (where he worked for Michael Milken, which is the kind of biographical detail that a campaign finance model would find delicious if it were capable of finding anything delicious), and a second-term Democrat representing Chicago's South Side. 13.61% is roughly double the average, which means the model thinks an unusually large share of the prediction of his vote comes from financial features. I mention this because it illustrates the range, and because the range itself is the point: the maximum attribution in the dataset is about double the mean. The tails are short. The distribution is compact. Whatever the financial signal is measuring, it is operating with remarkable uniformity rather than concentrating in a few dramatic cases. ↩
The vocabulary changes; the structure persists. The Romans had benignitas. We have "constituent service." The conceptual distinction between influence and corruption is maintained, in every era, by the people who benefit most from its fuzziness. ↩
There's some scholarly debate about whether Hanna actually said this verbatim or whether it's been polished by repetition into a form punchier than the original. This is true of most good political quotes, which exist in a kind of superposition between "historically documented" and "too good to fact-check" until someone with a PhD forces the waveform to collapse. ↩
Buckley v. Valeo (1976) is the cornerstone of the whole structure. The Court ruled that spending money on political campaigns is constitutionally protected speech, but drew a line: you can limit what someone gives directly to a candidate (contributions), because those create a risk of quid pro quo corruption, but you cannot limit what someone spends independently about a candidate (expenditures), because that's speech and the First Amendment has feelings about speech. This distinction (contributions: regulable; expenditures: protected) has governed every subsequent case and is the reason outside spending can grow without limit while direct donations remain capped at figures that a serious donor would consider a rounding error. I'm simplifying, and I know I'm simplifying. Buckley is a genuinely complicated decision with defensible reasoning on multiple sides, and the fact that it has produced a system where you can't give a congressman more than $3,300 but you _can_ spend $40 million on ads supporting him as long as you don't "coordinate" (a word doing more foundational work than any word in American law should have to) is something the Court probably did not fully anticipate. Or maybe they did. The opinion is 294 pages long and I have read a portion of it, which I mention primarily to explain why I spent a week in a mood. ↩
If you're looking at the zoomed chart and thinking "McCain-Feingold doesn't seem to have slowed anything down," you're reading it correctly. The law banned soft money donations to national parties, which was the correct thing to do in the way that locking your front door is the correct thing to do when the burglars are already inside. The money, having been politely asked to leave the parties, relocated to 527 organizations (named, with the tax code's characteristic flair for poetry, after the section that exempted them), which could accept unlimited contributions and operated under disclosure requirements best described as optional. Independent spending grew 11× in the eight years after the law took effect. It didn't help that the enforcement body, the Federal Election Commission, responded to the new law by voting in 2004 not to write rules applying it to the very organizations the money was flowing into. McCain himself took to the Senate floor to call the FEC's stonewalling "inexcusable." The Commission eventually fined three of the largest 527s for their activity during the 2004 election, but the fines arrived in 2006 and 2007, after the elections they influenced were over and the outcomes were decided, which is a species of enforcement in roughly the same way that sending a strongly worded letter to a building that has already burned down is a species of firefighting. Campaign finance, it turns out, obeys something like fluid dynamics: block one channel and the pressure finds another. The channel it found was darker, less accountable, and (because the agency tasked with policing it had publicly declined to do so) essentially unregulated. ↩
This is an area where the model's transparency about its own blind spots matters. I'm working on an attendance-adjusted metric that accounts for strategic absenteeism, but it requires distinguishing between "didn't vote because they were traveling" and "didn't vote because they didn't want to be on the record." The former is noise. The latter is signal. Telling them apart from roll-call data alone is, so far, not something I've cracked. ↩
The taxonomy used to connect donors to industries is maintained by OpenSecrets and uses a format of one sector letter plus four digits: A1500 is Dairy, F2100 is Hedge Funds, D1000 is Defense Aerospace. ↩
The full data sources, for anyone who wants to build their own plumbing: Federal Election Commission (fec.gov/data/browse-data/): individual contributions (Schedule A), PAC-to-candidate contributions, committee master file, independent expenditures (Schedule E). Congress.gov API (api.congress.gov): bill metadata, status, subjects, summaries, full text, cosponsors. VoteView (voteview.com): roll-call votes (member-level yea/nay/not voting per rollcall), DW-NOMINATE ideology scores. congress-legislators GitHub (unitedstates/congress-legislators): current and historical legislator YAML (terms, parties, states, districts, dates). Senate SOPR / Lobbying Disclosure Act (lda.senate.gov/api/v1/): LD-1 registrations (lobbying firm, client, lobbyists), LD-2 quarterly reports (client, amount, issue areas, bill references), lobbyist covered positions (former government roles). Interest group scorecards: NRA, LCV, AFL-CIO, Chamber of Commerce, Heritage Action, ACLU, NFIB, and others. ProPublica Nonprofit Explorer (projects.propublica.org/nonprofits/): 501(c)(4) dark money organization identifiers. That's 7 source organizations providing roughly 15 distinct data feeds. All in, about 91GB of raw data. ↩
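The industry-code format described in the notes above (one sector letter followed by four digits) is simple enough to validate mechanically. A small sketch follows. The three labels are the ones quoted in that note; the helper name and the choice to reject lowercase letters are illustrative assumptions, not anything taken from OpenSecrets.

```python
import re

# One uppercase sector letter followed by exactly four digits.
CODE_PATTERN = re.compile(r"[A-Z][0-9]{4}")

# Only the examples quoted in the note; not a complete taxonomy.
KNOWN_LABELS = {
    "A1500": "Dairy",
    "F2100": "Hedge Funds",
    "D1000": "Defense Aerospace",
}


def split_code(code: str) -> tuple[str, str]:
    """Split a code into (sector letter, four-digit suffix), or raise."""
    if not CODE_PATTERN.fullmatch(code):
        raise ValueError(f"not a letter-plus-four-digits code: {code!r}")
    return code[0], code[1:]


for code, label in KNOWN_LABELS.items():
    sector, suffix = split_code(code)
    print(f"{code}: sector {sector}, suffix {suffix}, label {label}")
```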