And before I continue, standard Nate Silver disclaimer: Nate Silver says a lot of stupid shit, but he's still very good at statistics and data analysis, and at explaining those things in easy-to-understand terms. He is a valid source to cite in his area of expertise! And pretty much nowhere else.
Anyway, Silver's got an article up today titled "Will The Polls Overestimate Democrats Again?" The title's a little clickbaity (tl;dr: maybe, but there's no good data-driven reason to assume so), but the article is good and thorough.
Here's a bit I found interesting:
As I mentioned, the Deluxe version of our forecast gives Democrats a 71 percent and 29 percent chance of keeping the Senate and House, respectively. But the Deluxe forecast isn’t just based on polls: It incorporates the fundamentals I mentioned earlier, along with expert ratings about these races. Furthermore, it accounts for the historical tendency of the president’s party to perform poorly at the midterms, President Biden’s mediocre (although improving) approval rating and the fact that Democrats may not perform as well in polls of likely voters as among registered voters. As the election approaches, it tends to put more weight on the polls and less on these other factors, but it never zeros them out completely. (In this respect, it differs from our presidential forecast.)
By contrast, the Lite version of our forecast, which is more or less a “polls-only” view of the race, gives Democrats an 81 percent chance of keeping the Senate and a 41 percent chance of keeping the House. It also suggests that they’ll win somewhat more seats: There are 52.4 Democratic Senate seats in an average Lite simulation as compared with 50.8 in a Deluxe simulation, or 212 Democratic House seats in an average Lite simulation versus 209 in a Deluxe simulation. Notably, this corresponds to current polls overstating Democrats’ position by the equivalent of 1.5 or 2 percentage points. Put another way, we should think of a race in which the polling average shows Democrats 2 points ahead as being tied.
(links not included in copy-paste)
Those gaps are pretty massive -- the polls-only forecast shows Democrats 10 percentage points more likely to keep the Senate (81 vs. 71 percent), and 12 points more likely to keep the House (41 vs. 29 percent), than the "Deluxe" forecast, which considers other factors and the correlations they've had with historical outcomes.
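If you want to check that arithmetic yourself, here's a quick sketch in Python. The numbers come straight from the quoted passage; the variable names are just mine for illustration and have nothing to do with how the actual forecast is built.

```python
# Chances (in percent) of Democrats keeping each chamber, as quoted above.
deluxe_chance = {"Senate": 71, "House": 29}
lite_chance = {"Senate": 81, "House": 41}

# Average Democratic seats per simulation, as quoted above.
deluxe_seats = {"Senate": 50.8, "House": 209}
lite_seats = {"Senate": 52.4, "House": 212}

# Gap = polls-only (Lite) minus Deluxe, per chamber.
chance_gap = {c: lite_chance[c] - deluxe_chance[c] for c in deluxe_chance}
# Round to one decimal place to avoid floating-point noise in 52.4 - 50.8.
seat_gap = {c: round(lite_seats[c] - deluxe_seats[c], 1) for c in deluxe_seats}

for chamber in chance_gap:
    print(
        f"{chamber}: Lite is {chance_gap[chamber]} points higher, "
        f"{seat_gap[chamber]} more seats on average"
    )
```

Note these are gaps in percentage points of win probability, not in vote share; the 1.5-to-2-point polling error Silver mentions is a separate quantity.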
The "Deluxe" forecast is a good model. Remember when people were pointing and laughing at Silver in 2016 because his model gave Trump a 30% chance of winning, and obviously if you looked at the polls there was no way it was that high? But its reliance on historical data means that, by definition, it's going to be accurate most of the time and ill-prepared for major unexpected events.
Polls aren't perfect, but they factor in people's reactions to Dobbs. Historical data doesn't.
It's not that Biden's approval rating and the economy and this being a midterm and the election still being two months away don't matter; they do. But a model that assumes they have the same weights relative to reproductive rights that they have historically is missing key context that's not easy to model statistically. I think this time, the polls might be the more reliable indicator of how the election is going to go, even though it's still two months out, and I think they may still be underrating Democrats' chances.
The evidence for that view isn't really statistically significant yet, and as always I could be wrong. But merely asking the question of which statistical model is closer to what the final result will be highlights a lot of what polling analysis is about -- it's not just a question of "What does the data say?" but "Which data?" and "Why might it be biased or just plain wrong?"