About a month ago I started using PredictionBook. The idea is that you record how confident you are that certain events will happen, and then later compare how confident you were with whether the events actually happened. If the events you had 80% confidence in happen 80% of the time, you’re perfectly calibrated. If they instead happen 90% of the time, you’re underconfident. And if they only happen 70% of the time, you’re overconfident. Neat, huh? PredictionBook lets you record your predictions, later record the outcomes, and aggregates the results into a sweet graph. Check it out:
You may notice that so far all of the events I had 50% confidence in have turned out not to happen. Right now this is a sample size issue and I would expect it to even out over time, but it raised some questions that confused me.
Let’s suppose that over the long run, the things that I assign 50% confidence to consistently happen only 25% of the time. That seems like I must be miscalibrated. I would be tempted to say that I was overconfident at the 50% level. But then that’s confusing, because I could just as easily have phrased each 50% prediction the other way (“50% chance event A will happen” vs. “50% chance event A will not happen”), in which case those predictions would have come true 75% of the time and I would have reached the conclusion that I was underconfident, not overconfident!
And then when I think about it another way, talking about being calibrated at the 50% level seems meaningless. Let’s suppose that for each time I record “I think there’s a 50% chance that Event A will occur”, I also record the equally true “I think there’s a 50% chance that Event A will not occur”. If I always recorded both of these, then my 50% confidence calibration would be perfect. (Whether or not event A happens, I will have 1 correct prediction and 1 incorrect prediction, so I’ll always have exactly 50% accuracy for my 50% confidence events.) So talking about being underconfident or overconfident at the 50% level seems meaningless: my “underconfidence”/“overconfidence” is just an artifact of me not recording everything. It seems like it’s not even possible to be miscalibrated at the 50% level.
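A quick simulation makes the artifact vivid. This is just a sketch of the argument above, with hypothetical randomly generated outcomes (the 25% base rate is borrowed from the earlier example, not from my actual PredictionBook data): if every 50% prediction is recorded in both directions, accuracy at the 50% level is pinned to exactly 0.5 no matter what happens.

```python
import random

random.seed(0)

# Hypothetical outcomes: each event happens (True) about 25% of the time.
outcomes = [random.random() < 0.25 for _ in range(10_000)]

# Record every 50% prediction in BOTH directions.
predictions = []
for happened in outcomes:
    predictions.append(happened)      # "50% chance A happens" — correct iff A happened
    predictions.append(not happened)  # "50% chance A doesn't happen" — correct iff it didn't

# Exactly one of each pair is correct, so accuracy is 0.5 by construction.
accuracy = sum(predictions) / len(predictions)
print(accuracy)  # → 0.5
```

The point is that the 0.5 here carries no information about me: it would come out the same for any set of outcomes whatsoever.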
…but then again, if I made a billion 50% confidence predictions and none of them came true it would feel absurd to conclude that I’m not somehow miscalibrated.
Can anyone un-confuse me?