About a month ago I started using PredictionBook. The idea is that you record how confident you are that certain events will happen, and then later compare how confident you were against whether or not the events actually did or didn’t happen. If the events you had 80% confidence in happen 80% of the time, you’re perfectly calibrated. If they instead happen 90% of the time, you’re underconfident. And if they only happen 70% of the time, you’re overconfident. Neat, huh? PredictionBook lets you record your predictions, later record the outcomes, and aggregates the results into a sweet graph. Check it out:

You may notice that so far all of the events I had 50% confidence in have turned out not to happen. Right now this is a sample size issue and I would expect it to even out over time, but it raised some questions that confused me.
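The bookkeeping behind a graph like that is simple enough to sketch. Here's a minimal version in Python, with a hypothetical record of (confidence, outcome) pairs standing in for real PredictionBook data:

```python
from collections import defaultdict

def calibration(predictions):
    """Group (confidence, happened) pairs by confidence level and
    report the observed frequency of 'happened' at each level."""
    buckets = defaultdict(list)
    for confidence, happened in predictions:
        buckets[confidence].append(happened)
    return {c: sum(outcomes) / len(outcomes)
            for c, outcomes in sorted(buckets.items())}

# Hypothetical record: (stated confidence, did the event happen?)
record = [(0.5, False), (0.5, False), (0.8, True),
          (0.8, True), (0.8, True), (0.8, False), (0.8, True)]
print(calibration(record))  # {0.5: 0.0, 0.8: 0.8}
```

Perfect calibration means each bucket's observed frequency matches its confidence level, as the 0.8 bucket does here; the 0.5 bucket shows the all-misses pattern described above.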

Let’s suppose that over the long run, the things that I assign 50% confidence to consistently happen only 25% of the time. It seems like I *must* be miscalibrated. I would be tempted to say that I was overconfident at the 50% level. But that’s confusing, because I could just as easily have phrased each 50% prediction the other way (“50% chance event A will happen” vs. “50% chance event A will **not** happen”), in which case I would have reached the conclusion that I was underconfident, not overconfident!

And then when I think about it another way, talking about being calibrated at the 50% level seems meaningless. Suppose that each time I record “I think there’s a 50% chance that Event A will occur”, I also record the equally true “I think there’s a 50% chance that Event A will **not** occur”. If I always recorded both, my calibration at the 50% level would be perfect: whether or not event A happens, I have one correct prediction and one incorrect prediction, so my 50%-confidence events always come out exactly 50% accurate. Any “underconfidence” or “overconfidence” at the 50% level is just an artifact of me not recording everything. It seems like it’s not even possible to be miscalibrated at the 50% level.

…but then again, if I made a billion 50% confidence predictions and none of them came true it would feel absurd to conclude that I’m not somehow miscalibrated.
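The absurdity can be quantified: if each prediction really were a fair 50-50, the chance of a run coming out all-false is 0.5 raised to the number of predictions, which vanishes fast even for modest runs.

```python
# P(all n predictions fail) if each truly has an independent 50% chance
for n in (10, 30, 100):
    print(n, 0.5 ** n)
```

Ten misses in a row is already under 0.1%; a billion is beyond astronomically unlikely, so *something* about the predictions would have to be systematically off.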

Can anyone un-confuse me?


The over-/under-confidence concept is generalized from examples where it makes sense – it comes from observations, not from theory. But if you ask me, the correct generalization is as follows:

An “update” is a pair of probability distributions (a prior and a posterior), and properly speaking, only an update can be under- or over-confident – or, more properly, an update-generating mechanism (e.g. a person) on a class of questions.

For example, if you’re doing PredictionBook, the yes/no questions you’re predicting might be the sort where 50-50 is a good prior, and that’s a good default. But someone might mostly pick questions of the form “will we have tech X by year Y?”, where the reasonable default prior is not 50%; if so, that needs to be taken into account. The easiest way is to do what you said: throw in the negation of each question, so that a random question’s answer is guaranteed to be Yes half the time.

And then under- or over-confidence just means the predictions would be systematically better if a correction were applied that strengthened or weakened the update.

In a case where the prior was 50% and you said 50%, the update was 0; when there’s no information, strengthening or weakening still leaves it at 0, so just as you said, being under- or overconfident is meaningless.

But there are other ways for the under-/over-confidence concept to fail to capture what’s wrong with your predictions. One is as you said: if you keep saying 50% but the answer is always No, then while your update is 0 (i.e. you act as though you had no information beyond your prior), there’s a pattern you’re failing to learn – you’re remaining ignorant, which is orthogonal to calibration. Another would be e.g. if you kept saying 60% but the answer came out Yes only 30% of the time, with a 50% prior; then you’re not merely overconfident – your information is systematically wrong.
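One way to make “strengthening/weakening the update” concrete is to scale the prediction’s distance from the prior in log-odds space and check whether a proper scoring rule (here the Brier score) improves. A sketch, with hypothetical data and a 50% prior assumed:

```python
import math

def adjust(p, k, prior=0.5):
    """Scale the update (in log-odds) away from the prior by factor k.
    k < 1 weakens the update (corrects overconfidence),
    k > 1 strengthens it (corrects underconfidence)."""
    logit = lambda q: math.log(q / (1 - q))
    z = logit(prior) + k * (logit(p) - logit(prior))
    return 1 / (1 + math.exp(-z))

def brier(preds):
    """Mean squared error between stated probabilities and outcomes."""
    return sum((p - o) ** 2 for p, o in preds) / len(preds)

# Hypothetical overconfident forecaster: says 90%, right only 70% of the time.
preds = [(0.9, 1)] * 7 + [(0.9, 0)] * 3
raw = brier(preds)
weakened = brier([(adjust(p, 0.5), o) for p, o in preds])
print(raw > weakened)  # True: weakening the update improves the score
```

With k = 0.5 the stated 90% becomes 75%, much closer to the true 70% rate, so the corrected forecaster scores better – which is exactly the sense in which the original one was overconfident. Note that a 50% prediction against a 50% prior is a fixed point of `adjust` for every k, matching the point above that no correction can help there.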

I hope this decreases rather than increases confusion! 🙂


Thanks, that’s helpful!
