Tuesday, March 03, 2009

Justin Domke's rankings on predictive power

of Hubdub, Intrade, and Nate Silver on the 2009 Oscars (via Chris Masse):
There has been some discussion lately about how to evaluate the performance of different prediction markets (like Intrade) and predictors (like Nate Silver) at guessing the winners of elections or Oscars. Who is making the best predictions? If everyone simply made a guess for the winner of each state or award, we could evaluate performance easily: whoever guesses the most outcomes correctly is making the best predictions. But what do we do if the predictors provide us with full probabilities for the different outcomes? Intuitively, someone who gives a 99% probability to an event that doesn’t occur is much “more wrong” than someone who gives it only a 51% probability.
...
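To make that intuition concrete, here is a small, purely illustrative Python comparison (not from Domke's post) of how two common scoring rules penalize those two forecasts when the event does not occur; the 0.99 and 0.51 figures are just the hypothetical probabilities from the paragraph above.

    import math

    # Hypothetical probabilities assigned to an event that did NOT occur.
    for p_event in (0.99, 0.51):
        p_actual = 1.0 - p_event                 # probability given to what actually happened
        squared_error = (1.0 - p_actual) ** 2    # quadratic (Brier-style) penalty
        log_loss = -math.log(p_actual)           # negative log-likelihood penalty
        print(f"forecast {p_event:.2f}: squared error {squared_error:.2f}, log loss {log_loss:.2f}")

Both rules charge the 99% forecast far more heavily (squared error about 0.98 vs. 0.26, log loss about 4.6 vs. 0.7).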
Let us think about this situation from the perspective of “bent coin predictors”. Let’s say we have a pool of 100 bent coins, each of which has some unknown probability $p_c(\text{heads})$ of ending up heads. We have a number of people who reckon they can estimate that probability by looking at the coin. Denote the prediction of guesser $i$ for coin $c$ by $g_{i,c}(\text{heads})$. After predictions have been made, we flip all the coins. Now, how do we find the best guesser?
...
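As a concrete sketch of that setup (hypothetical code, with a made-up noise model for the guessers), one could simulate the coins, the guesses, and the flips like this:

    import numpy as np

    rng = np.random.default_rng(0)
    n_coins, n_guessers = 100, 3

    # True (unknown) heads probabilities p_c(heads), one per bent coin.
    p_heads = rng.uniform(0.0, 1.0, size=n_coins)

    # Guesser i reports g_{i,c}(heads): here the truth plus noise, clipped to [0, 1].
    guesses = np.clip(p_heads + rng.normal(0.0, 0.1, size=(n_guessers, n_coins)), 0.0, 1.0)

    # Flip every coin once: 1 = heads, 0 = tails.
    flips = rng.binomial(1, p_heads)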
One reasonable way to measure the quality of the guesses would be the sum-of-squares difference between the guessed and the true probabilities.
...
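The excerpt elides the formula, but a natural reading of "sum-of-squares difference" is the squared gap between each guess and the true probability, summed over both outcomes and over all coins. Continuing the sketch above (and pretending, for the moment, that we know the true p_heads):

    # Sum-of-squares difference between the guesses and the TRUE probabilities.
    # For a two-outcome coin the tails gap is the negative of the heads gap,
    # so each coin contributes 2 * (g(heads) - p(heads))**2.
    sq_diff = ((guesses - p_heads) ** 2 + ((1.0 - guesses) - (1.0 - p_heads)) ** 2).sum(axis=1)
    print(sq_diff)   # one number per guesser; lower is better

The catch, of course, is that in the real problem we never see p_heads; that is exactly the gap the Monte Carlo argument below is meant to fill.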
We only have a single result from flipping each coin. The central idea here is that we can use what is known as a Monte-Carlo approximation.
...
Now, suppose that we don’t know $p$, but we can simulate $p$. That is, by running some sort of experiment, we can get a random value $x_n$ whose probability is $p(x_n)$.
...
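The elided equations presumably state the standard Monte Carlo approximation; in the usual notation, an expectation under $p$ is replaced by an average over samples drawn from $p$:

$\mathbb{E}_{p}[f(x)] = \sum_x p(x) f(x) \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n), \qquad x_n \sim p(x).$

With a single flip per coin, $N = 1$, so each per-coin expectation is approximated by the one observed outcome.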
As an example, suppose we want to know the average amount that a slot machine pays out. We could approximate this by playing the machine 1000 times, and calculating the average observed payout.
...
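A minimal sketch of that slot-machine estimate, with a made-up payout distribution standing in for the real machine:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical payout table: mostly nothing, occasionally small or large wins.
    payouts = np.array([0.0, 1.0, 5.0, 100.0])
    chances = np.array([0.90, 0.07, 0.025, 0.005])

    # Monte Carlo estimate of the expected payout from 1000 plays.
    plays = rng.choice(payouts, size=1000, p=chances)
    print(plays.mean())   # should land near the true mean, payouts @ chances = 0.695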

Now, let’s apply the above theory to the Oscar predictions.

Taking a zero for the empty entries and normalizing each prediction, we obtain the scores:

Quadratic loss:

538:     -0.6235
intrade: -0.7925

Remember, lower is better, so this is a clear win for Intrade.
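The excerpt does not show how those numbers were computed, but one standard way to get a quadratic-loss score from a single observed winner per category is the Monte Carlo trick above: expand $\sum_x (g(x) - p(x))^2$, drop the $\sum_x p(x)^2$ term (which doesn't depend on the guesser), and approximate the cross term with the one observed outcome, leaving $\sum_x g(x)^2 - 2\,g(x_{\text{winner}})$. That expression runs from $-1$ (all probability on the winner) upward, which would explain the negative scores. A sketch under that assumption, with made-up probabilities in place of the real 538/Intrade numbers:

    import numpy as np

    def quadratic_score(probs, winner):
        """Per-category quadratic loss, up to a guesser-independent constant:
        sum_x g(x)^2 - 2*g(winner).  Lower is better; -1 means full confidence
        in the eventual winner."""
        g = np.asarray(probs, dtype=float)
        g = g / g.sum()                    # normalize, as described in the post
        return float((g ** 2).sum() - 2.0 * g[winner])

    # Made-up four-nominee category in which nominee 0 wins.
    print(quadratic_score([0.80, 0.10, 0.05, 0.05], winner=0))   # -0.945: confident and right
    print(quadratic_score([0.40, 0.30, 0.20, 0.10], winner=0))   # -0.50: hedged, so worse

Averaging such a per-category score over all the awards would give a single number per predictor on the same scale as the ones above, though whether this is exactly how the -0.6235 and -0.7925 figures were produced is an assumption.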

...
Well, to compute the quadratic loss, I would actually need to know how Hubdub apportioned probability among the different losing entries in each category. I can calculate the conditional likelihood loss, though, and at 0.2237, Hubdub beats both Intrade and 538.
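The conditional likelihood loss is presumably the average negative log-probability assigned to the eventual winner of each category, which only needs the probability a predictor gave to the winner, not how the remainder was split among the losers. A sketch under that assumption (function name and numbers are made up):

    import math

    def conditional_likelihood_loss(winner_probs):
        """Average negative log-probability assigned to the actual winners.
        Lower is better; 0 would mean certainty on every winner."""
        return sum(-math.log(p) for p in winner_probs) / len(winner_probs)

    # Made-up probabilities that some predictor gave to the eventual winners.
    print(conditional_likelihood_loss([0.90, 0.75, 0.85, 0.70]))   # about 0.23

On that reading, a loss of 0.2237 corresponds to a geometric-mean probability of roughly exp(-0.2237) ≈ 0.80 on the winners.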
