Does More Data Make Vegas More Accurate?

In a previous post we looked at the accuracy of spreads for college basketball games. One simple critique is that spreads may be less accurate than first thought only because Vegas isn’t working with much data for a large part of the season. Preseason and postseason polls are often quite different, and teams underperform and overperform every season.

So does having more data improve the accuracy of spreads? This is relatively easy to determine. We just need to look across seasons and months to determine if the spreads in February and March are more accurate than those in November and December.

Season Nov Dec Jan Feb Mar Season Median
12-13 8.0 6.5 7.0 7.0 7.0  7.0
13-14 7.0 7.0 7.0 6.5 7.0  7.0
14-15 7.0 7.5 7.0 7.5 6.0  7.0
15-16 7.0 7.0 7.0 7.0 7.0  7.0
16-17 7.25 7.0 7.0 7.0 6.5  7.0
17-18 8.0 7.5  7.0  –
5-yr Median 7.5 7.0 7.0 7.0 7.0 7.0
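
A table like the one above could be built by grouping games by season and month and taking the median of each group. The following is a minimal sketch, not the method actually used for this post: the function names, the rule for assigning a date to a season, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median

MONTH_NAMES = {11: "Nov", 12: "Dec", 1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr"}


def season_label(game_date):
    """Map a game date to a season label such as '12-13'.

    Assumption: games from July onward belong to the season that ends
    in the following calendar year.
    """
    start_year = game_date.year if game_date.month >= 7 else game_date.year - 1
    return f"{start_year % 100:02d}-{(start_year + 1) % 100:02d}"


def median_by_season_month(games):
    """Return {(season, month): median miss} for (date, miss) pairs."""
    groups = defaultdict(list)
    for game_date, miss in games:
        month = MONTH_NAMES.get(game_date.month, str(game_date.month))
        groups[(season_label(game_date), month)].append(miss)
    return {key: median(values) for key, values in groups.items()}


# Made-up rows: (game date, points by which the result missed the spread).
# These are NOT taken from the real data set.
sample_games = [
    (date(2012, 11, 9), 10.0),
    (date(2012, 11, 10), 6.0),
    (date(2012, 11, 12), 8.0),
    (date(2013, 2, 2), 7.5),
    (date(2013, 2, 3), 6.5),
]

monthly = median_by_season_month(sample_games)
for (season, month), value in sorted(monthly.items()):
    print(season, month, value)
```

The median is used rather than the mean so that a handful of blowouts in one month does not dominate the comparison.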

So is there a pattern? It does seem that November has the least accurate spreads, but only by a small margin of half a point. And while in some seasons March is the most accurate, in others it falls in line with the season median.

So does more data make Vegas more accurate? A little.

What’s More Likely? Part 1

This is the first in a multi-part series of posts where we look at whether a characteristic of a game has any relationship to which team will cover. In this post, we use the data set of college basketball games to explore whether the favorite or the underdog is more likely to cover the spread.

First, some high-level statistics. In the 22,215 games, the favorite covered in 49% of the games, the underdog covered in 48.9% of the games and 2.1% of games ended in a push (i.e., the outcome of the game matched the spread and neither team covered). So, all else equal, neither the favorite nor the underdog is more likely to cover the spread.
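
To make the three outcomes concrete, here is a minimal sketch of how a single game could be classified against the spread. The function name and the sample games are hypothetical and only illustrate the definitions above.

```python
from collections import Counter


def ats_result(favorite_spread, favorite_margin):
    """Classify a game against the spread.

    favorite_spread: points the favorite is favored by, as a positive
        number (a -4.5 line is passed as 4.5).
    favorite_margin: favorite's score minus underdog's score
        (negative when the favorite loses outright).
    """
    if favorite_margin > favorite_spread:
        return "favorite"
    if favorite_margin < favorite_spread:
        return "underdog"
    return "push"


# Made-up games: (favorite_spread, favorite_margin). Not real data.
sample_games = [(3, 6), (4.5, -8), (7, 7), (10, 2)]

tally = Counter(ats_result(spread, margin) for spread, margin in sample_games)
for outcome, count in tally.items():
    print(f"{outcome}: {count / len(sample_games):.1%}")
```

Note that a favorite can win the game and still fail to cover, as in the last sample game.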

Is the Home Team More Likely to Cover the Spread?

No. In the 16,039 games where the home team was favored, it covered the spread in 8,037 games (50.1%). Conversely, the away underdog covered 8,002 spreads (49.9%). In the 5,713 games where the home team was an underdog, it covered the spread in 2,821 games (49.4%) and the away favorite covered in 2,892 games (50.6%). The remaining 463 games ended in a push.

So the home team is more likely to be favored but no more likely to cover than an away underdog. And the home underdog is no more likely to cover than the away favorite. So no pattern here…

Does the Size of the Spread Matter?

Another reasonable consideration is the size of the spread. Let’s see if a big favorite is more or less likely to cover the spread.

Putting the size of the spreads into 10 bins, we see no pattern.

  • A 1 point favorite has a 51% chance to cover the spread.
  • A 16 point favorite has a 50.4% chance to cover the spread.

So no pattern here either…

Spread Range Favorite Covers Underdog Covers Favorite Covers %
-1 to -1.5 813 778 51.0%
-2 to -2.5 1,045 1,050 49.9%
-3 to -3.5 976 994 49.6%
-4 to -5 1,404 1,401 50.0%
-5.5 to -6 905 858 51.3%
-6.5 to -7.5 1,202 1,214 49.8%
-8 to -9.5 1,249 1,262 49.7%
-10 to -12 1,092 1,132 49.1%
-12.5 to -15.5 1,079 1,054 50.6%
-16 or more 1,135 1,115 50.4%
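
The binning behind this table can be sketched as follows. The bin edges are read off the Spread Range column; everything else (the function names, the choice to leave pushes out of the percentage, and the sample games) is an assumption made for illustration, not a description of how the table was actually produced.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper edge of every bin except the last, taken from the table above.
UPPER_EDGES = [1.5, 2.5, 3.5, 5, 6, 7.5, 9.5, 12, 15.5]
LABELS = [
    "-1 to -1.5", "-2 to -2.5", "-3 to -3.5", "-4 to -5", "-5.5 to -6",
    "-6.5 to -7.5", "-8 to -9.5", "-10 to -12", "-12.5 to -15.5",
    "-16 or more",
]


def spread_bin(favorite_spread):
    """Return the table row for a spread given as a positive number.

    Spreads below 1 would land in the first bin; the table has no row for them.
    """
    return LABELS[bisect_left(UPPER_EDGES, favorite_spread)]


def favorite_cover_pct(games):
    """Return {bin label: % of non-push games in which the favorite covered}."""
    counts = defaultdict(lambda: [0, 0])  # [favorite covers, underdog covers]
    for spread, margin in games:
        if margin == spread:
            continue  # push: assumed to be excluded from the percentage
        counts[spread_bin(spread)][0 if margin > spread else 1] += 1
    return {label: 100 * fav / (fav + dog) for label, (fav, dog) in counts.items()}


# Made-up games: (favorite_spread, favorite_margin). Not real data.
sample_games = [(1, 3), (1.5, -2), (16, 20), (20, 25), (4.5, 4.5)]
pct_by_bin = favorite_cover_pct(sample_games)
for label, pct in pct_by_bin.items():
    print(f"{label}: {pct:.1f}%")
```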

There are many more factors that could contribute, but from the data available, we can conclude that none of these factors (playing at home, playing on the road, a large spread or a small spread) has any relationship to whether a team covers the spread.

How Likely Was It That A 1 Seed Would Lose?

On March 16, 2018 the University of Maryland, Baltimore County (UMBC) beat the University of Virginia 74-54. This was the first time in the history of the NCAA tournament that a number one seed was upset by a number sixteen seed. Prior to the UMBC win, the one seed was 135-0 when playing the sixteen seed in the NCAA Tournament.

While this upset was historic, how likely was it to happen? To understand that, we can look at spreads of games similar to the typical one-sixteen matchup, then see how often the underdog wins the game. The idea is that if you want to know how likely it is that a 20 point favorite loses, you just need to look at a lot of games where a team was favored by 20 points and see how often that favorite lost. There is more detail on this idea in this post.

First, we need to understand the range of spreads for a typical one seed vs. sixteen seed game. A small sample will suffice here given the relatively small total population of 136 games. We will use the median* range of spreads in each year to prevent outliers from influencing the analysis.

Year Largest Spread Second Largest Spread Third Largest Spread Smallest Spread Median Spread
2018 Villanova -22.5 Virginia -20.5 Xavier -19.5 Kansas -14 Median Spread -20
2017 North Carolina -26.5 Villanova -25 Gonzaga -23.5 Kansas -23 Median Spread -24
2016 Kansas -24.5 North Carolina -23.5 Virginia -23 Oregon -23 Median Spread -23
2015 Kentucky -35 Duke -22.5 Villanova -22 Wisconsin -20.5 Median Spread -22
4-year Median -25.5 -23 -22.5 -21.5 -22.5
*When a median fell on a quarter point, I rounded down to the nearest half point to avoid quarter point median spreads.

For this analysis, we will use the 4-year median range so as to avoid outliers like the 2015 Kentucky -35 spread and the 2018 Kansas -14 spread. In the data set, 517 games fell within that range. The favorite won 506 (97.9%) of them and the underdog won 11 (2.1%). Applying that upset rate to the 136 one-versus-sixteen games played through the Virginia upset, statistically we’d have expected the 16 seeds to have won about 2.9 games.
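
The arithmetic behind that estimate is short enough to write out. The sketch below uses only the counts quoted above. The last step, the chance of at least one upset, goes a little beyond the post and assumes every game is independent with the same upset rate, which is a simplification.

```python
favorite_wins = 506
underdog_wins = 11
comparable_games = favorite_wins + underdog_wins  # the 517 games in the spread range
one_vs_sixteen_games = 136

# Share of comparable games that the underdog won outright.
upset_rate = underdog_wins / comparable_games

# Expected number of 16 seed wins if each 1 vs. 16 game carried that rate.
expected_upsets = upset_rate * one_vs_sixteen_games

# Assumption: independent games, identical upset rate in every game.
p_at_least_one_upset = 1 - (1 - upset_rate) ** one_vs_sixteen_games

print(f"Upset rate: {upset_rate:.1%}")
print(f"Expected upsets in {one_vs_sixteen_games} games: {expected_upsets:.1f}")
print(f"Chance of at least one upset: {p_at_least_one_upset:.1%}")
```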

So, using historical spread data, a reasonable argument could be made that it was about time a 16 seed won in the NCAA tournament.

How Likely Is The Favorite to Win?

Your team is favored by 8 points. How likely does this make them to win?

In college basketball, there is a 77% chance a team favored by 8 points will win. In fact, each point in the spread increases the chance of winning by about 3.5 percentage points. A good rule of thumb is that 50% + (3.5% x spread) will get you within roughly 1-2 percentage points of the favorite’s chance of winning. However, this rule only applies up to spreads of about 11 points, beyond which each additional point is worth significantly less than 3.5 additional percentage points.
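
As a quick illustration, the rule of thumb can be written as a small function and compared against a few rows of the table in this post. The function name and the choice to reject spreads above 11 are assumptions made for this sketch. The observed percentages are copied from the table.

```python
def rule_of_thumb_win_pct(spread):
    """Estimate the favorite's chance of winning, in percent.

    spread: points the favorite is favored by, as a positive number.
    The rule is only meant for spreads up to about 11 points.
    """
    if not 0 <= spread <= 11:
        raise ValueError("rule of thumb only applies to spreads from 0 to 11")
    return 50 + 3.5 * spread


# Observed win percentages for selected spreads, copied from the table.
observed = {1: 52.7, 3: 60.6, 5: 66.8, 8: 77.0, 11: 85.8}

for spread, actual in observed.items():
    estimate = rule_of_thumb_win_pct(spread)
    print(f"spread {spread}: estimate {estimate:.1f}%, observed {actual:.1f}%, "
          f"difference {estimate - actual:+.1f}")
```

Printing the difference for each row makes it easy to see where the straight-line estimate starts to drift away from the observed data.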

Using 22,215 college basketball games played from November 2012 to January 2018, the actual data is displayed below in table format.

Spread Games Percentage of Games Won by Favorite
1 703 52.7%
1.5 913 54.7%
2 1,010 55.3%
2.5 1,117 56.9%
3 1,002 60.6%
3.5 1,025 60.0%
4 995 62.3%
4.5 999 66.4%
5 892 66.8%
5.5 960 71.1%
6 848 69.8%
6.5 860 70.3%
7 781 76.6%
7.5 804 77.1%
8 710 77.0%
8.5 678 80.3%
9 577 81.1%
9.5 599 84.8%
10 540 82.2%
10.5 499 84.9%
11 452 85.8%
11.5 416 89.6%
12 378 89.1%
12.5 399 83.9%
13 362 89.7%
13.5 330 90.0%
14 301 92.3%
14.5 279 92.4%
15 258 95.7%
15.5 242 96.6%
16 193 94.8%
16.5 191 95.2%
17 191 93.1%
17.5 128 96.8%
18 153 98.0%
18.5 123 95.1%
19 117 98.2%
19.5 101 99.0%
20 or more 1,089 98.5%

Here is the table data in chart form. As you can see, the relationship is a relatively straight line until 20 points, at which point the number of games per spread drops below 100 and the chance of winning stays relatively flat at 98%-99%.

View the table data in chart form.

Notes

  • The highest spread with an underdog winner was 25.5. Interestingly, 3 teams won with that spread.
  • Spreads above 20 are rare, accounting for less than 5% of all games with a spread.

How Accurate Are Vegas Lines?

I’ve heard it many times…Vegas knows! But I wondered. How good are the lines?

I was unhappy with what I could find on the internet so I decided to look into it. To start, I defined what I wanted to know. How accurate are Vegas sports betting lines? To be more specific, I wanted to know the average difference between the line Vegas set for a game and the actual outcome. For example, New England was a 4.5 point favorite over Philadelphia in Super Bowl LII. Philadelphia won the game 41-33. So in this example Vegas was 12.5 points off in setting the spread (Philadelphia was expected to lose by 4.5 points and won by 8 points).

But is this typical? In Super Bowl LI New England was a 3 point favorite over Atlanta and won the game by 6 points. So Vegas was off by only 3 points in this example and pretty accurately predicted the outcome of the game.

Two games tell us little, so we need to look at a larger sample. For this analysis I chose college basketball for a few reasons.

  • College basketball is a mainstream sport, which means many people want to place bets on it. That betting volume is one check on the accuracy of the lines.
  • College basketball has a lot of games, which means larger populations to analyze. There are 256 NFL games a year (32 teams play 16 games) while in college basketball there may be 4,000 games in a season with a spread. Larger data sets allow for different types of analysis.

To analyze the data, the metric Absolute Difference from Spread (ADS) was calculated for every game in the data set. For the 22,215 college basketball games in the population, the median ADS was 7 and the mean ADS was 8.6. To state this another way, half of college basketball games finish within 7 points of the spread and half miss it by more than 7 points.

View the data in chart form.

Is this good? We will explore this in a later post.

Data

  • The data used for this analysis is 22,215 college basketball games played between November 9, 2012 and January 31, 2018.
  • Absolute Difference from Spread (ADS) is the difference between the predicted outcome of the game and the actual outcome of the game measured using spread. For example, if Team A is favored by 15 and wins by 13, the ADS is 2. However, if Team A is favored by 5 and loses by 2 the ADS is 7.
  • The game with the largest ADS involved Pittsburgh, which on January 24, 2017 was a 5 point underdog to Louisville yet lost the game by 55 points, an ADS of 50. By the ADS measure, this was the most lopsided game in the last five years.
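
The ADS definition above comes down to a single line of arithmetic. Below is a minimal sketch, assuming the spread and the final margin are both expressed from the favorite's point of view. The function and variable names are hypothetical, and the examples are the ones mentioned in this post.

```python
def ads(favorite_spread, favorite_margin):
    """Absolute Difference from Spread for one game.

    favorite_spread: points the favorite is favored by, as a positive number.
    favorite_margin: favorite's score minus underdog's score
        (negative when the favorite loses outright).
    """
    return abs(favorite_margin - favorite_spread)


# Worked examples taken from the text: (favorite_spread, favorite_margin).
examples = {
    "Team A favored by 15, wins by 13": (15, 13),
    "Team A favored by 5, loses by 2": (5, -2),
    "Super Bowl LII: New England -4.5, loses by 8": (4.5, -8),
    "Super Bowl LI: New England -3, wins by 6": (3, 6),
    "Louisville favored by 5 over Pittsburgh, wins by 55": (5, 55),
}

for description, (spread, margin) in examples.items():
    print(f"{description}: ADS = {ads(spread, margin)}")
```

Because the absolute value is taken, ADS treats a favorite that beats the spread by 10 and one that misses it by 10 the same way: it measures how far off the line was, not which side benefited.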