Rosenthal: A statistical ranking of NCAA basketball teams

Jeffrey Rosenthal, Special to TSN and TSN.ca
3/18/2013 2:57:14 PM

I was asked by TSN to make predictions for the 2013 NCAA Men's Basketball "March Madness" tournament bracket based solely on a statistical analysis, without using any specific knowledge of NCAA teams. That is just as well: although I like sports, watch them sometimes, and even play a bit of neighbourhood pick-up basketball myself, I haven't closely followed any spectator sports in years.

So I proceeded by:

a) Gathering lots of different data variables for each team, for each of the past four regular seasons.

b) Separately gathering the results of each game of each of the past three years' March Madness tournaments.

c) Combining all of that data together for my computer programs to read (which turned out to be very time-consuming, since different data are available on different web sites in different formats with different team name abbreviations, so I had to "teach" my computer to match them all up).

d) Exploring different "non-negative linear combinations" of the data, i.e. formulas which use the data from a given regular season, to give an overall score to each team (I use the phrase "regular season" to include all games from that season prior to the NCAA March Madness tournament, including conference tournament games).

e) Developing computer programs to "fit" the formula based on previous seasons, i.e. to do an extensive search to figure out which of those formulas did the best job of predicting the winners for each game in that year's tournament, using data from the corresponding regular season.

f) Eventually coming up with a single best formula for this, which I call the "Rosenthal Fit."

g) Then, filling in the actual bracket simply by picking, for each game, whichever team has a larger value of their Rosenthal Fit.
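
Step (c), matching team names across sources, might look something like the following minimal Python sketch. It is not the program used for this analysis: the function names (team_key, merge_sources), the alias table, and the sample numbers are all hypothetical, chosen only to show the idea of reducing each name to a common key before merging.

```python
import re

# Hypothetical alias table: maps one cleaned-up spelling to the spelling
# used as the common key. The entries are illustrative only.
ALIASES = {
    "iowastate": "iowast",
    "ohiostate": "ohiost",
}

def team_key(name: str) -> str:
    """Reduce a team name to a lowercase alphanumeric key, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9]", "", name.lower())
    return ALIASES.get(cleaned, cleaned)

def merge_sources(*sources: dict) -> dict:
    """Merge per-team statistics from several sources, keyed by team_key."""
    merged: dict = {}
    for source in sources:
        for name, stats in source.items():
            merged.setdefault(team_key(name), {}).update(stats)
    return merged

# Made-up numbers, used only to demonstrate the merge.
site_a = {"Iowa St.": {"WinFrac": 0.68}}
site_b = {"IowaSt.": {"SOS": 0.57}, "Iowa State": {"RPI": 0.59}}
combined = merge_sources(site_a, site_b)
print(combined)
```

In practice an alias table like this would have to be built by hand for every name the simple clean-up rule cannot reconcile, which is presumably where much of the time goes.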

The formula for the Rosenthal Fit, plus an evaluation of how well it performed when applied to data from the previous three years' tournaments, is provided below. Corresponding values for all teams for the 2012-2013 regular season (to be used to predict the 2013 tournament bracket) are listed at the end of this article.

General Observations:

The NCAA tournament is inherently hard to predict. Indeed, the total number of different ways of filling in your bracket predictions is 2^63 (i.e., 63 different 2's all multiplied together), which works out to about 9 x 10 to the 18th, i.e. a nine followed by 18 zeros, which equals nine billion billion, or nine million million million. That's a lot of possibilities!
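
The count above is easy to check: each of the 63 games has two possible winners, so the number of brackets is 2 multiplied by itself 63 times. A short Python check (variable names are arbitrary):

```python
# Each of the 63 bracket games has 2 possible winners.
n_games = 63
n_brackets = 2 ** n_games

print(n_brackets)           # exact integer value
print(f"{n_brackets:.2e}")  # the same number in scientific notation
```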

In fact, even the experts find it challenging. For example, in past tournament games, the higher-seeded team won only about 70 per cent of the games. This means that even when many of the most knowledgeable people get together to seed the teams, they can still only correctly predict the winner about 70 per cent of the time. Individual expert basketball predictors (e.g. Ken Pomeroy at KenPom.com) tend to perform similarly, accurately predicting the winner in only about 70 per cent of the tournament games. Part of the reason is that each matchup is a single-elimination game, rather than e.g. a seven-game series, so there is lots of inherent day-to-day randomness, and it is quite possible for a weaker team to beat a "better" team in any one game, making predictions that much more difficult.

So, despite my extensive computer programming and statistical modeling, I do not expect to do better than calling about 70 per cent of the games correctly.

Indeed, I would say that anyone who does much better than 70 per cent would have to get fairly lucky (in addition to perhaps having a good predictive model and/or good knowledge of the basketball teams).

Statistical Data Considered:

To perform my statistical analysis, I downloaded and considered lots of different statistics, including the following (listed with sources):

- WinFrac: The team's overall game-winning fraction for the entire regular (pre-March Madness) season. (teamrankings.com)

- WinFrac3: The team's game-winning fraction in their final three regular season games. (teamrankings.com)

- CWinFrac: The team's game-winning fraction for games within their own conference. (realtimerpi.com)

- NCWinFrac: The team's game-winning fraction for games outside of their own conference. (realtimerpi.com)

- AdOff: The team's "adjusted" offensive efficiency rating. (KenPom.com)

- AdDef: The team's "adjusted" defensive efficiency rating. (KenPom.com)

- OffEff: The team's unadjusted offensive efficiency rating. (teamrankings.com)

- DefEff: The team's unadjusted defensive efficiency rating. (teamrankings.com)

- SOS: The team's "Strength of Schedule", a measure of the average strength of the opponents they played. (realtimerpi.com)

- RPI: The team's "Ratings Percentage Index". (realtimerpi.com)

- PntPG: The team's average number of points scored per game. (teamrankings.com)

- OpPnt: The team's average number of points scored against them per game. (teamrankings.com)

- I also examined the team statistics provided at ncaa.com and at espn.go.com, but they largely overlapped with the above statistics, so in the end I did not need to use them directly.

Finally, and most importantly, the "outcome" measure was:

- TourRes: The game-by-game, line-by-line win/loss results for each game of each of the past three March Madness tournaments. (kusports.com)

Statistical Modeling Approach Taken:

My approach was to try to figure out which linear combination of (i.e., formula using) the above-listed regular-season statistical values would do the best job of ranking the teams from highest to lowest, in terms of who won which games in the corresponding year's tournament. I computed this using regular-season statistical values, and corresponding tournament game results, for each of the three seasons 2009-2010, 2010-2011, and 2011-2012.

To perform this computation, I wrote computer programs in C and in R, which used such techniques as "linear regression", "constrained linear regression," and finally a "Monte Carlo (randomised) search algorithm," to find an optimal formula.
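
To give a feel for what a randomised search over non-negative weights can look like, here is a minimal Python sketch. It is an assumption-laden illustration, not the C and R programs described above: the data are synthetic, the function names are made up, and the search is a plain random hill-climb that counts correctly predicted games.

```python
import random

def count_correct(weights, games):
    """Count games in which the team with the higher weighted score won.

    Each game is a pair (winner_stats, loser_stats) of equal-length lists.
    """
    def score(stats):
        return sum(w * x for w, x in zip(weights, stats))
    return sum(1 for winner, loser in games if score(winner) > score(loser))

def random_search(games, n_vars, n_iter=2000, step=0.5, seed=0):
    """Random hill-climbing search over non-negative weights (illustrative)."""
    rng = random.Random(seed)
    best = [1.0] * n_vars
    best_correct = count_correct(best, games)
    for _ in range(n_iter):
        # Perturb the current best weights; clamp at zero to stay non-negative.
        candidate = [max(0.0, w + rng.gauss(0.0, step)) for w in best]
        correct = count_correct(candidate, games)
        if correct >= best_correct:
            best, best_correct = candidate, correct
    return best, best_correct

def make_synthetic_games(n_games=60, seed=1):
    """Generate made-up games with three made-up statistics per team."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        a = [rng.random() for _ in range(3)]
        b = [rng.random() for _ in range(3)]
        # Made-up "true" strength: variable 0 matters most, variable 2 not at all.
        strength_a = 3 * a[0] + a[1] + rng.gauss(0.0, 0.5)
        strength_b = 3 * b[0] + b[1] + rng.gauss(0.0, 0.5)
        games.append((a, b) if strength_a > strength_b else (b, a))
    return games

games = make_synthetic_games()
baseline = count_correct([1.0, 1.0, 1.0], games)
weights, correct = random_search(games, n_vars=3)
print(baseline, correct, [round(w, 3) for w in weights])
```

Because a candidate is accepted only when it predicts at least as many games as the current best, the number of correct picks can never fall below the equal-weights baseline. The same property is why a formula tuned this way tends to look better on the games it was fitted to than on new ones.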

Although my computer programs considered all of the above variables, they ultimately selected just a few of those variables as being most relevant for prediction, namely: WinFrac, WinFrac3, OffEff, DefEff, SOS, and NCWinFrac.

Final Formula:

Using the above statistical analysis, the resulting best linear combination turned out to be:

Rosenthal Fit = 6.2337 x WinFrac + 1.7180 x WinFrac3 + 1.1179 x OffEff + 1.9189 x DefEff + 11.9846 x SOS + 7.3712 x NCWinFrac
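
Reading the six coefficients as decimal numbers, the formula can be written as a small Python function. This is only a sketch: the article does not say how each input variable is scaled or normalised, so the function name, the dictionary layout, and the sample inputs below are assumptions, and the sample inputs are not any real team's statistics.

```python
# Coefficients of the linear combination, read as decimal numbers.
COEFFS = {
    "WinFrac": 6.2337,
    "WinFrac3": 1.7180,
    "OffEff": 1.1179,
    "DefEff": 1.9189,
    "SOS": 11.9846,
    "NCWinFrac": 7.3712,
}

def rosenthal_fit(stats):
    """Weighted sum of a team's statistics (input scaling is an assumption)."""
    return sum(COEFFS[name] * stats[name] for name in COEFFS)

# Made-up inputs, purely to show the call.
example_stats = {
    "WinFrac": 0.80, "WinFrac3": 0.67, "OffEff": 1.10,
    "DefEff": 0.95, "SOS": 0.58, "NCWinFrac": 0.85,
}
print(round(rosenthal_fit(example_stats), 4))
```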

I then applied this linear combination formula to the regular-season statistics for the current (2012-2013) season. This provided an overall numerical rating for each team this year, based on their regular-season statistics. These ratings are listed at the end of this article, in order from highest to lowest.

Then, to fill out this year's tournament bracket using this Rosenthal Fit, simply choose, for each game, whichever team has a higher value of the Rosenthal Fit.
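
The bracket-filling rule is simple enough to sketch in a few lines of Python. The four Rosenthal Fit values below are taken from the list in this article; the pairing order is illustrative only, the function names are made up, and the tie-breaking rule (the first-listed team advances) is an assumption, since the article does not say how equal values are handled.

```python
# Rosenthal Fit values copied from the article's list (four teams only).
FIT = {
    "Duke": 24.1150,
    "Albany": 20.3990,
    "Creighton": 22.5324,
    "Cincinnati": 21.7373,
}

def pick_winner(team_a, team_b, fit=FIT):
    """Pick the team with the larger fit value (ties go to team_a: an assumption)."""
    return team_a if fit[team_a] >= fit[team_b] else team_b

def fill_bracket(teams, fit=FIT):
    """Return the winners of each round for teams listed in bracket order.

    The number of teams is assumed to be a power of two.
    """
    rounds = []
    current = list(teams)
    while len(current) > 1:
        current = [
            pick_winner(current[i], current[i + 1], fit)
            for i in range(0, len(current), 2)
        ]
        rounds.append(current)
    return rounds

rounds = fill_bracket(["Duke", "Albany", "Creighton", "Cincinnati"])
print(rounds)
```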

Note: The above rating system is based purely on statistical analysis, without taking any other factors into account. Certain late-breaking events (e.g. Kentucky Wildcats superstar Nerlens Noel's major injury on February 12) could potentially have a large impact on a team's tournament performance despite making only small changes to their regular-season statistics, which could throw off my model's predictions. I did consider making a few post-hoc adjustments to account for such developments, but in the end I decided not to - thus keeping the Rosenthal Fit as a purely statistical measure.

Comparison to Other Predictors:

The following table shows how the Rosenthal Fit, the tournament seedings, and the RPI (Ratings Percentage Index) would have done at predicting tournament games in each of the past three tournaments. (In two of the tournaments, there was one game between two equally seeded teams; those two games are excluded from the evaluation of the tournament seedings.)

 

Season       Seedings            RPI                 Rosenthal Fit
2009-2010    42/62 (67.74%)      44/63 (69.84%)      48/63 (76.19%)
2010-2011    43/63 (68.25%)      38/63 (60.32%)      43/63 (68.25%)
2011-2012    46/62 (74.19%)      44/63 (69.84%)      45/63 (71.43%)
Total        131/187 (70.05%)    126/189 (66.67%)    136/189 (71.96%)

This table shows that the Rosenthal Fit compares favourably with RPI and with the tournament seedings. This should not be taken as evidence of any particular superiority, since the Rosenthal Fit was developed precisely to try to maximise these predictions. Still, it does suggest that the Rosenthal Fit is at least roughly comparable in predictive power to these expert measures.
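
The totals in the table can be recomputed from the per-season counts with a few lines of Python (the dictionary layout and variable names are arbitrary choices for this check):

```python
# (correct predictions, games evaluated) per season, copied from the table.
results = {
    "2009-2010": {"Seedings": (42, 62), "RPI": (44, 63), "Rosenthal Fit": (48, 63)},
    "2010-2011": {"Seedings": (43, 63), "RPI": (38, 63), "Rosenthal Fit": (43, 63)},
    "2011-2012": {"Seedings": (46, 62), "RPI": (44, 63), "Rosenthal Fit": (45, 63)},
}

# Sum the counts over the three seasons for each predictor.
totals = {}
for season in results.values():
    for method, (right, played) in season.items():
        prev_right, prev_played = totals.get(method, (0, 0))
        totals[method] = (prev_right + right, prev_played + played)

# Overall accuracy in per cent, rounded to two decimals.
percent = {m: round(100 * r / p, 2) for m, (r, p) in totals.items()}

for method in totals:
    right, played = totals[method]
    print(f"{method}: {right}/{played} ({percent[method]:.2f}%)")
```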

In a few weeks, we will know how well it performed this year.

Jeffrey Rosenthal is a professor in the Department of Statistics at the University of Toronto, and the author of the bestseller Struck by Lightning: The Curious World of Probabilities. His analysis can be seen during TSN's coverage of the 2013 NCAA Men's Basketball tournament.

List of Rosenthal Fit Values:

                          Duke    24.1150
                    Louisville    23.7559
                        Kansas    23.6584
                    New Mexico    23.5325
                       Gonzaga    23.4355
                       Arizona    23.2148
                       Indiana    23.0785
                      Michigan    22.6300
                      Ohio St.    22.6260
                    Georgetown    22.5934
                      Syracuse    22.5526
                     Creighton    22.5324
                    Miami (FL)    22.3322
                    Notre Dame    22.2744
                    Pittsburgh    22.1597
                       Memphis    22.1042
                   Wichita St.    22.0946
                   Saint Louis    22.0907
                       Florida    22.0731
                  Michigan St.    22.0105
                        Butler    21.9748
                    Kansas St.    21.9461
                        Oregon    21.9407
                  Colorado St.    21.8670
                   Mississippi    21.8169
                          UNLV    21.7975
                    Cincinnati    21.7373
                    N.C. State    21.7080
                           VCU    21.6183
                      Bucknell    21.5939
                  Oklahoma St.    21.5885
                    St. Mary's    21.5479
                      Illinois    21.3910
                      Maryland    21.3721
                       Belmont    21.3090
                          UCLA    21.3080
                     Marquette    21.2605
                        Temple    21.2184
                North Carolina    21.1325
                       Wyoming    21.0634
                     Wisconsin    20.9743
                      Missouri    20.8896
                     Charlotte    20.8322
                     Minnesota    20.8182
               Middle Tenn.St.    20.8046
                      Iowa St.    20.8036
                    Valparaiso    20.7961
                 San Diego St.    20.6748
                   Connecticut    20.6519
                          Iowa    20.6125
                      Colorado    20.5972
                   Boise State    20.5151
                        Albany    20.3990
                      Utah St.    20.3426
                         Akron    20.3190
                 Southern Miss    20.2688
                       LaSalle    20.1715
                   Arizona St.    20.0918
                      Oklahoma    19.9951
                       Rutgers    19.8699
                           LSU    19.7374
                     Tennessee    19.5588
                     Villanova    19.5010
                       Houston    19.4979
                      Virginia    19.4679
                      Stanford    19.4496
                   Santa Clara    19.4331
                      Kentucky    19.3383
                 Brigham Young    19.3114
                        Lehigh    19.2614
                    Seton Hall    19.2364
                     Texas A&M    19.2074
                    California    19.1917
                   Stony Brook    19.1861
                  Georgia Tech    19.0646
                          Ohio    19.0342
                New Mexico St.    18.9641
                   Florida St.    18.8859
                  S Dakota St.    18.8602
                      Arkansas    18.8197
                      Davidson    18.7817
                        Baylor    18.7774
                       Alabama    18.7748
                        Dayton    18.7484
                  Fla Gulf Cst    18.7107
                        Tulane    18.6753
                   Loyola (MD)    18.6450
                         Texas    18.6347
                    Murray St.    18.6279
                      Richmond    18.6116
                   Rob. Morris    18.5161
                    Providence    18.4669
                      Nebraska    18.4523
                     Air Force    18.4451
                          Iona    18.4391
                  Illinois St.    18.3915
                       Vermont    18.3840
                    Oregon St.    18.3567
                 South Florida    18.3112
                   Indiana St.    18.3080
                    Washington    18.2090
                    Evansville    18.2070
                       Harvard    18.1508
                        Bryant    17.9622
                        Denver    17.8817
                    TX El Paso    17.8263
                        Xavier    17.7947
                   W. Kentucky    17.7828
                          Utah    17.7690
                    St. John's    17.7554
                      Canisius    17.6712
                        Wagner    17.6241
                     Fairfield    17.5919
                         Tulsa    17.5297
                       Montana    17.4721
                       Pacific    17.4308
                    Vanderbilt    17.3922
                  Arkansas St.    17.3845
                      Penn St.    17.3180
                 Northern Iowa    17.3111
                  Northwestern    17.2556
                   Long Island    17.2556
                 James Madison    17.2510
                       Detroit    17.2379
                  George Mason    17.2111
                       Bradley    17.0855
                   Loyola (IL)    17.0722
                          Elon    17.0680
               St. Bonaventure    17.0655
                        Mercer    17.0336
                         Drake    17.0289
                      NW State    17.0187
                   Wake Forest    17.0182
                       Niagara    16.9581
                        Purdue    16.9563
                      Hartford    16.9487
                    Texas Tech    16.9233
                      Boston U    16.8685
                         Rider    16.8067
                       Clemson    16.7166
                       De Paul    16.6454
                        Nevada    16.5988
                     Princeton    16.5938
                           UAB    16.5054
                     UC Irvine    16.5046
                      Delaware    16.4777
                        Towson    16.4171
                       Georgia    16.3679
                     Lafayette    16.3253
                 West Virginia    16.2019
                     San Diego    16.1158
                        NC A&T    16.1027
                      Southern    16.0950
                        Toledo    16.0701
                        Hawaii    16.0292
                      Cal Poly    15.8982
                         Idaho    15.8592
                 Cleveland St.    15.7620
                          IPFW    15.7000
                  Savannah St.    15.6405
                    Fresno St.    15.6242
                    Pepperdine    15.6083
                   Norfolk St.    15.5815
                    Holy Cross    15.5070
                      Marshall    15.4374
                          Army    15.3794
                  Oral Roberts    15.3730
                           USC    15.3022
                Sam Houston St.    15.2898
                          Yale    15.1663
                      Winthrop    15.1356
                  Morehead St.    15.0979
                         Brown    15.0842
                        Drexel    15.0668
                TX San Antonio    15.0024
                       Oakland    14.9904
                   McNeese St.    14.9467
                    Quinnipiac    14.9358
                   North Texas    14.8990
                      Duquesne    14.8985
                          Troy    14.8513
                    Morgan St.    14.7504
                   Georgia St.    14.7192
                  LA Lafayette    14.7140
                      Lipscomb    14.7121
                Long Beach St.    14.7059
                     Manhattan    14.6780
                      UC Davis    14.5437
                      Columbia    14.5091
                   St. Peter's    14.4304
                    High Point    14.3977
                        Auburn    14.3659
                        Marist    14.3493
                       Wofford    14.3461
                   San Jose St.    14.3070
                       Cornell    14.2636
                       Buffalo    14.2271
                  Rhode Island    14.1902
                       Liberty    14.0328
                      Portland    13.9293
                  Delaware St.    13.7218
                    Miami (OH)    13.6686
                  South Dakota    13.6241
                       Stetson    13.5838
                       Fordham    13.5698
                N.C. Asheville    13.5688
                          UCSB    13.5529
                      Campbell    13.4454
                       Colgate    13.4360
                  North Dakota    13.4358
                      Monmouth    13.3985
                   Chattanooga    13.3883
                     Dartmouth    13.2551
                         Maine    13.1639
                       Seattle    13.0385
                       Radford    12.9002
                   Montana St.    12.8383
                  Jacksonville    12.8043
                         Siena    12.7232
                       Hampton    12.7056
                          Navy    12.4556
                   Chicago St.    12.3891
                  SE Louisiana    12.2742
                   N. Colorado    12.1435
                   Jackson St.    12.1361
                   Austin Peay    12.0914
                          Rice    11.8819
                  E. Tenn. St.    11.8395
                  Old Dominion    11.7348
                  Nicholls St.    11.6002
                         IUPUI    11.5430
                     LA Monroe    11.2691
                       Samford    11.2131
                       Citadel    11.1936
                  Portland St.    11.1429
                        Howard    11.0323
                       Hofstra    11.0204
                   Alabama St.    10.9835
                      Longwood    10.7365
                        Furman    10.6795
                  Presbyterian    10.5587
                   New Orleans    10.4705
                         Lamar    10.2693
                   Florida A&M    10.0584
                  UC Riverside     9.9920
                  Kennesaw St.     9.7770
                    Binghamton     9.6115
                  Ste F Austin     9.1443
                   Weber State     9.1435
                 Col Charlestn     8.5979
                  N Dakota St.     8.4912
                    W Illinois     8.4275
                         UMass     8.2533
                    NC Central     8.1872
                    E Kentucky     8.1479
                   TX Southern     8.1355
                    W Michigan     7.9282
                    Kent State     7.8832
                   Ark Pine Bl     7.8599
                  Wright State     7.8351
                       LA Tech     7.8212
                     Gard-Webb     7.7709
                  Mt St.Mary's     7.7613
                 Jksnville St.     7.7428
                   Charl South     7.6965
                    E Carolina     7.6424
                  TX-Arlington     7.6413
                   Northeastrn     7.6071
                  Florida Intl     7.5889
                      TN State     7.5248
                    Central FL     7.4869
                     WI-GrnBay     7.4113
                    Boston Col     7.3077
                   SE Missouri     7.3018
                    St Josephs     7.2322
                   AR Lit Rock     7.2321
                    Ball State     7.2072
                  CS Bakersfld     7.0927
                     S Alabama     7.0577
                   San Fransco     7.0325
                     App State     7.0239
                    SC Upstate     7.0083
                    S Illinois     6.8708
                   VA Military     6.7742
                      TX-PanAm     6.7519
                  Fla Atlantic     6.7064
                   Central Ark     6.6817
                    Wash State     6.6623
                    IL-Chicago     6.6607
                    N Kentucky     6.6580
                    W Carolina     6.6478
                    Youngs St.     6.6009
                    E Michigan     6.5514
                       TN Tech     6.4862
                     Beth-Cook     6.4780
                    E Illinois     6.4418
                         N JIT     6.4057
                  Central Conn     6.3871
                  Prairie View     6.3793
                     Sac State     6.3764
                   Houston Bap     6.3659
                   S Methodist     6.3245
                     Wm & Mary     6.3052
                    S Carolina     6.2965
                  Cal St Nrdge     6.2900
                   Texas State     6.2669
                  St Fran (NY)     6.1695
                   Coastal Car     6.1570
                    Geo Wshgtn     6.1219
                   Loyola Mymt     6.0941
                     N Florida     6.0594
                  Missouri St.     6.0032
                     Neb Omaha     5.9981
                   GA Southern     5.9894
                    Miss State     5.9503
                  Utah Val St.     5.8729
                  Central Mich     5.8298
                   Bowling Grn     5.7696
                  CS Fullerton     5.6973
                   E Washingtn     5.6854
                       VA Tech     5.6753
                   Maryland BC     5.5945
                  TX Christian     5.4742
                      Alab A&M     5.4483
                  Coppin State     5.4384
                        U Penn     5.2858
                     TN Martin     5.2321
                     N Arizona     5.2213
                   N Hampshire     5.1888
                   NC-Grnsboro     5.1397
                      American     5.1245
                  Alcorn State     5.0936
                    Sacred Hrt     5.0929
                          UMKC     5.0428
                   NC-Wilmgton     5.0005
                        S Utah     4.9083
                    WI-Milwkee     4.8219
                  St Fran (PA)     4.7744
                     TX A&M-CC     4.7630
                    SIU Edward     4.7194
                   Idaho State     4.6572
                  Miss Val St.     4.6007
                   F Dickinson     4.5634
                   S Car State     4.4436
                    N Illinois     3.6664
                   Maryland ES     3.3431
                 Grambling St.     3.0220