A Closer Look at the NFL Draft

By: Siddharaj Vaghela and Sai Pesari

Group Contribution Statement: Both members of the group got together on video calls and worked through the coding and writing.

Introduction

One of the biggest events of the year in the National Football League (NFL) is the annual draft, in which teams select top collegiate players from around the nation to join their rosters. The draft has 7 rounds with a base of 32 picks per round (one per team); compensatory picks can push the later rounds past 32 selections. More information on the NFL Draft can be found here: https://operations.nfl.com/the-players/the-nfl-draft/the-rules-of-the-draft/

Wide receiver is one of the most important positions, and one that teams are eager to fill. In this project, we analyze the NFL draft from the wide receiver perspective. We examine the association between a receiver's combine performance, draft position, and rookie-year performance. We also build our own model to predict the round in which a receiver was drafted and then evaluate the accuracy of that model.

Here is a link to a Wikipedia page explaining what the NFL combine is: https://en.wikipedia.org/wiki/NFL_Scouting_Combine

Curation, Parsing and Handling of Data

We started off by reading in the CSV file that contains player data. Each entity in the dataset is an individual player, and the attributes include the round and pick at which the player was drafted, height, weight, 40-yard dash, shuttle run, and various other combine statistics. The dataset as a whole contains combine and draft data for players from 2000-2018. We will focus on wide receivers who were drafted (excluding undrafted players) from 2008 to 2017.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.read_csv("combine_data_since_2000_PROCESSED_2018-04-26.csv")
# Keep wide receivers only
df_wr = df[df["Pos"] == "WR"]
# Build the year mask from df_wr itself so its index matches the frame being filtered
# (a mask built from df triggers a "Boolean Series key will be reindexed" warning)
df_wr = df_wr[df_wr["Year"] >= 2008]
# Undrafted players have no Round value, so drop them
df_wr = df_wr[df_wr['Round'].notna()]
df_wr
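The three filtering steps can also be written as a single boolean mask. The snippet below is a minimal sketch on a made-up toy table (the players and values are placeholders, not rows from the real CSV); it only illustrates that every condition is built from the same frame that is being filtered.

```python
import pandas as pd

# Toy stand-in for the combine table; all values are invented for illustration
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Pos": ["WR", "QB", "WR", "WR", "WR"],
    "Year": [2007, 2010, 2012, 2016, 2018],
    "Round": [1.0, 2.0, None, 5.0, None],
})

# One mask, built entirely from `toy`, so the mask index always matches the frame
mask = (toy["Pos"] == "WR") & (toy["Year"] >= 2008) & toy["Round"].notna()
toy_wr = toy[mask]

print(toy_wr["Player"].tolist())
```

Only player "D" survives here: "A" is too early, "B" is not a receiver, and "C" and "E" have no draft round.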

We now have pre-draft data about drafted wide receivers from 2008-2017. From here, we can gather data from the rookie season of each of these players (the season immediately after they were drafted). This data will enable us to analyze correlations between draft position and immediate performance in the NFL. To get this data, we need to scrape it from a website called Pro Football Reference. The data comes in the form of a table, and we need 10 seasons' worth of it (2008-2017). We created a for loop that scrapes each season's receiving table from the Pro Football Reference website and keeps only what we need, which is the wide receiver data.

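import requests
from io import StringIO
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36'}

season_frames = []
for year in range(2008,2018):
    url = "https://www.pro-football-reference.com/years/" + str(year) + "/receiving.htm"
    r = requests.get(url, headers=headers)
    parse = BeautifulSoup(r.content, "html.parser")
    # find() returns the first table on the page, which holds the receiving stats
    dat = parse.find("table")
    # Wrap the HTML in StringIO; newer pandas versions warn on raw HTML strings
    dat_dat = pd.read_html(StringIO(str(dat)))[0]
    dat_dat["Season"] = year
    # Position labels appear in both upper and lower case
    dat_dat = dat_dat.loc[(dat_dat['Pos']=="WR") | (dat_dat['Pos'] == "wr")]
    season_frames.append(dat_dat)

# DataFrame.append was removed in pandas 2.0, so concatenate the per-season frames
wr_data = pd.concat(season_frames)
wr_data.head()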

Now we'll join the tables so we can correlate rookie season stats with draft position.

pd.set_option('display.max_columns', 500)
rookie_stats = pd.merge(wr_data, df_wr, on='Player')
year2_filter = rookie_stats["Season"] == rookie_stats["Year"]
rookie_stats = rookie_stats[year2_filter]
rookie_stats
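One caveat with joining on Player: the scraped names can carry award markers (for example "Andre Johnson *+" in the scraped table), while the combine table uses plain names, so marked players would silently fail to match. The snippet below is a hedged sketch of one possible cleanup step on hypothetical rows; it is not part of the analysis above, and applying it to the real data could change which players end up in rookie_stats.

```python
import pandas as pd

# Hypothetical rows mimicking the two sources (not actual scraped output)
stats = pd.DataFrame({
    "Player": ["Andre Johnson *+", "Wes Welker*", "Eddie Royal"],
    "Yds": [1575, 1165, 980],
})
combine = pd.DataFrame({
    "Player": ["Andre Johnson", "Wes Welker", "Eddie Royal"],
    "Ht": [75, 69, 70],  # placeholder heights for illustration only
})

# Without cleanup only the unmarked name matches
unmatched_merge = pd.merge(stats, combine, on="Player")

# Strip '*' and '+' markers plus surrounding whitespace, then merge again
stats["Player"] = stats["Player"].str.replace(r"[*+]", "", regex=True).str.strip()
cleaned_merge = pd.merge(stats, combine, on="Player")

print(len(unmatched_merge), len(cleaned_merge))
```

Names can still differ in other ways (suffixes such as "Mike Williams-04" in the combine table), so a shared player ID would be a more robust join key where one is available.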

Exploratory Data Analysis

There are many ways to measure performance for wide receivers. For more information on how performance is measured in the NFL for wide receivers, feel free to click on the following link!

https://www.footballoutsiders.com/stats/nfl/wr/2019

Some of the most common ways of measuring a wide receiver’s success are receiving yards and touchdowns, which is what we will be using to measure a receiver’s production. First, let’s look at the relationship between receiving yards per game and the round in which a receiver was selected. The graph below shows the average yards per game in a wide receiver’s rookie year plotted against the round that receiver was taken in, aggregated over 2008-2017. As we can see below, there does seem to be a trend: on average, the later the round a wide receiver was taken in, the fewer yards per game he averaged in his first season. According to the graph, the average drops with each round deeper into the draft, except for round 5, which appears to be an outlier.

This trend can plausibly be attributed to the fact that players taken earlier in the draft are perceived to have more skill and potential, which results in more playing time during their rookie season and better performance if their skills hold up. Late-round players may not have the same skill and sometimes lack the trust of coaches, to the point where they get very little playing time in their first year.

# Approximate pick within the round, assuming 32 picks per round
# (Pick % 32 would map pick 32 to 0; compensatory picks make later rounds longer than 32)
rookie_stats["Pick_Per_Round"] = (rookie_stats["Pick"] - 1) % 32 + 1
rookie_stats = rookie_stats.astype({"Y/G": float, "GS": float, "TD": float})
rookie_stats = rookie_stats.rename(columns={"Y/G": "Yards_Per_Game", "GS": "Games_Started"})
# Mean rookie-season yards per game for each draft round
yards_per_round = rookie_stats.groupby("Round")["Yards_Per_Game"].mean()

ypr = pd.DataFrame(yards_per_round)
ypr = ypr.reset_index()

ypr.plot.scatter(x = 'Round', y = 'Yards_Per_Game', figsize=(12, 10))
sns.regplot(x = 'Round', y = 'Yards_Per_Game', data = ypr)

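Per-round averages are easier to judge when the number of players behind each average is visible, since a round with only a few receivers can look like an outlier by chance. The sketch below uses invented numbers (not the scraped data) to show one way to report the mean and the count together.

```python
import pandas as pd

# Invented yards-per-game values, for illustration only
toy = pd.DataFrame({
    "Round": [1.0, 1.0, 2.0, 2.0, 2.0, 5.0],
    "Yards_Per_Game": [60.0, 40.0, 50.0, 30.0, 10.0, 45.0],
})

# Mean and sample size per round in a single table
summary = toy.groupby("Round")["Yards_Per_Game"].agg(["mean", "count"])
print(summary)
```

In this toy table the round 5 average rests on a single player, which is the kind of situation where an apparent outlier deserves caution.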

Now we compare how players within the same round performed. Below are 6 plots, one for each of rounds 1 through 6. Each plots yards per game against the pick within the round at which a player was drafted. The fitted line is usually close to horizontal, meaning that position within a round mattered little. For instance, a player picked 15th in the 1st round won’t necessarily do far better than the last player picked in the same round.

# One scatter plot with a fitted line for each of rounds 1 through 6
for rnd in range(1, 7):
    round_df = rookie_stats[rookie_stats['Round'] == float(rnd)]
    round_df.plot.scatter(x = 'Pick_Per_Round', y = 'Yards_Per_Game', figsize=(12, 10))
    sns.regplot(x = 'Pick_Per_Round', y = 'Yards_Per_Game', data = round_df)

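The "roughly horizontal" reading of these plots could also be checked numerically with a per-round correlation between pick within the round and yards per game. The snippet below is a sketch on invented values (not the real rookie_stats table): a correlation near 0 corresponds to a flat line, and a value near -1 to a steep decline.

```python
import pandas as pd

# Invented data: round 1 is flat, round 2 declines steadily with pick
toy = pd.DataFrame({
    "Round": [1.0] * 4 + [2.0] * 4,
    "Pick_Per_Round": [1, 10, 20, 30, 2, 12, 22, 32],
    "Yards_Per_Game": [50.0, 52.0, 49.0, 51.0, 60.0, 50.0, 40.0, 30.0],
})

# Pearson correlation between pick and production, computed separately per round
corr_by_round = {
    rnd: group["Pick_Per_Round"].corr(group["Yards_Per_Game"])
    for rnd, group in toy.groupby("Round")
}
print(corr_by_round)
```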

In this next graph, we plot the average games started during the rookie season against the overall pick at which a player was drafted. We can see that earlier-drafted players usually start more games than players drafted later. This once again makes sense because players taken earlier are expected to be immediate contributors.

# Mean rookie-season games started for each overall pick number
gs_per_pick = rookie_stats.groupby("Pick")["Games_Started"].mean()
gpp = pd.DataFrame(gs_per_pick)
gpp = gpp.reset_index()

gpp.plot.scatter(x = 'Pick', y = 'Games_Started', figsize=(12, 10))
sns.regplot(x = 'Pick', y = 'Games_Started', data = gpp)


This next graph is a histogram of the rounds in which the top 50 rookie receivers (by touchdowns) were drafted. Surprisingly, more of the top 50 came from the second round (21) than from the first round (13). However, the general downward trend still holds, with rounds 3 through 6 contributing far fewer players.

rookie_stats = rookie_stats.sort_values(by='TD', ascending=False)
top_50_td = rookie_stats.head(50)
plt.hist(top_50_td["Round"])

(array([13.,  0., 21.,  0.,  5.,  0.,  6.,  0.,  4.,  1.]),
 array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ]),
 <a list of 10 Patch objects>)
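The histogram output is a little hard to read because the default 10 bins leave empty half-round bins between the rounds. As a sketch, the same counts (copied from the non-empty bins in the output above) can be tabulated per round with value_counts; the Series built here is a reconstruction from those counts, not the original rookie_stats column.

```python
import pandas as pd

# Non-empty bin counts read from the histogram output: round -> number of players
counts_from_hist = {1: 13, 2: 21, 3: 5, 4: 6, 5: 4, 6: 1}

# Rebuild a Round column with one entry per player, then count per round
rounds = pd.Series(
    [rnd for rnd, n in counts_from_hist.items() for _ in range(n)], name="Round"
)
counts = rounds.value_counts().sort_index()

# Share of the top 50 drafted in the first two rounds
share_first_two = counts.loc[[1, 2]].sum() / counts.sum()
print(counts.to_dict())
print(share_first_two)
```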

Machine Learning and Hypothesis Testing

Our hypothesis was that there is a relationship between the round in which a receiver was drafted and how many yards per game they had during their rookie season. To test the relationship between yards per game and round picked in the draft, we're going to conduct an F-test.

ypg_lm=ols('Yards_Per_Game~Round', data=rookie_stats).fit() # Round is treated as numeric here; write C(Round) to treat it as categorical
print(sm.stats.anova_lm(ypg_lm, typ=2))

                sum_sq    df         F    PR(>F)
Round      1214.698696   1.0  5.036503  0.028458
Residual  14711.918446  61.0       NaN       NaN

After conducting the F-test, we see that the F statistic is 5.036503 and the p-value is 0.028458. The p-value is less than the significance level of 0.05, so we reject the null hypothesis of no linear relationship. This indicates that yards per game and round picked are linearly associated: the round in which a player was picked is related to the yards per game he produced, although this test alone does not establish cause and effect.
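The F statistic can be reproduced by hand from the sums of squares in the ANOVA table, which also gives a feel for the size of the effect. The sketch below uses the values printed above; the helper name f_and_r2 is our own, and treating the Round sum of squares as the full regression sum of squares assumes a model with a single predictor, as is the case here.

```python
def f_and_r2(ss_model, df_model, ss_resid, df_resid):
    """Return the F statistic and R-squared from ANOVA sums of squares."""
    ms_model = ss_model / df_model   # mean square explained by the predictor
    ms_resid = ss_resid / df_resid   # mean square of the residuals
    f_stat = ms_model / ms_resid
    r_squared = ss_model / (ss_model + ss_resid)
    return f_stat, r_squared

# Values copied from the Yards_Per_Game ~ Round ANOVA table
f_stat, r_squared = f_and_r2(1214.698696, 1.0, 14711.918446, 61.0)
print(round(f_stat, 4), round(r_squared, 4))
```

The R-squared works out to roughly 0.076, so even though the relationship is statistically significant, draft round accounts for only a small share of the variation in rookie yards per game.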

Another hypothesis was that there is a relationship between where a player was drafted and how many touchdowns they scored. To test the relationship between number of touchdowns and round picked in the draft, we're going to conduct an F-test.

td_lm=ols('TD~Round', data=rookie_stats).fit() # Round is treated as numeric here; write C(Round) to treat it as categorical
print(sm.stats.anova_lm(td_lm, typ=2))

              sum_sq    df         F    PR(>F)
Round       9.441440   1.0  1.475706  0.229129
Residual  390.272846  61.0       NaN       NaN

After conducting the F-test, we see that the F statistic is 1.475706 and the p-value is 0.229129. The p-value is greater than the significance level of 0.05, so we fail to reject the null hypothesis. We therefore cannot conclude that there is a linear relationship between number of touchdowns and round picked.

In this next part, we will use combine results to create a regression model that predicts the round in which a wide receiver will be drafted. First, we create a dataframe of players who have recorded values for all five measurements the model uses: the forty-yard dash, vertical jump, broad jump, height, and cone drill.

# Keep only players with a recorded value for every measurement used by the model
complete_drills = df_wr.dropna(subset=['Forty', 'Vertical', 'BroadJump', 'Ht', 'Cone'])

Then we'll create a model that will attempt to relate the forty time, vertical, broad jump, height, and cone drill to the round that the player was drafted in.

ml_model=ols('Round~Forty+Vertical+BroadJump+Ht+Cone', data=complete_drills).fit()
resid_df = pd.DataFrame()
resid_df["resid"] = ml_model.resid
resid_df["fitted"] = ml_model.fittedvalues

sns.residplot(x = 'fitted', y = 'resid', data = resid_df)
plt.title("Residual error versus fitted")
plt.show()

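The residual points are centered around 0 with no obvious systematic pattern, which suggests that the linear form of the model is not badly misspecified and that our errors are within reason. Note that a residual-versus-fitted plot speaks to the zero-mean and constant-variance assumptions rather than to normality, which would need a separate check such as a Q-Q plot.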
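One subtlety is worth keeping in mind when reading a residual plot: for any least-squares fit that includes an intercept, the residuals average to zero and are uncorrelated with the fitted values by construction, so it is the shape of the scatter rather than its centering that carries information. The sketch below demonstrates this with NumPy on synthetic data (the coefficients and noise are arbitrary and unrelated to the combine data).

```python
import numpy as np

# Synthetic data: two predictors, arbitrary coefficients, Gaussian noise
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 3.0 + x @ np.array([1.5, -2.0]) + rng.normal(size=50)

# Ordinary least squares with an explicit intercept column
X = np.column_stack([np.ones(len(x)), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Both quantities are zero up to floating-point error, regardless of the data
print(abs(resid.mean()) < 1e-9, abs(resid @ fitted) < 1e-6)
```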

We can now attempt to find the expected draft position based on the model, which will be plotted alongside the real draft position of each player.

# Predicted round from the fitted model: intercept plus each coefficient times its measurement
complete_drills["expected_pos_draft"] = ml_model.predict(complete_drills)
complete_drills.plot.scatter(x = 'Round', y = 'expected_pos_draft')
sns.regplot(x = 'Round', y = 'expected_pos_draft', data = complete_drills)

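Alongside the plot, the fit could be summarized with a couple of numbers. The sketch below uses hypothetical actual and predicted rounds (not the model's real output) to compute the root-mean-square error, the share of players whose rounded prediction matches the actual round, and the error of a baseline that always predicts the average round.

```python
import numpy as np

# Hypothetical values for illustration; predictions cluster near the middle rounds
actual = np.array([1, 2, 3, 4, 5, 6, 7])
predicted = np.array([3.8, 4.1, 3.9, 4.0, 4.2, 4.1, 3.9])

# Root-mean-square error of the predictions, in rounds
rmse = float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Round each prediction to the nearest valid round and count exact matches
rounded = np.clip(np.rint(predicted), 1, 7)
hit_rate = float(np.mean(rounded == actual))

# Baseline: always predict the mean actual round
baseline_rmse = float(np.sqrt(np.mean((actual.mean() - actual) ** 2)))

print(rmse, hit_rate, baseline_rmse)
```

A model whose error is barely below the baseline's, as in this toy case, is adding little beyond guessing the average round.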

Looking at this graph, we can see that our regression model is not nearly accurate enough. For every actual round a player was drafted in, the projections range widely, and the trend line fits the data points poorly. Based on this, we conclude that, in the dataset we used together with the data tables we scraped, there is no conclusive evidence that combine results are a strong predictor of draft round.

Overall, through this project, we were able to go through the data science pipeline and apply it to the NFL Draft. We analyzed data in order to draw conclusions about various aspects of the draft (specifically with regard to wide receivers) and developed our own regression model to attempt to predict the rounds in which wide receivers are drafted. After analyzing the results of our model, we came to the conclusion that the NFL draft is extremely hard to predict.

Others have tried to analyze this topic in the past; their work is linked below.

https://chance.amstat.org/2016/11/draft-and-nfl-performance/

https://seanjtaylor.github.io/learning-the-draft/

Thank you for your time

Output of df_wr (drafted wide receivers, 2008-2017):

	Player	Pos	Ht	Wt	Forty	Vertical	BenchReps	BroadJump	Cone	Shuttle	Year	Pfr_ID	AV	Team	Round	Pick
2609	Adrian Arrington	WR	75	203	4.55	NaN	NaN	NaN	NaN	NaN	2008	ArriAd00	1.0	New Orleans Saints	7.0	237.0
2610	Donnie Avery	WR	71	192	4.43	NaN	16.0	NaN	NaN	NaN	2008	AverDo00	9.0	St. Louis Rams	2.0	33.0
2621	Earl Bennett	WR	71	209	4.48	26.0	15.0	110.0	7.15	4.22	2008	BennEa00	12.0	Chicago Bears	3.0	70.0
2650	Keenan Burton	WR	72	201	4.44	38.5	10.0	125.0	6.77	4.20	2008	BurtKe00	3.0	St. Louis Rams	4.0	128.0
2652	Andre Caldwell	WR	72	204	4.35	33.0	NaN	124.0	6.75	4.11	2008	CaldAn00	8.0	Cincinnati Bengals	3.0	97.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
5833	Ryan Switzer	WR	68	181	4.51	32.0	11.0	116.0	6.77	4.00	2017	SwitRy00	1.0	Dallas Cowboys	4.0	133.0
5838	Trent Taylor	WR	68	181	4.63	33.0	13.0	117.0	6.74	4.01	2017	TaylTr02	3.0	San Francisco 49ers	5.0	177.0
5839	Taywan Taylor	WR	71	203	4.50	33.5	13.0	132.0	6.57	4.21	2017	TaylTa00	2.0	Tennessee Titans	3.0	72.0
5864	Dede Westbrook	WR	72	178	NaN	NaN	NaN	NaN	NaN	NaN	2017	WestDe00	3.0	Jacksonville Jaguars	4.0	110.0
5871	Mike Williams-04	WR	76	218	NaN	32.5	15.0	121.0	NaN	NaN	2017	WillMi07	1.0	Los Angeles Chargers	1.0	7.0

Output of wr_data.head() (scraped receiving stats, wide receivers only):

	Rk	Player	Tm	Age	Pos	G	GS	Tgt	Rec	Ctch%	Yds	Y/R	TD	1D	Lng	Y/Tgt	R/G	Y/G	Fmb	Season
0	1	Andre Johnson *+	HOU	27	WR	16	16	171	115	67.3%	1575	13.7	8	79	65	9.2	7.2	98.4	1	2008
1	2	Wes Welker*	NWE	27	WR	16	14	149	111	74.5%	1165	10.5	3	57	64	7.8	6.9	72.8	1	2008
2	3	Brandon Marshall*	DEN	24	WR	15	15	181	104	57.5%	1265	12.2	6	67	47	7.0	6.9	84.3	4	2008
3	4	Larry Fitzgerald*+	ARI	25	WR	16	16	154	96	62.3%	1431	14.9	12	66	78	9.3	6.0	89.4	1	2008
5	6	T.J. Houshmandzadeh	CIN	31	WR	15	15	137	92	67.2%	904	9.8	4	51	46	6.6	6.1	60.3	0	2008

Output of rookie_stats (scraped stats joined with combine data, rookie seasons only):

	Rk	Player	Tm	Age	Pos_x	G	GS	Tgt	Rec	Ctch%	Yds	Y/R	TD	1D	Lng	Y/Tgt	R/G	Y/G	Fmb	Season	Pos_y	Ht	Wt	Forty	Vertical	BenchReps	BroadJump	Cone	Shuttle	Year	Pfr_ID	AV	Team	Round	Pick
0	7	Eddie Royal	DEN	22	WR	15	15	129	91	70.5%	980	10.8	5	43	93	7.6	6.1	65.3	2	2008	WR	70	184	4.39	36.0	24.0	124.0	7.07	4.34	2008	RoyaEd00	19.0	Denver Broncos	2.0	42.0
6	35	DeSean Jackson	PHI	22	WR	16	15	120	62	51.7%	912	14.7	2	43	60	7.6	3.9	57.0	4	2008	WR	70	169	4.35	NaN	NaN	120.0	NaN	NaN	2008	JackDe00	34.0	Philadelphia Eagles	2.0	49.0
13	56	Donnie Avery	STL	24	WR	15	12	102	53	52.0%	674	12.7	3	29	69	6.6	3.5	44.9	0	2008	WR	71	192	4.43	NaN	16.0	NaN	NaN	NaN	2008	AverDo00	9.0	St. Louis Rams	2.0	33.0
17	39	Austin Collie	IND	24	wr	16	5	89	60	67.4%	676	11.3	7	37	39	7.6	3.8	42.3	0	2009	WR	73	200	4.53	34.0	17.0	120.0	6.78	4.24	2009	CollAu00	18.0	Indianapolis Colts	4.0	127.0
24	46	Jeremy Maclin	PHI	21	WR	15	13	91	56	61.5%	773	13.8	4	31	56	8.5	3.7	51.5	0	2009	WR	72	198	4.43	NaN	NaN	NaN	NaN	NaN	2009	MaclJe00	23.0	Philadelphia Eagles	1.0	19.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
358	50	JuJu Smith-Schuster	PIT	21	wr	14	7	79	58	73.4%	917	15.8	7	37	97	11.6	4.1	65.5	0	2017	WR	73	215	4.54	32.5	15.0	120.0	NaN	NaN	2017	SmitJu00	10.0	Pittsburgh Steelers	2.0	62.0
361	118	Corey Davis	TEN	22	WR	11	9	65	34	52.3%	375	11.0	0	17	37	5.8	3.1	34.1	1	2017	WR	75	209	NaN	NaN	NaN	NaN	NaN	NaN	2017	DaviCo03	3.0	Tennessee Titans	1.0	5.0
363	150	Kenny Golladay	DET	24	wr	11	5	48	28	58.3%	477	17.0	3	18	54	9.9	2.5	43.4	0	2017	WR	76	218	4.50	35.5	18.0	120.0	7.00	4.15	2017	GollKe00	4.0	Detroit Lions	3.0	96.0
365	159	Dede Westbrook	JAX	24	wr	7	5	51	27	52.9%	339	12.6	1	15	29	6.6	3.9	48.4	1	2017	WR	72	178	NaN	NaN	NaN	NaN	NaN	NaN	2017	WestDe00	3.0	Jacksonville Jaguars	4.0	110.0
368	325	Josh Malone	CIN	21	WR	11	7	17	6	35.3%	63	10.5	1	3	25	3.7	0.5	5.7	0	2017	WR	75	208	4.40	30.5	10.0	121.0	7.05	4.19	2017	MaloJo00	1.0	Cincinnati Bengals	4.0	128.0