Tuesday, June 24, 2014

Downloading ESPN Fantasy Football Projections


I'm starting another series of posts that will look at determining the best fantasy football draft using optimization and game theory. However, first we will need some data to play with. Using FantasyFootballAnalytics code, I adjusted it to download stats off of the ESPN rankings page.

First up, let's load some useful packages.

require("XML")
require("stringr")
require("ggplot2")
require("plyr")


Next, I want to make sure we read in all strings as strings and not create factors from them.
options(stringsAsFactors = FALSE)

Here's the hard part. We need to go out to the website, download the data and clean it up.
clean <- function(pos) {
    link<-"http://games.espn.go.com/ffl/tools/projections?&seasonTotals=true&seasonId=2014&slotCategoryId="
    x <- readHTMLTable(paste(link,pos,sep=""))$playertable_0
    #update the column names
    names(x) <- c("position.rank","name.team.pos", "passing.completions.per.attempt", "passing.yards", "passing.tds", "passing.interceptions", "rushing.rushes", "rushing.yards", "rushing.tds", "receiving.catches", "receiving.yards", "receiving.tds", "total.points")

    #remove header row
    x <- x[-1,]

    #separate out the completions and attempts
    x$passing.completions <- str_sub(x$passing.completions.per.attempt, end=str_locate(string=x$passing.completions.per.attempt, '/')[,1]-1)
    x$passing.attempts <- str_sub(x$passing.completions.per.attempt, start=str_locate(string=x$passing.completions.per.attempt, '/')[,1]+1)
    x[,"passing.completions.per.attempt"] <- NULL

  if (pos!="16") {
    x$name <- str_sub(x$name.team.pos, end=str_locate(string=x$name.team.pos, ',')[,1]-1)
    x$team <- str_sub(x$name.team.pos, 
                      start=str_locate(string=x$name.team.pos, ',')[,1]+2, 
                      end = str_locate(string=x$name.team.pos, ',')[,1]+4)
    x$team <- str_trim(x$team, side="right")
    x$team <- toupper(x$team) 
  }  else {
    x$name <- str_sub(x$name.team.pos, end=str_locate(string=x$name.team.pos, ' ')[,1]-1)
    team.lookup<-as.data.frame(matrix(c("DEN","Broncos","GB","Packers","NO","Saints","CAR","Panthers",
                                        "WSH","Redskins","DET","Lions","IND","Colts","PHI","Eagles","OAK","Raiders",
                                        "SEA","Seahawks","SF","49ers","DAL","Cowboys","ATL","Falcons","NE","Patriots",
                                        "SD","Chargers","MIN","Vikings","CHI","Bears","KC","Chiefs","CIN","Bengals",
                                        "PIT","Steelers","NYG","Giants","ARI","Cardinals","MIA","Dolphins",
                                        "BAL","Ravens","TB","Bucaneers","CLE","Browns","HOU","Texans","STL","Rams",
                                        "BUF","Bills","NYJ","Jets","TEN","Titans","JAC","Jaguars"),ncol=2,byrow=TRUE))
    names(team.lookup)<-c("team","name")
    x<-merge(x=x,y=team.lookup,by="name")
  } 
    x[,"name.team.pos"] <- NULL

    #change from character to numeric
    for (c in names(x)) {
        if (!(c %in% c("pos","name","team"))) x[,c] <- as.numeric(x[,c])
        #replace NAs with 0
        x[is.na(x[,c]),c] <- 0
    }
    return(x)
}


Now that the function is prepped for us, we just need to fill in the appropriate tail to each website. Note that since we'll need more than 40 RBs and 40 WRs, we'll grab the top 120 players at each position.

qb <- clean("0")
rb <- rbind(clean("2"),clean("2&startIndex=40"),clean("2&startIndex=80"))
wr <- rbind(clean("4"),clean("4&startIndex=40"),clean("4&startIndex=80"))
te <- clean("6")
def <- clean("16")
k <- clean("17")


Now let's add in a variable for each position.

qb$pos <- as.factor("QB")
rb$pos <- as.factor("RB")
wr$pos <- as.factor("WR")
te$pos <- as.factor("TE")
def$pos <- as.factor("DEF")
k$pos <- as.factor("K")


Finally, let's combine all of the position specific datasets.

d <- rbind(qb,rb,wr,te,def,k)

We can now rank each player by the point totals.

d$rank <- rank(-d$total.points, ties.method="min")

Finally, we can order the dataset by that rank.

d <- d[order(d$rank),]

Well this is a great start. Note that we have total points for Defense and Kicker, but no accompanying statistics. I'm going to explore the ESPN website to see what I can find but this will do in the meantime.

No comments:

Post a Comment