Well, let's get into the fun part. Pulling the data is roughly the same for every position; only the columns differ. We'll use just two libraries: one for parsing the website and one for combining the data frames.
library("XML")   # readHTMLTable() for scraping the projection tables
library("plyr")  # rbind.fill() for combining data frames with different columns
options(stringsAsFactors = FALSE)
Next up, let's pull the raw data. As you'll note, the first row and column aren't useful to us, so we'll drop them and then set our own column names. The last line strips some stray characters that end up at the start of the Name field.
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=10&LeagueID=26955"
x <- readHTMLTable(link)
qb <- x[[11]]       # the projections are the 11th table on the page
qb <- qb[-1, -1]    # drop the header row and the rank column
names(qb) <- c("Name", "Team", "Bye.Week", "Passing.Completions", "Passing.Attempts",
               "Passing.Yards", "Passing.TDs", "Passing.INTs",
               "Rushing.Attempts", "Rushing.Yards", "Rushing.TDs", "FFPTs")
qb$Name <- sapply(qb$Name, function(x) substring(x, 3))  # strip stray leading characters
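To see what that last line is doing, here's a toy example (a made-up name string, not actual site output; the code above implies the stray prefix is exactly two characters, which is what substring(x, 3) drops):

```r
# Hypothetical raw value with a two-character prefix, as substring(x, 3) assumes
raw_name <- "1.Peyton Manning"
clean_name <- substring(raw_name, 3)
clean_name  # "Peyton Manning"
```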
We then repeat this for the other positions, changing only the PosID in the URL and the column names.
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=20&LeagueID=26955"
x <- readHTMLTable(link)
rb <- x[[11]]
rb <- rb[-1, -1]
names(rb) <- c("Name", "Team", "Bye.Week", "Rushing.Attempts", "Rushing.Yards", "Rushing.TDs",
               "Receiving.Catches", "Receiving.Yards", "Receiving.TDs", "FFPTs")
rb$Name <- sapply(rb$Name, function(x) substring(x, 3))
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=30&LeagueID=26955"
x <- readHTMLTable(link)
wr <- x[[11]]
wr <- wr[-1, -1]
names(wr) <- c("Name", "Team", "Bye.Week", "Receiving.Catches", "Receiving.Yards", "Receiving.TDs",
               "Rushing.Attempts", "Rushing.Yards", "Rushing.TDs", "FFPTs")
wr$Name <- sapply(wr$Name, function(x) substring(x, 3))
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=40&LeagueID=26955"
x <- readHTMLTable(link)
te <- x[[11]]
te <- te[-1, -1]
names(te) <- c("Name", "Team", "Bye.Week", "Receiving.Catches", "Receiving.Yards",
               "Receiving.TDs", "FFPTs")
te$Name <- sapply(te$Name, function(x) substring(x, 3))
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=80&LeagueID=26955"
x <- readHTMLTable(link)
k <- x[[11]]
k <- k[-1, -1]
names(k) <- c("Name", "Team", "Bye.Week", "Field.Goals.Made", "Field.Goals.Attempts",
              "Field.Goals.Pct", "Extra.Points.Made", "Extra.Points.Attempts", "FFPTs")
k$Name <- sapply(k$Name, function(x) substring(x, 3))
link <- "http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=99&LeagueID=26955"
x <- readHTMLTable(link)
def <- x[[11]]
def <- def[-1, -1]
names(def) <- c("Name", "Bye.Week", "Sacks", "Fumble.Recovered", "Interceptions",
                "Defensive.TDs", "Points.Against", "Passing.Yards.per.Game",
                "Rushing.Yards.per.Game", "Safety", "Kicking.TDs", "FFPTs")
def$Name <- sapply(def$Name, function(x) substring(x, 3))
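Since each block differs only in the PosID and the column names, the repetition above could be folded into a small helper. This is just a sketch of the same steps (the URL, the table index 11, and the two-character name prefix all come from the code above; the function names are my own):

```r
# Hypothetical helper: fetch one position's projection table and clean it.
fetch_proj <- function(pos_id, col_names) {
  link <- paste0("http://fftoday.com/rankings/playerproj.php?Season=2014",
                 "&PosID=", pos_id, "&LeagueID=26955")
  tbl <- readHTMLTable(link)[[11]]  # projections are the 11th table on the page
  clean_proj(tbl, col_names)
}

# Pure cleaning step: drop the header row and rank column, rename the
# columns, and strip the two stray leading characters from Name.
clean_proj <- function(tbl, col_names) {
  tbl <- tbl[-1, -1]
  names(tbl) <- col_names
  tbl$Name <- sapply(tbl$Name, function(x) substring(x, 3), USE.NAMES = FALSE)
  tbl
}
```

With that in place, each position becomes a one-liner, e.g. qb <- fetch_proj(10, c("Name", "Team", "Bye.Week", ...)).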
Lastly, we'll add a field with the position and then combine the datasets. Since the data frames have different columns, a plain rbind() would fail, so we use rbind.fill() from the plyr package, which takes the union of the columns and leaves NA wherever a column was missing from a frame. The last statement replaces those NAs with zeros.
qb$pos <- as.factor("QB")
rb$pos <- as.factor("RB")
wr$pos <- as.factor("WR")
te$pos <- as.factor("TE")
def$pos <- as.factor("DEF")
k$pos <- as.factor("K")
d <- rbind.fill(qb, rb, wr, te, def, k)
d[is.na(d)] <- 0
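The behavior of rbind.fill() is easy to see on toy data (made-up players and numbers, not the projections):

```r
library("plyr")

a <- data.frame(Name = "QB One", Passing.Yards = 5000, pos = "QB")
b <- data.frame(Name = "RB One", Rushing.Yards = 1500, pos = "RB")

d <- rbind.fill(a, b)  # union of columns; cells with no source column become NA
d[is.na(d)] <- 0       # zero out the NAs, as above
# d$Passing.Yards is now c(5000, 0) and d$Rushing.Yards is c(0, 1500)
```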
Hooray, better data to work with in our optimization. I'm also looking into a different solver that seems a bit more user friendly, so keep an eye out for a post on LocalSolver next!