Feasible Analytics: July 2014

Wednesday, July 30, 2014

LocalSolver License

I spent the last few days translating the LPSolve code from my previous optimization problem to using the LocalSolver API. I'll admit, I like the syntax and the accompanying R package to the solver quite a bit. I found it incredibly easy to import data, setup constraints and solve the problem. However, there was one big downside, I'm not a student or teacher and can't afford to buy a commercial license.

Anyway, here's the code in case you have a license that exceeds the limitations for the free license.

Monday, July 28, 2014

Better Fantasy Football Data for Optimization

It's been awhile, but finally got around to parsing a better data source. FFToday provides projections for multiple statistics per position. This fills the gap for Special Teams and Kickers that ESPN was not providing. In addition, you can filter to get the projected total points according to several different websites default scoring and rules. The link I used calculates total points for standard ESPN leagues, but you can easily adjust and pull for your own league or site.

Well, let's get into the fun part. In general, all of the positions are relatively simple in pulling the data but vary in the columns that are pulled. We'll only use two libraries for parsing the website and then combining the data frames.

require("XML")
require("plyr")
options(stringsAsFactors = FALSE)

Next up, let's pull the raw data. As you'll note, the first column and row are not useful for us so we'll drop it and then change the column names. The last piece looks at removing some weird characters that ended up inside of the Name field.

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=10&LeagueID=26955"

x<-readHTMLTable(link)
qb <- x[11]$'NULL'
qb <- qb[-1,-1]
names(qb)<-c("Name","Team","Bye.Week","Passing.Completions","Passing.Attempts",
"Passing.Yards","Passing.TDs","Passing.INTs",
"Rushing.Attempts","Rushing.Yards","Rushing.TDs","FFPTs")

qb$Name<-sapply(qb$Name,function(x) substring(x,3))

We then repeat this for the other positions.

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=20&LeagueID=26955"
x<-readHTMLTable(link)
rb <- x[11]$'NULL'
rb <- rb[-1,-1]
names(rb)<-c("Name","Team","Bye.Week","Rushing.Attempts","Rushing.Yards","Rushing.TDs",
"Receiving.Catches","Receiving.Yards","Receiving.TDs","FFPTs")
rb$Name<-sapply(rb$Name,function(x) substring(x,3))

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=30&LeagueID=26955"
x<-readHTMLTable(link)
wr <- x[11]$'NULL'
wr <- wr[-1,-1]
names(wr)<-c("Name","Team","Bye.Week","Receiving.Catches","Receiving.Yards","Receiving.TDs",
"Rushing.Attempts","Rushing.Yards","Rushing.TDs","FFPTs")
wr$Name<-sapply(wr$Name,function(x) substring(x,3))

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=40&LeagueID=26955"
x<-readHTMLTable(link)
te <- x[11]$'NULL'
te <- te[-1,-1]
names(te)<-c("Name","Team","Bye.Week","Receiving.Catches","Receiving.Yards","Receiving.TDs","FFPTs")
te$Name<-sapply(te$Name,function(x) substring(x,3))

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=80&LeagueID=26955"
x<-readHTMLTable(link)
k <- x[11]$'NULL'
k <- k[-1,-1]
names(k)<-c("Name","Team","Bye.Week","Field.Goals.Made","Field.Goals.Attempts","Field.Goals.Pct",
"Extra.Points.Made","Extra.Points.Attempts","FFPTs")
k$Name<-sapply(k$Name,function(x) substring(x,3))

link<-"http://fftoday.com/rankings/playerproj.php?Season=2014&PosID=99&LeagueID=26955"
x<-readHTMLTable(link)
def <- x[11]$'NULL'
def <- def[-1,-1]
names(def)<-c("Name","Bye.Week","Sacks","Fumble.Recovered","Interceptions",
"Defensive.TDs","Points.Against","Passing.Yards.per.Game","Rushing.Yards.per.Game",
"Safety","Kicking.TDs","FFPTs")
def$Name<-sapply(def$Name,function(x) substring(x,3))

Lastly, we'll add a field with the position and then combine the datasets. Note since each of the data frames has different columns, we need to use a function from the PLYR package to combine the data frames. This leaves NAs where the column wasn't in the data frame. We can easily fix that with the last statement.

qb$pos <- as.factor("QB")
rb$pos <- as.factor("RB")
wr$pos <- as.factor("WR")
te$pos <- as.factor("TE")
def$pos <- as.factor("DEF")
k$pos <- as.factor("K")

d <- rbind.fill(qb,rb,wr,te,def,k)
d[is.na(d)]<-0

Hooray, better data to work with in our optimization. I'm looking into using a different solver that looks to be a bit more user friendly. Keep an eye out for a blog on localsolver up next!

Tuesday, July 1, 2014

Fantasy Football Optimization Part 1

I decided to break up the problem into baby steps. This first part will deal with building out the initial structure of the optimization problem. For those that read my other post on optimization in R, I'll be using the same libraries and style for setting up this problem.

First up, let's read in the data we created in the last post. We'll add a simple column that creates a numeric ID per player.
d <- read.csv(file=paste(getwd(),"/Data/ESPN-Projections.csv", sep=""))
d$id <- as.integer(factor(paste(d$name,d$team)))

Now that the data is all set, we can load the required solver libraries.
require("lpSolve");require("lpSolveAPI");

We can set the number of teams in the league. Given the number of teams in the league, we can set up a vector of team IDs.
num.teams <- 10
teams <- seq(1,num.teams)

Similarly, we can grab the number of players in our dataset and create a vector of the ids.
num.players <- length(unique(d$id))
players <- unique(d$id)

I'm going to create a data frame with the decision variables for our problem. First up is creating the cross product of all players and teams. We'll then merge in our player data and add in a team ID.
vars <- data.frame(player.id=rep(players,num.teams))
vars <- merge(x=vars,y=d,by.y="id",by.x="player.id")
vars <- vars[,c("player.id","pos","name")]
vars$team.id <- rep(seq(1,num.teams),num.players)

The data is set up and it's time to create the actual Integer Programming problem. Note that these decision variables are also binary, either a player is assigned to that team or he isn't.
ip <- make.lp(0,num.players*num.teams)
set.type(ip,seq(1,num.players*num.teams),type="binary")

The objective function is simply to maximize the number of projected points.
set.objfn(ip,rep(d$total.points,num.teams))
lp.control(ip,sense="max")

We need to add constraints for each player that ensures that if they are assigned to a team, that they are assigned to one and only one team.
for (p in players) {
add.constraint(ip,
rep(1,num.teams),
"<=",
1,
which(vars$player.id==p)
)
}

Now for the team constraints. First up, the positions required for each team. For simplicity, I'm using the lineup that ESPN uses in their standard league. Here are the minimum number of positions to be drafted:

1 QB
2 RB
2 WR
1 RB/WR/TE (Flex player)
1 TE
1 DEF
1 K

for (t in teams) {
#This constraint covers having at least 1 QB
add.constraint(ip,
rep(1,sum(vars$pos=="QB")/num.teams),
">=",
1,
which(vars$team.id==t & vars$pos=="QB")
)
#This constraint covers having at least 2 WR
add.constraint(ip,
rep(1,sum(vars$pos=="WR")/num.teams),
">=",
2,
which(vars$team.id==t & vars$pos=="WR")
)
#This constraint covers having at least 2 RB
add.constraint(ip,
rep(1,sum(vars$pos=="RB")/num.teams),
">=",
2,
which(vars$team.id==t & vars$pos=="RB")
)
#This constraint covers having at least 1 DEF
add.constraint(ip,
rep(1,sum(vars$pos=="DEF")/num.teams),
">=",
1,
which(vars$team.id==t & vars$pos=="DEF")
)
#This constraint covers having at least 1 K
add.constraint(ip,
rep(1,sum(vars$pos=="K")/num.teams),
">=",
1,
which(vars$team.id==t & vars$pos=="K")
)
#This constraint covers having at least 1 TE
add.constraint(ip,
rep(1,sum(vars$pos=="TE")/num.teams),
">=",
1,
which(vars$team.id==t & vars$pos=="TE")
)
#This constraint covers having at least 1 flex player. Note that the other constraints require at least 1 TE, 2 RB, 2 WR. In order to cover a flex player, the total sum of players from those positions needs to be at least 6.
add.constraint(ip,
rep(1,sum(vars$pos=="TE",vars$pos=="RB",vars$pos=="WR")/num.teams),
">=",
6,
which(vars$team.id==t & (vars$pos=="TE" | vars$pos=="RB" | vars$pos=="WR"))
)
#This constraint covers each team having 16 players
add.constraint(ip,
rep(1,num.players),
"=",
16,
which(vars$team.id==t)
)
}

Well that's it for our basic set of constraints. If you're interested in seeing what the model formulation looks like, execute the "write.lp" statement below.
write.lp(ip,paste(getwd(),"/modelformulation.txt",sep=""),type="lp",use.names=T)

Now the fun part, solving the integer program. Following that it is feasible (and it is) we get the objective function value and the solution.
solve(ip)
get.objective(ip)
get.variables(ip)

Although seeing the solution looks relatively complex, we can simply keep the assignments and print them out.

sol<-vars[get.variables(ip)==1,c("name","team.id","pos")]
View(sol[order(sol$team.id,sol$pos),])

One huge downside to this approach is the lack of actual drafting strategy or complications. This problem simply looks at dividing talent evenly across teams. I particularly dislike the results of some teams ending up with more than one kicker. No one should ever own more than one kicker.

My next step is to either improve the formulation of this problem, probably by using some options mentioned in this Fantasy Football Analytics post, or to look at applying a different algorithm to solving this problem.