Showing posts with label lpsolve. Show all posts

Tuesday, October 20, 2015

DFS and Optimization: Simple Optimization

Now that we have data, let's actually get into the point of having this data: choosing which players to play each day. For our first go, we're going to take a simple approach. We'll start out with just pitchers and maximize over just one metric, game score. This will give us a refresher on how to run an optimization problem in R and a set of code that we will be able to build out in the future.

First step is merging the datasets that have the two data points we need. Using this dataset, we'll create an empty Integer Program. Our decision variables will be binary variables representing if we will choose that player or not. The IP will have two constraints:

  1. We must select 2 pitchers
  2. The total salary used must be below a threshold
Using lpSolve, we can construct the problem and solve it. The results are interesting, as it's easy to see which players were the optimal selections. That being said, this is just a start.
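
Here's a minimal sketch of that formulation using lpSolveAPI. The pitcher names, salaries, game scores, and the salary cap are all made up for illustration; the real data would come from the merged dataset described above.

```r
library(lpSolveAPI)

# Hypothetical pitcher data (names, salaries, and game scores are invented)
pitchers <- data.frame(
  name       = c("A", "B", "C", "D", "E"),
  salary     = c(10000, 9000, 7000, 6000, 5000),
  game.score = c(65, 60, 56, 50, 40),
  stringsAsFactors = FALSE
)
salary.cap <- 16000  # assumed threshold

# One binary decision variable per pitcher
n <- nrow(pitchers)
ip <- make.lp(0, n)
set.type(ip, seq_len(n), type = "binary")

# Maximize total projected game score
set.objfn(ip, pitchers$game.score)
invisible(lp.control(ip, sense = "max"))

# Constraint 1: select exactly 2 pitchers
add.constraint(ip, rep(1, n), "=", 2)
# Constraint 2: total salary must stay at or below the cap
add.constraint(ip, pitchers$salary, "<=", salary.cap)

status <- solve(ip)  # 0 means an optimal solution was found
chosen <- pitchers[get.variables(ip) > 0.5, ]
print(chosen)
```

With these made-up numbers, the most expensive pitcher is left out because pairing him with anyone good enough breaks the cap.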


Tuesday, September 15, 2015

Fantasy Football Optimization and PuLP

I've written previously about how to solve a basic fantasy football optimization using lpSolve in R. It's not terribly easy to read and just as hard to debug. That's why I was thrilled to find a Python package called PuLP. I found this package to be very simple and much easier to use than any open source solvers in R. Using the same logic described in my previous post, I implemented it using this package. Here's the code for reference:


Tuesday, July 1, 2014

Fantasy Football Optimization Part 1

I decided to break up the problem into baby steps. This first part will deal with building out the initial structure of the optimization problem. For those that read my other post on optimization in R, I'll be using the same libraries and style for setting up this problem.

First up, let's read in the data we created in the last post. We'll add a simple column that creates a numeric ID per player.
d <- read.csv(file=paste(getwd(),"/Data/ESPN-Projections.csv", sep=""))
d$id <- as.integer(factor(paste(d$name,d$team)))

Now that the data is all set, we can load the required solver libraries.
require("lpSolve");require("lpSolveAPI");

First we set the number of teams in the league, then use it to build a vector of team IDs.
num.teams <- 10
teams <- seq(1,num.teams)

Similarly, we can grab the number of players in our dataset and create a vector of the ids.
num.players <- length(unique(d$id))
players <- unique(d$id)

I'm going to create a data frame with the decision variables for our problem. First up is creating the cross product of all players and teams. We'll then merge in our player data and add in a team ID.
vars <- data.frame(player.id=rep(players,num.teams))
vars <- merge(x=vars,y=d,by.y="id",by.x="player.id")
vars <- vars[,c("player.id","pos","name")]
vars$team.id <- rep(seq(1,num.teams),num.players)

The data is set up and it's time to create the actual Integer Programming problem. Note that these decision variables are also binary: either a player is assigned to that team or he isn't.
ip <- make.lp(0,num.players*num.teams)
set.type(ip,seq(1,num.players*num.teams),type="binary")

The objective function is simply to maximize the number of projected points.
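# Look up each column's points by player ID so the coefficients follow the row order of vars
set.objfn(ip,d$total.points[match(vars$player.id,d$id)])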
lp.control(ip,sense="max")
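
Since column j of the problem corresponds to row j of vars, the objective coefficients need to be in that same order. Here's a small sketch with made-up data (d.toy, vars.toy and the point values are hypothetical) showing that a lookup by player ID keeps the two aligned, while simply repeating the points column does not:

```r
# Hypothetical toy data: three players listed out of ID order, two teams
d.toy <- data.frame(id = c(3, 1, 2),
                    name = c("C", "A", "B"),
                    total.points = c(30, 10, 20),
                    stringsAsFactors = FALSE)
n.teams <- 2
players.toy <- unique(d.toy$id)

# Same construction as vars above: merge() sorts by player ID by default,
# so each player's rows end up next to each other
vars.toy <- data.frame(player.id = rep(players.toy, n.teams))
vars.toy <- merge(x = vars.toy, y = d.toy, by.x = "player.id", by.y = "id")
vars.toy$team.id <- rep(seq_len(n.teams), length(players.toy))

# Coefficients looked up by player ID, one per row of vars.toy
coefs <- d.toy$total.points[match(vars.toy$player.id, d.toy$id)]

# Repeating the whole points column gives a different ordering
naive <- rep(d.toy$total.points, n.teams)

print(cbind(vars.toy[, c("player.id", "team.id")], coefs, naive))
```

The merged data frame also carries total.points directly, so keeping that column in the decision-variable table is another way to get the same vector.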

We need to add a constraint for each player ensuring that he is assigned to at most one team. Because the constraint is "<= 1", a player can also go undrafted.
for (p in players) {
  add.constraint(ip,
                 rep(1,num.teams),
                 "<=",
                 1,
                 which(vars$player.id==p)
                 )
}

Now for the team constraints. First up, the positions required for each team. For simplicity, I'm using the lineup that ESPN uses in their standard league. Here is the minimum number of players to draft at each position:

  • 1 QB
  • 2 RB
  • 2 WR
  • 1 RB/WR/TE (Flex player)
  • 1 TE
  • 1 DEF
  • 1 K


for (t in teams) {
  #This constraint covers having at least 1 QB  
  add.constraint(ip,
                 rep(1,sum(vars$pos=="QB")/num.teams),
                 ">=",
                 1,
                 which(vars$team.id==t & vars$pos=="QB")
  )
  #This constraint covers having at least 2 WR
  add.constraint(ip,
                 rep(1,sum(vars$pos=="WR")/num.teams),
                 ">=",
                 2,
                 which(vars$team.id==t & vars$pos=="WR")
  )
  #This constraint covers having at least 2 RB
  add.constraint(ip,
                 rep(1,sum(vars$pos=="RB")/num.teams),
                 ">=",
                 2,
                 which(vars$team.id==t & vars$pos=="RB")
  )
  #This constraint covers having at least 1 DEF
  add.constraint(ip,
                 rep(1,sum(vars$pos=="DEF")/num.teams),
                 ">=",
                 1,
                 which(vars$team.id==t & vars$pos=="DEF")
  )
  #This constraint covers having at least 1 K
  add.constraint(ip,
                 rep(1,sum(vars$pos=="K")/num.teams),
                 ">=",
                 1,
                 which(vars$team.id==t & vars$pos=="K")
  )
  #This constraint covers having at least 1 TE
  add.constraint(ip,
                 rep(1,sum(vars$pos=="TE")/num.teams),
                 ">=",
                 1,
                 which(vars$team.id==t & vars$pos=="TE")
  )
  #This constraint covers having at least 1 flex player. Note that the other constraints require at least 1 TE, 2 RB, 2 WR. In order to cover a flex player, the total sum of players from those positions needs to be at least 6.
  add.constraint(ip,
                 rep(1,sum(vars$pos=="TE",vars$pos=="RB",vars$pos=="WR")/num.teams),
                 ">=",
                 6,
                 which(vars$team.id==t & (vars$pos=="TE" | vars$pos=="RB" | vars$pos=="WR"))
  )
  #This constraint covers each team having 16 players
  add.constraint(ip,
                 rep(1,num.players),
                 "=",
                 16,
                 which(vars$team.id==t)
  )
}

Well that's it for our basic set of constraints. If you're interested in seeing what the model formulation looks like, execute the "write.lp" statement below.
write.lp(ip,paste(getwd(),"/modelformulation.txt",sep=""),type="lp",use.names=T)

Now the fun part: solving the integer program. Once the solver confirms the problem is feasible (and it is), we can get the objective function value and the solution.
solve(ip)
get.objective(ip)
get.variables(ip)
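
It's worth checking the status code that solve returns before trusting the numbers. In lpSolveAPI, 0 means an optimal solution was found and 2 means the model is infeasible. Here's a minimal sketch on a made-up two-variable model (toy is a stand-in, not the fantasy model above):

```r
library(lpSolveAPI)

# Tiny stand-in model: pick at most one of two items, maximizing value
toy <- make.lp(0, 2)
set.type(toy, 1:2, type = "binary")
set.objfn(toy, c(3, 5))
invisible(lp.control(toy, sense = "max"))
add.constraint(toy, c(1, 1), "<=", 1)

# solve() returns a status code: 0 = optimal, 2 = infeasible
status <- solve(toy)
if (status != 0) {
  stop("solver did not reach an optimal solution, status ", status)
}

best  <- get.objective(toy)   # objective value at the optimum
picks <- get.variables(toy)   # 0/1 value for each decision variable
print(best)
print(picks)
```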

The raw solution vector is hard to read, so we can keep just the assignments and print them out.

sol<-vars[get.variables(ip)==1,c("name","team.id","pos")]
View(sol[order(sol$team.id,sol$pos),])

One huge downside to this approach is the lack of actual drafting strategy or complications. This problem simply maximizes total projected points across the league subject to the roster constraints; nothing in it models a draft or balances talent between teams. I particularly dislike that some teams end up with more than one kicker. No one should ever own more than one kicker.
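
The kicker complaint could be handled with one extra "<=" constraint per team. Here's a hedged sketch on a made-up eight-player, two-team league (d.toy, the roster size of 3, and the point values are all assumptions for illustration, not the ESPN data):

```r
library(lpSolveAPI)

# Hypothetical player pool
d.toy <- data.frame(
  id = 1:8,
  name = c("K1", "K2", "K3", "K4", "QB1", "QB2", "WR1", "WR2"),
  pos  = c("K", "K", "K", "K", "QB", "QB", "WR", "WR"),
  total.points = c(9, 8, 7, 6, 5, 4, 3, 2),
  stringsAsFactors = FALSE
)
n.teams <- 2

# One row per (player, team) pair; team.id varies fastest
vars.toy <- expand.grid(team.id = seq_len(n.teams), player.id = d.toy$id)
lookup <- match(vars.toy$player.id, d.toy$id)
vars.toy$pos    <- d.toy$pos[lookup]
vars.toy$points <- d.toy$total.points[lookup]

ip.toy <- make.lp(0, nrow(vars.toy))
set.type(ip.toy, seq_len(nrow(vars.toy)), type = "binary")
set.objfn(ip.toy, vars.toy$points)
invisible(lp.control(ip.toy, sense = "max"))

# Each player goes to at most one team
for (p in d.toy$id) {
  idx <- which(vars.toy$player.id == p)
  add.constraint(ip.toy, rep(1, length(idx)), "<=", 1, idx)
}

for (tm in seq_len(n.teams)) {
  on.team <- which(vars.toy$team.id == tm)
  qb <- which(vars.toy$team.id == tm & vars.toy$pos == "QB")
  k  <- which(vars.toy$team.id == tm & vars.toy$pos == "K")
  add.constraint(ip.toy, rep(1, length(on.team)), "=", 3, on.team)  # roster size
  add.constraint(ip.toy, rep(1, length(qb)), ">=", 1, qb)           # at least 1 QB
  add.constraint(ip.toy, rep(1, length(k)), ">=", 1, k)             # at least 1 K
  add.constraint(ip.toy, rep(1, length(k)), "<=", 1, k)             # the extra cap: at most 1 K
}

status <- solve(ip.toy)
sol.toy <- vars.toy[get.variables(ip.toy) > 0.5, ]
kickers.per.team <- table(sol.toy$team.id[sol.toy$pos == "K"])
print(kickers.per.team)
```

Without the cap, this toy league would stack the four kickers because they carry the highest made-up point totals. With it, each team takes exactly one.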

My next step is to either improve the formulation of this problem, probably by using some options mentioned in this Fantasy Football Analytics post, or to look at applying a different algorithm to solving this problem.

Wednesday, March 5, 2014

Optimization using R and LPSolve

I saw a recent post on OR-Exchange about which programming language is best for optimization. While I agree with @JFPuget that languages are just a wrapper for various solvers, there is a learning curve behind how to use each wrapper. My previous entries are about how to program in SAS using optmodel. I took this opportunity to write up the same example in R using lpSolve.

First up, let's set up the data.

products<-data.frame(
   product=c("TRAVEL","CASH Rewards","HOTEL"),
   volume=rep(500,3))
prices<-data.frame(
   price=c("10.99"),
   volume=c(500))
set.seed(123)
customers<-data.frame(
   id=seq(1,1000),
   customer_status=rbinom(1000,1,0.25))
set.seed(123)
require("triangle")
model.scores<-rbind(
   data.frame(
      id=seq(1,1000),
      price=rep("10.99",1000),
      product=rep("TRAVEL",1000),
      expected.profit=runif(1000,1,100),
      likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)),
   data.frame(
      id=seq(1,1000),
      price=rep("10.99",1000),
      product=rep("CASH Rewards",1000),
      expected.profit=runif(1000,1,100),
      likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)),
   data.frame(
      id=seq(1,1000),
      price=rep("10.99",1000),
      product=rep("HOTEL",1000),
      expected.profit=runif(1000,1,100),
      likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)))

Next up, the optimization code. I will note that this is somewhat confusing, but I will break it up into sections similar to what I did with the SAS version. I will be using these two libraries for lpSolve. The first one provides access to the solver itself, but the API is what actually makes this relatively easy to code.

require("lpSolve");require("lpSolveAPI")

The first step is making an empty optimization problem named ip. I start with zero constraints and add in the number of decision variables required.

ip <- make.lp(0,nrow(customers)*nrow(products)*nrow(prices))

Next I declare each decision variable as binary.

set.type(ip,seq(1,nrow(customers)*nrow(products)*nrow(prices)),type="binary")

This next part will seem like a relatively pointless step, but it will help in adding constraints. This data.frame contains every combination of customer, product, and price available. This will provide the index for assigning values in constraints and the objective function.

combos <- expand.grid(prices$price,products$product,customers$id)
names(combos)<-c('price','product','id')
rownames(combos)<-do.call(paste, c(combos[c("id", "product","price")], sep = "_"))
rownames(model.scores)<-do.call(paste, c(model.scores[c("id", "product","price")], sep = "_"))
model.scores.ordered<-model.scores[order(match(rownames(model.scores),rownames(combos))),]
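
To see why the ordering step matters, here's a toy sketch with made-up values (combos.toy, scores.toy and the key helper are hypothetical names). expand.grid varies its first argument fastest, so the rows come out grouped by customer, while the scores arrive grouped by product. This version matches the combos keys against the score keys directly, which gives the same result as the order(match(...)) form above when every key appears exactly once:

```r
# Toy dimensions: one price, two products, two customers
prices.toy   <- c("10.99")
products.toy <- c("TRAVEL", "HOTEL")
ids.toy      <- 1:2

# First argument varies fastest: rows are grouped by customer, then product
combos.toy <- expand.grid(price = prices.toy, product = products.toy, id = ids.toy,
                          stringsAsFactors = FALSE)

# Helper that builds the same id_product_price key used for the row names
key <- function(df) paste(df$id, df$product, df$price, sep = "_")

# Scores grouped by product, like model.scores (values are invented)
scores.toy <- data.frame(id = c(1, 2, 1, 2),
                         product = c("TRAVEL", "TRAVEL", "HOTEL", "HOTEL"),
                         price = "10.99",
                         value = c(11, 21, 12, 22),
                         stringsAsFactors = FALSE)

# Reorder the scores so row j lines up with row j of combos.toy
scores.ordered <- scores.toy[match(key(combos.toy), key(scores.toy)), ]
print(scores.ordered)
```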

By default, lpSolve treats the objective function as a minimization. First I will set the coefficients of the decision variables in the objective function and then change it to be a maximization problem.

set.objfn(ip,model.scores.ordered$expected.profit*model.scores.ordered$likelihood.to.apply)
lp.control(ip,sense="max")

Here are two constants we will use in defining the constraints.

prod.per.person<-1
price.per.product<-1

First up, let's add in the constraints around the number of products per person.

for (c in customers$id) {
   add.constraint(ip,
                  rep(1,nrow(products)*nrow(prices)),
                  "<=",
                  prod.per.person,
                  which(combos$id==c)
   )
}

The next set of constraints will limit the number of price points per product per customer.

for (c in customers$id) {
   for (p in products$product) {
      add.constraint(ip,
                     rep(1,nrow(prices)),
                     "<=",
                     price.per.product,
                     which(combos$product==p & combos$id==c)
      )
   }
}

The next set of constraints applies the volume limits for price points.

for (q in prices$price) {
   add.constraint(ip,
                  rep(1,nrow(customers)*nrow(products)),
                  "<=",
                  prices$volume[which(q==prices$price)],
                  which(combos$price==q)
   )
}

The next set of constraints applies the volume limits for products.

for (p in products$product) {
   add.constraint(ip,
                  rep(1,length(which(combos$product==p))),
                  "<=",
                  products$volume[which(p==products$product)],
                  which(combos$product==p)
   )
}

Finally, let's clean up the formulation a little and assign names to the decision variables and constraints.

colnames <- character()
for (c in customers$id) {
   for (p in products$product) {
      for (q in prices$price) {
         colnames<-c(colnames,paste(c,p,q,sep="_"))
      }
   }
}
rownames <- character()
for (c in customers$id) {
   rownames<-c(rownames,paste("prod_per_cust",c,sep="_"))
}
for (c in customers$id) {
   for (p in products$product) {
      rownames<-c(rownames,paste("price_per_prod",c,p,sep="_"))
   }
}
for (q in prices$price) {
   rownames<-c(rownames,paste("price_vol",q,sep="_"))
}
for (p in products$product) {
   rownames<-c(rownames,paste("prod_vol",p,sep="_"))
}

dimnames(ip) <- list(rownames, colnames)
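
The same names can also be built without growing vectors inside loops. Here's a sketch on toy dimensions (ids.toy, products.toy, prices.toy and the result names are made up) that relies on expand.grid varying its first argument fastest to reproduce the loop order:

```r
# Toy dimensions standing in for customers$id, products$product, prices$price
ids.toy      <- 1:2
products.toy <- c("A", "B")
prices.toy   <- c("10.99")

# Column names: customer is the outer loop, then product, then price
col.grid <- expand.grid(price = prices.toy, prod = products.toy, cust = ids.toy,
                        stringsAsFactors = FALSE)
col.names <- paste(col.grid$cust, col.grid$prod, col.grid$price, sep = "_")

# Row names, in the same order the constraints were added
row.grid <- expand.grid(prod = products.toy, cust = ids.toy,
                        stringsAsFactors = FALSE)
row.names.toy <- c(
  paste("prod_per_cust", ids.toy, sep = "_"),
  paste("price_per_prod", row.grid$cust, row.grid$prod, sep = "_"),
  paste("price_vol", prices.toy, sep = "_"),
  paste("prod_vol", products.toy, sep = "_")
)

print(col.names)
print(row.names.toy)
```

Using names other than colnames and rownames also avoids shadowing the base R functions of the same name.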

It's possible to write out the formulation in case you would like to review it. Note that the file quickly becomes overwhelming as any of the three dimensions of the problem (customers, products, prices) grows.

write.lp(ip,"modelformulation.txt",type="lp",use.names=T)

Last, but not least, let's solve the problem. After the solve call come the commands to view the objective value, the decision variable values, and the constraint values.

solve(ip)
get.objective(ip)
get.variables(ip)

get.constraints(ip)
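
The raw vector from get.variables is just a long run of 0s and 1s, so it helps to map it back to the table of combinations. Here's a minimal sketch on a made-up problem with three customers, two products, and a volume of one per product (combos.toy and its values are hypothetical):

```r
library(lpSolveAPI)

# Toy combinations: product varies fastest, so rows are grouped by customer
combos.toy <- expand.grid(product = c("TRAVEL", "HOTEL"), id = 1:3,
                          stringsAsFactors = FALSE)
combos.toy$value <- c(5, 4, 9, 8, 2, 7)  # invented expected values

n <- nrow(combos.toy)
toy <- make.lp(0, n)
set.type(toy, seq_len(n), type = "binary")
set.objfn(toy, combos.toy$value)
invisible(lp.control(toy, sense = "max"))

# At most one product per customer
for (cust in unique(combos.toy$id)) {
  idx <- which(combos.toy$id == cust)
  add.constraint(toy, rep(1, length(idx)), "<=", 1, idx)
}
# At most one offer per product (toy volume limit)
for (prod in unique(combos.toy$product)) {
  idx <- which(combos.toy$product == prod)
  add.constraint(toy, rep(1, length(idx)), "<=", 1, idx)
}

status <- solve(toy)

# Keep only the rows whose decision variable is switched on
offers <- combos.toy[get.variables(toy) > 0.5, c("id", "product", "value")]
print(offers)
```

The same subsetting should work on the full problem as long as the rows of the combinations table are in the same order as the decision variables.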

I'm sure there are more elegant ways to write this code, but it still remains a relatively painful way to code an optimization problem. I'll skip the visualization part for now. I'm going to spend time writing up a Python version next and then hope to write a comparison between the three versions.