Feasible Analytics

Wednesday, March 5, 2014

Optimization using R and LPSolve

I saw a recent post on OR-Exchange about what programming language is best to for optimization. While I agree with @JFPuget that languages are just a wrapper for various solvers, there is a learning curve behind how to use each wrapper. My previous entries are about how to program in SAS using optmodel. I took this opportunity to write up the same example inside of R and using lpsolve.

First up, let's setup the data.

products<-data.frame(
product=c("TRAVEL","CASH Rewards","HOTEL"),
volume=rep(500,3))
prices<-data.frame(
price=c("10.99"),
volume=c(500))
set.seed(123)
customers<-data.frame(
id=seq(1,1000),
customer_status=rbinom(1000,1,0.25))
set.seed(123)
require("triangle")
model.scores<-rbind(
data.frame(
id=seq(1,1000),
price=rep("10.99",1000),
product=rep("TRAVEL",1000),
expected.profit=runif(1000,1,100),
likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)),
data.frame(
id=seq(1,1000),
price=rep("10.99",1000),
product=rep("CASH Rewards",1000),
expected.profit=runif(1000,1,100),
likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)),
data.frame(
id=seq(1,1000),
price=rep("10.99",1000),
product=rep("HOTEL",1000),
expected.profit=runif(1000,1,100),
likelihood.to.apply=rtriangle(1000,0.0,1.0,0.6)))

Next Up, the optimization code. I will note that this is somewhat confusing but I will break it up into sections similar to what I did with the SAS version. I will be using these two libraries for lpSolve. The first one provides access to the solver itself but the API is what actually makes this relatively easy to code.

require("lpSolve");require("lpSolveAPI")

The first step is making an empty optimization problem named ip. I start with zero constraints and add in the number of decision variables required.

ip <- make.lp(0,nrow(customers)*nrow(products)*nrow(prices))

Next I declare each decision variable as binary.

set.type(ip,seq(1,nrow(customers)*nrow(products)*nrow(prices)),type="binary")

This next part will seem like a relatively pointless step, but it will help in adding constraints. This data.frame contains every combination of customer, product, and price available. This will provide the index for assigning values in constraints and the objective function.

combos <- expand.grid(prices$price,products$product,customers$id)
names(combos)<-c('price','product','id')
rownames(combos)<-do.call(paste, c(combos[c("id", "product","price")], sep = "_"))
rownames(model.scores)<-do.call(paste, c(model.scores[c("id", "product","price")], sep = "_"))
model.scores.ordered<-model.scores[order(match(rownames(model.scores),rownames(combos))),]

By default, the objective function is set to minimize the problem. First I will set the coefficients to the decision variables in the objective function and then change it to be a maximization problem.

set.objfn(ip,model.scores.ordered$expected.profit*model.scores.ordered$likelihood.to.apply)
lp.control(ip,sense="max")

Here are two constants we will use in defining the constraints.

prod.per.person<-1
price.per.product<-1

First up, let's add in the constraints around the number of products per person.

for (c in customers$id) {
add.constraint(ip,
rep(1,nrow(products)*nrow(prices)),
"<=",
prod.per.person,
which(combos$id==c)
)
}

The next set of constraints will limit the number of price points per product per customer.

for (c in customers$id) {
for (p in products$product) {
add.constraint(ip,
rep(1,nrow(prices)),
"<=",
price.per.product,
which(combos$product==p & combos$id==c)
)
}
}

The next set of constraints assign the volume constraints for price points.

for (q in prices$price) {
add.constraint(ip,
rep(1,nrow(customers)*nrow(products)),
"<=",
prices$volume[which(q==prices$price)],
which(combos$price==q)
)
}

The next set of constraints assign the volume constraints for products.

for (p in products$product) {
add.constraint(ip,
rep(1,length(which(combos$product==p))),
"<=",
products$volume[which(p==products$product)],
which(combos$product==p)
)
}

Finally, let's clean up the formulation a little and assign names to the decision variables and constraints.

colnames <- character()
for (c in customers$id) {
for (p in products$product) {
for (q in prices$price) {
colnames<-c(colnames,paste(c,p,q,sep="_"))
}
}
}
rownames <- character()
for (c in customers$id) {
rownames<-c(rownames,paste("prod_per_cust",c,sep="_"))
}
for (c in customers$id) {
for (p in products$product) {
rownames<-c(rownames,paste("price_per_prod",c,p,sep="_"))
}
}
for (q in prices$price) {
rownames<-c(rownames,paste("price_vol",q,sep="_"))
}
for (p in products$product) {
rownames<-c(rownames,paste("prod_vol",p,sep="_"))
}

dimnames(ip) <- list(rownames, colnames)

It's possible to write out the formulation in case you would like to review it. Note that this can be very overwhelming with the increase in any of the three dimensions to the problem.

write.lp(ip,"modelformulation.txt",type="lp",use.names=T)

Last, but not least, let's solve the problem. Following the ability to solve the problem are commands to view the objective value, decision variable values and constraint values.

solve(ip)
get.objective(ip)
get.variables(ip)

get.constraints(ip)

I'm sure there are more elegant ways for writing up this code, but it still remains a relatively painful way to code an optimization problem. I'll skip the visualization part for now. I'm going to spend time writing up a Python version next and then hope to write a comparison between the three versions.

Thursday, January 23, 2014

Optimization in SAS: Results

After spending all the time writing up code, next comes the fun part and what we really get paid for in the consulting world. I'm going to walk through three pieces of the results having hit "run". The three pieces are:

The Log: This contains information behind the compilation of the code, including errors and high level results.
The Results Window (Tab if you are in SAS EG): Contains the default results of having written a basic PROC OPTMODEL.
Graphing the results: Some bonus code at the end to make some sense

First up, the log produced from the program. SAS uses the log in order to print out the model description and the high level results. I'll skip the boring parts of the log where it reprints the code and get right to the important part. This set of "notes" in SAS describes what was created as the optimization formulation. It will also give the stats around "presolving" the problem and the solution status. SAS "presolves" the optimization problem by removing extraneous information like variables and non-binding constraints.

NOTE: The problem has 3000 variables (0 free, 0 fixed).

NOTE: The problem has 3000 binary and 0 integer variables.

NOTE: The problem has 4004 linear constraints (4004 LE, 0 EQ, 0 GE, 0 range).

NOTE: The problem has 12000 linear constraint coefficients.

NOTE: The problem has 0 nonlinear constraints (0 LE, 0 EQ, 0 GE, 0 range).

NOTE: The OPTMODEL presolver removed 0 variables, 3003 linear constraints, and 0 nonlinear constraints.

NOTE: The OPTMODEL presolved problem has 3000 variables, 1001 linear constraints, and 0 nonlinear constraints.

NOTE: The OPTMODEL presolver removed 8000 linear constraint coefficients, leaving 4000.

NOTE: The OPTMILP presolver value AUTOMATIC is applied.

NOTE: The OPTMILP presolver removed 0 variables and 0 constraints.

NOTE: The OPTMILP presolver removed 0 constraint coefficients.

NOTE: The OPTMILP presolver modified 0 constraint coefficients.

NOTE: The presolved problem has 3000 variables, 1001 constraints, and 4000 constraint coefficients.

NOTE: The MIXED INTEGER LINEAR solver is called.

Node Active Sols BestInteger BestBound Gap Time

0 1 2 41528.0754238 41528.0754238 0.00% 1

0 0 2 41528.0754238 . 0.00% 1

NOTE: Optimal.

NOTE: Objective = 41528.0754.

Hooray, the model not only solved, but it found an optimal solution. Next part is to check the results of the program. This is where SAS will put all printouts from the procedures used in the program. There are three sections to this problem in the results section. The first one is the test print we had to verify that our data was read in to the model correctly.

The next part shows the model summary. You'll noticed that this looks like the output in the log, but forced into a table format. I don't particularly use this table very often other than to verify the size of the table. It can be helpful to note the size of the table in reference to the run-time for the problem.

The last piece shows the solution summary. This gives you some of the relevant information you need to evaluate your model.

The last step for me is taking a stab at visualizing the results produced by the optimization model. Since this optimization problem is relatively simple, I want to color code who was mailed and who wasn't for each of the products. Since my objective function is the product of Profit and the Likelihood To Apply, I use each as an axis inside of the chart. Below is the simple macro I use to build the chart for each product. First I use SQL to generate a dynamic header for the chart with the total expected profit for that product. I then use SGPLOT to draw the scatter plot.

proc format;

value mail

1="Mail"

0="No Mail"

;

run;

%macro graph_results(product);

goptions device=svg;

ods graphics on / antialiasmax=10000;

proc sql noprint;

select sum(e_profit*lta) format=dollar12.

into :profit

from results

where mail=1

and product = "&product"

;

quit;

proc sgplot data=results;

title1 "Optimization Results for &product.";

title2 "Total Expected Profit = &profit.";

where product = "&product";

format mail mail.;

scatter x=e_profit y=lta /

group=mail

markerattrs=(size=5 symbol=circlefilled)

transparency=0.7

name="scatter"

;

keylegend "scatter" / across=1 position=topleft noborder location=inside ;

xaxis label="Expected Profit" values=(0 to 100 by 20) display=(noticks);

yaxis label="Likelihood to Apply" values=(0 to 1 by .1) display=(noticks);

run;

title;

ods graphics off;

goptions reset=all;

%mend;

%graph_results(TRAVEL);

Note that it isn't the greatest picture, but it gives you an idea of the solution. I hope to give the same visualization a shot in Tableau to demonstrate that without coding up a chart, you can still get a nice visualization.