I'm trying to make an R script for fantasy football (proper UK football, not hand egg :-)) where I can input a list of players in a csv and it will spit out every 11-player combination that meets various constraints.
Here's my sample dataframe:
df <- read.csv("Filename.csv",
header = TRUE)
> print(df)
Name Positon Team Salary
1 Eric Dier D TOT 9300000
2 Erik Pieters D STO 9200000
3 Christian Fuchs D LEI 9100000
4 Héctor Bellerín D ARS 9000000
5 Charlie Daniels D BOU 9000000
6 Ben Davies D TOT 8900000
7 Federico Fernández D SWA 8800000
8 Per Mertesacker D ARS 8800000
9 Alberto Moreno D LIV 8700000
10 Chris Smalling D MUN 8700000
11 Seamus Coleman D EVE 8700000
12 Jan Vertonghen D TOT 8700000
13 Romelu Lukaku F EVE 12700000
14 Harry Kane F TOT 12500000
15 Max Gradel F BOU 11900000
16 Alexis Sánchez F ARS 11300000
17 Jamie Vardy F LEI 11200000
18 Theo Walcott F ARS 10700000
19 Olivier Giroud F ARS 10700000
20 Wilfried Bony F MCI 10000000
21 Kristoffer Nordfeldt G SWA 7000000
22 Joe Hart G MCI 6800000
23 Jack Rose G WBA 6600000
24 Asmir Begovic G CHE 6600000
25 Mesut Özil M ARS 15600000
26 Riyad Mahrez M LEI 15200000
27 Ross Barkley M EVE 13300000
28 Dimitri Payet M WHM 12800000
29 Willian M CHE 12500000
30 Bertrand Traore M CHE 12500000
31 Kevin De Bruyne M MCI 12400000
And the constraints are as follows:
1) The total salary of each 11-player lineup cannot exceed 100,000,000
2) There can be a maximum of four players from any one team, e.g. four players from 'CHE' (Chelsea).
3) There is a limit on how many players within each 11-player lineup can be from each position. There can be:
1 G (goalkeeper), 3 to 4 D (defender), 3 to 5 M (midfielder), 1 to 3 F (forward)
I'd like every 11-player combination that meets the above constraints to be returned. Order is not important (e.g. 1,2,3 is considered the same as 2,1,3 and shouldn't be duplicated) and a player can appear in more than one lineup.
I've done a fair bit of research and played around but can't seem to get anywhere with this. I'm new to R. I don't expect anyone to nail this for me, but if someone could point a newbie like myself in the right direction it would be much appreciated.
Thanks.
This can be solved as an integer linear program using the lpSolve library. Problems of this kind are very tractable, contrary to what has been written before, because the number of feasible solutions is typically much smaller than the size of the search space.
For each player, add a zero-one (binary) variable that indicates whether or not that player is in the team.
The packages can be installed using
install.packages("lpSolve")
install.packages("lpSolveAPI")
The documentation can be found at: https://cran.r-project.org/web/packages/lpSolve/lpSolve.pdf
The first constraint is that the player variables sum to 11.
The salary constraint is the sum of all player variables multiplied by the Salary column, which must not exceed the cap, and so on for the other constraints.
To get proper solutions you need to specify in
lp(..., all.bin = TRUE)
so that all variables referring to players are either zero or one.
(I understood that you are trying to learn, which is why I refrained from giving a full solution.)
EDIT: I got down-voted, probably for not giving the full solution. Kind of sad, as the original author explicitly wrote that he doesn't expect a full solution.
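library(lpSolve)
# Assumes a semicolon-separated file; drop sep = ";" for a comma-separated one.
df <- read.csv("/tmp/football.csv", header = TRUE, sep = ";")
# Objective: every selected player counts 1 (any feasible 11 is optimal).
f.obj <- rep(1, nrow(df))
# One row per constraint, one column per player.
# "Positon" is spelled as in the CSV header.
f.con <-
matrix(c(rep(1, nrow(df)),
as.vector(df$Salary),
(df$Positon=="G") *1.0,
(df$Positon=="D") *1.0,
(df$Positon=="D") *1.0,
(df$Positon=="M") *1.0,
(df$Positon=="M") *1.0,
(df$Positon=="F") *1.0,
(df$Positon=="F") *1.0),nrow=9,byrow= TRUE)
f.dir <- c("==", "<=","==",">=","<=",">=","<=",">=","<=")
f.rhs<- c(11, # number of players
100000000, # salary cap
1 , # goalkeepers
3 , # defenders, min
4 , # defenders, max
3 , # midfielders, min
5, # midfielders, max
1, # forwards, min
3 # forwards, max
)
solutions <- lp ("max", f.obj, f.con, f.dir, f.rhs,all.bin=TRUE)
# solutions$solution is a 0/1 vector; df$Name[solutions$solution == 1] lists the lineup.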
I didn't add the team constraint, as it wouldn't have provided any additional insight here.
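For completeness, here is a minimal sketch of how the team rows could be built with base R. The helper name `team_constraint_rows` and the toy data frame are my own, purely for illustration; only the `Team` column name comes from the question.

```r
# Hypothetical helper (not part of lpSolve): one "<= max_per_team" row per team.
team_constraint_rows <- function(df, max_per_team = 4) {
  team  <- as.character(df$Team)
  teams <- sort(unique(team))
  # con[i, j] is 1 when player j plays for team i, 0 otherwise.
  con <- outer(teams, team, "==") * 1
  rownames(con) <- teams
  list(con = con,
       dir = rep("<=", length(teams)),
       rhs = rep(max_per_team, length(teams)))
}

# Toy data, made up for illustration only.
toy <- data.frame(Name = c("A", "B", "C", "D", "E"),
                  Team = c("TOT", "ARS", "TOT", "LEI", "ARS"),
                  stringsAsFactors = FALSE)
team <- team_constraint_rows(toy)
print(team$con)

# With the script above, the rows would be appended before calling lp():
# team  <- team_constraint_rows(df)
# f.con <- rbind(f.con, team$con)
# f.dir <- c(f.dir, team$dir)
# f.rhs <- c(f.rhs, team$rhs)
```

Each team gets its own row, so the number of constraints grows with the number of distinct teams in the data rather than with the number of players.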
** EDIT2 ** This might come in handy if you change your data set: see the question "R lpsolve binary find all possible solutions".
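One common way to get more than one solution out of a binary program is to re-solve repeatedly, each time adding a cut that forbids the solution just found. The sketch below is my own illustration of that idea, not code from the linked question. The function name `enumerate_binary_solutions` and the `max_solutions` cap are assumptions, and it needs the lpSolve package installed. For the full football data the number of valid lineups may be very large, so treat this as a way to pull out a limited number of lineups rather than all of them.

```r
library(lpSolve)

# Hypothetical helper: collect up to max_solutions distinct 0/1 solutions.
enumerate_binary_solutions <- function(con, dir, rhs, max_solutions = 100) {
  n <- ncol(con)
  found <- list()
  while (length(found) < max_solutions) {
    res <- lp("max", rep(1, n), con, dir, rhs, all.bin = TRUE)
    if (res$status != 0) break            # non-zero status: nothing feasible left
    x <- round(res$solution)
    found[[length(found) + 1]] <- x
    # Exclusion cut: sum(x_i chosen) - sum(x_i not chosen) <= sum(x) - 1
    # is violated only by x itself, so the next solve must return a new vector.
    con <- rbind(con, ifelse(x == 1, 1, -1))
    dir <- c(dir, "<=")
    rhs <- c(rhs, sum(x) - 1)
  }
  if (length(found) == 0) return(matrix(numeric(0), ncol = n))
  do.call(rbind, found)
}

# Toy problem: choose exactly 2 of 4 items.
toy_con <- matrix(1, nrow = 1, ncol = 4)
sols <- enumerate_binary_solutions(toy_con, "==", 2)
print(sols)
```

Every iteration solves a slightly larger program, so the cost per solution grows as the list of cuts gets longer.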
A brute-force way to tackle this (which is also beautifully parallelizable and guarantees you all possible combinations) is to calculate all 11-player combinations and then filter out the ones that don't conform to your limits in a stepwise manner.
To make a program like this fit into your computer's memory, give each player a unique integer ID and create vectors of IDs as team sets. When you then implement your filters your functions can refer to the player info by that ID in a single dataframe.
Say df is your data frame with all player data.
df$id <- 1:nrow(df)
Get all combinations of ids:
# This will take a long time or run out of memory!
# choose(31, 11) is 84,672,315 combinations.
# On my 2.8 GHz laptop it took 466 seconds just for your 31 players
teams <- combn(df$id, 11)
Of course, if your dataframe is big (like hundreds of players) this implementation could take impossibly long to finish. You probably would be better off just sampling 11-sets from your player set without replacement and construct teams in an "on demand" fashion.
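A rough sketch of the sampling idea, in base R only. The function names `is_valid_lineup` and `sample_lineups` and the toy data are made up for illustration; the column names (including the `Positon` spelling) and the limits are taken from the question.

```r
# Hypothetical checker: TRUE when the 11 row indices satisfy every constraint.
is_valid_lineup <- function(ids, df, budget = 1e8, max_per_team = 4) {
  pos <- table(factor(df$Positon[ids], levels = c("G", "D", "M", "F")))
  length(unique(ids)) == 11 &&
    sum(df$Salary[ids]) <= budget &&
    max(table(df$Team[ids])) <= max_per_team &&
    pos[["G"]] == 1 &&
    pos[["D"]] >= 3 && pos[["D"]] <= 4 &&
    pos[["M"]] >= 3 && pos[["M"]] <= 5 &&
    pos[["F"]] >= 1 && pos[["F"]] <= 3
}

# Draw random 11-sets without replacement and keep the distinct valid ones.
sample_lineups <- function(df, n = 5, max_tries = 1e5) {
  found <- list()
  for (i in seq_len(max_tries)) {
    ids <- sort(sample(nrow(df), 11))
    # Naming by the sorted ids means a repeated draw overwrites itself.
    if (is_valid_lineup(ids, df)) found[[paste(ids, collapse = "-")]] <- ids
    if (length(found) >= n) break
  }
  do.call(rbind, unname(found))   # NULL when nothing valid was found
}

# Toy data (invented): 2 G, 4 D, 4 M, 2 F, all on the same salary.
toy <- data.frame(
  Name    = paste0("P", 1:12),
  Positon = c("G", "G", rep("D", 4), rep("M", 4), "F", "F"),
  Team    = c("AAA", "BBB", rep("AAA", 4), rep("CCC", 4), "DDD", "EEE"),
  Salary  = rep(9e6, 12),
  stringsAsFactors = FALSE
)
set.seed(1)
print(sample_lineups(toy, n = 1, max_tries = 1000))
```

Sampling gives no guarantee of finding every lineup, and when valid lineups are rare most draws are wasted, so this only makes sense if a handful of lineups is enough.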
A cleverer way is to partition your dataset by player position: one subset for goalkeepers, one for defenders, and so on. Then use the above approach to create the combinations of players within each position and combine the results for every allowed formation. It would take far less time, and it would still be parallelizable and exhaustive (give you all possible combinations).
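Here is a sketch of the partitioned approach in base R, under my own assumptions: the function name `enumerate_lineups` and the toy data are invented, while the column names and limits come from the question. It builds the combinations per position with `combn`, crosses them for each allowed formation, and filters by salary first and team count second.

```r
enumerate_lineups <- function(df, budget = 1e8, max_per_team = 4) {
  teams <- sort(unique(as.character(df$Team)))
  # All ways to pick k players of one position, with salary totals and
  # per-team counts precomputed for each combination.
  pick <- function(pos, k) {
    ids <- which(df$Positon == pos)
    if (length(ids) < k) return(NULL)
    members <- matrix(ids[combn(length(ids), k)], nrow = k)
    list(members = members,
         salary  = colSums(matrix(df$Salary[members], nrow = k)),
         teams   = matrix(apply(members, 2, function(p)
                     tabulate(match(df$Team[p], teams), nbins = length(teams))),
                     nrow = length(teams)))
  }
  # Outfield formations: 3-4 D, 3-5 M, 1-3 F, ten players plus one goalkeeper.
  formations <- expand.grid(D = 3:4, M = 3:5, F = 1:3)
  formations <- formations[rowSums(formations) == 10, ]
  G <- pick("G", 1)
  out <- list()
  for (i in seq_len(nrow(formations))) {
    D  <- pick("D", formations$D[i])
    M  <- pick("M", formations$M[i])
    Fw <- pick("F", formations$F[i])
    if (is.null(G) || is.null(D) || is.null(M) || is.null(Fw)) next
    grid <- expand.grid(g = seq_along(G$salary), d = seq_along(D$salary),
                        m = seq_along(M$salary), f = seq_along(Fw$salary))
    total <- G$salary[grid$g] + D$salary[grid$d] +
             M$salary[grid$m] + Fw$salary[grid$f]
    grid <- grid[total <= budget, , drop = FALSE]        # salary filter
    if (nrow(grid) == 0) next
    counts <- G$teams[, grid$g, drop = FALSE] + D$teams[, grid$d, drop = FALSE] +
              M$teams[, grid$m, drop = FALSE] + Fw$teams[, grid$f, drop = FALSE]
    grid <- grid[colSums(counts > max_per_team) == 0, , drop = FALSE]  # team filter
    if (nrow(grid) == 0) next
    out[[length(out) + 1]] <- t(rbind(
      G$members[, grid$g, drop = FALSE], D$members[, grid$d, drop = FALSE],
      M$members[, grid$m, drop = FALSE], Fw$members[, grid$f, drop = FALSE]))
  }
  if (length(out) == 0) return(matrix(integer(0), ncol = 11))
  do.call(rbind, out)   # one lineup per row, values are row indices into df
}

# Toy data (invented): 2 G, 4 D, 4 M, 2 F, all on the same salary.
toy <- data.frame(
  Name    = paste0("P", 1:12),
  Positon = c("G", "G", rep("D", 4), rep("M", 4), "F", "F"),
  Team    = c("AAA", "BBB", rep("AAA", 4), rep("CCC", 4), "DDD", "EEE"),
  Salary  = rep(9e6, 12),
  stringsAsFactors = FALSE
)
lineups <- enumerate_lineups(toy)
print(lineups)
```

For the 31 players in the question this crosses a few million candidate lineups per formation instead of roughly 84.7 million 11-sets in total, but the intermediate grids still need a fair amount of memory.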