简体   繁体   中英

R - Combinations of a dataframe with constraints

I'm trying to make an R script for fantasy football (proper UK football, not hand egg :-)) where I can input a list of players in a csv and it will spit out every 11-player combination, which meet various constraints.

Here's my sample dataframe:

df <- read.csv("Filename.csv",
               header = TRUE)
    > print(df)
                       Name Positon Team   Salary
    1             Eric Dier       D  TOT  9300000
    2          Erik Pieters       D  STO  9200000
    3       Christian Fuchs       D  LEI  9100000
    4       Héctor Bellerín       D  ARS  9000000
    5       Charlie Daniels       D  BOU  9000000
    6            Ben Davies       D  TOT  8900000
    7    Federico Fernández       D  SWA  8800000
    8       Per Mertesacker       D  ARS  8800000
    9        Alberto Moreno       D  LIV  8700000
    10       Chris Smalling       D  MUN  8700000
    11       Seamus Coleman       D  EVE  8700000
    12       Jan Vertonghen       D  TOT  8700000
    13        Romelu Lukaku       F  EVE 12700000
    14           Harry Kane       F  TOT 12500000
    15           Max Gradel       F  BOU 11900000
    16       Alexis Sánchez       F  ARS 11300000
    17          Jamie Vardy       F  LEI 11200000
    18         Theo Walcott       F  ARS 10700000
    19       Olivier Giroud       F  ARS 10700000
    20        Wilfried Bony       F  MCI 10000000
    21 Kristoffer Nordfeldt       G  SWA  7000000
    22             Joe Hart       G  MCI  6800000
    23            Jack Rose       G  WBA  6600000
    24        Asmir Begovic       G  CHE  6600000
    25           Mesut Özil       M  ARS 15600000
    26         Riyad Mahrez       M  LEI 15200000
    27         Ross Barkley       M  EVE 13300000
    28        Dimitri Payet       M  WHM 12800000
    29              Willian       M  CHE 12500000
    30      Bertrand Traore       M  CHE 12500000
    31      Kevin De Bruyne       M  MCI 12400000

And the constraints are as follows:

1) The total salary of each 11-player lineup cannot exceed 100,000,000

2) There can only be a maximum of four players from one team. Eg four player from 'CHE' (Chelsea).

3) There is a limit of how many players within each 11-player lineup can be from each position. There can be:

1 G (goalkeeper), 3 to 4 D (defender), 3 to 5 M (midfielder), 1 to 3 F (forward)

I'd like every 11 player combination that meets the above contraints to be returned. Order is not important (eg 1,2,3 is considered the same as 2,1,3 and shouldn't be duplicated) and a player can appear in more than one lineup.

I've done a fair bit of research and played around but can't seem to get anywhere with this. I'm new to R. I don't expect anyone to nail this for me, but if someone could point a newbie like myself in the right direction it would be much appreciated.

Thanks.

This can be solved as linear integer program using the library LPSolve. This kind of problems are very well solvable -- opposed to what has been written before -- as typical the number of solutions are much smaller than the domain size.

You can add for each Player a zero one variable, whether or not that player is in the team.

The package can be installed using

 install.packages("lpSolve")
 install.packages("lpSolveAPI")

The documentation can be found at: https://cran.r-project.org/web/packages/lpSolve/lpSolve.pdf

First constraint sum of players 11

The salary is basically a sum of all players variable multiplied by the salary column and so on....

To get a proper solutions you need to specify in

lp.solve(all.bin=TRUE

Such that all variables referring to players are either zero or one.

( I understood that you are trying to learn, that's why I refrain from giving a full solution)

EDIT As I got down-voted probably because of not giving the full solution. Kind of sad as as the original author explicitly wrote that he doesn't expect a full solution

library(lpSolve)

df <- read.csv("/tmp/football.csv",header = TRUE,sep=";")

f.obj <- rep(1,nrow(df))

f.con <- 
  matrix(c(f.obj <- rep(1,nrow(df)),
    as.vector(df$Salary),
    (df$Positon=="G") *1.0,
    (df$Positon=="D") *1.0,
    (df$Positon=="D") *1.0,
    (df$Positon=="M") *1.0,
    (df$Positon=="M") *1.0,
    (df$Positon=="F") *1.0,
    (df$Positon=="F") *1.0),nrow=9,byrow= TRUE)

f.dir <- c("==", "<=","==",">=","<=",">=","<=",">=","<=")

f.rhs<- c(11, #number players
       100000000, #salary
       1 , #Goalkeeper
       3 , # def min
       4 , # def max
       3 , # mdef min
       5,  # mdef max
       1,  # for, min
       3  # wor, max
       )



solutions <-   lp ("max", f.obj, f.con, f.dir, f.rhs,all.bin=TRUE)

I didn't add the Team Constraint as it wouldn't have provided any additionally insights here....

** EDIT2 ** This might come handy if you change your data set R lpsolve binary find all possible solutions

A brute-force way to tackle this, (which is also beautifully parallelizable and guarantees you all possible combinations) is to calculate all 11-player permutations and then filter out the combinations that don't conform to your limits in a stepwise manner.

To make a program like this fit into your computer's memory, give each player a unique integer ID and create vectors of IDs as team sets. When you then implement your filters your functions can refer to the player info by that ID in a single dataframe.

Say df is your data frame with all player data.

df$id <- 1:nrow(df)

Get all combinations of ids:

# This will take a long time or run out of memory!
# In my 2.8Gz laptop took 466 seconds just for your 31 players
teams <- combn(df$id, 11) 

Of course, if your dataframe is big (like hundreds of players) this implementation could take impossibly long to finish. You probably would be better off just sampling 11-sets from your player set without replacement and construct teams in an "on demand" fashion.

A more clever way is to partition your dataset according to player position into - one for goalkeepers, one for defence, etc. And then use the above approach to create permutations of different players from each position and combine the end results. It would take ridiculously less amount of time, it would still be parallelizable and exhaustive (give you all possible combinations).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM