
I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less

I have a data frame (UFC) with 4 columns.

Column 1 (UFC$Name) is names of UFC fighters fighting this weekend.

Column 2 (UFC$Salary) is how much they "cost" in a fantasy sports contest.

Column 3 (UFC$WinPct) is how likely the fighter is to win the fight.

Column 4 (UFC$FinishPct) is how likely the fighter is to win the fight without it going to a decision.

I'd like to make a data frame that contains all the combinations of 5 fighters from column 1 whose column 2 values sum to $50,000 or less (or, more practically, the top X of them, ranked by the criterion in the next paragraph).

Then what I'm really interested in is the combinations of 5 fighters whose column 4 sums are highest.

I'm getting pretty good at low-level tinkering with data frames, but this is a little too advanced for me to wrap my head around.

Here is about 30% of the data frame.

              Name Salary WinPct FinishPct
    Keita Nakamura   9100  31.00     15.36
       George Roop   8900  33.00     15.76
   Teruto Ishihara   9000  33.00     17.08
    Naoyuki Kotani   8700  30.50     18.35
     Yusuke Kasuya   8500  29.60     21.16
  Katsunori Kikuno   8800  33.66     21.88

The desired output would look something like this:

Lineup                                                              SalarySum  FinishPctSum
Roy Nelson,Gegard Mousasui,Yusuke Kasuya,George Roop,Diego Brandao  47900      148.99

And it would return the top X of those outputs, ranked by highest FinishPctSum.

Well, this won't be terribly fast, but it's an idea ...

## make a list of all combinations of 5 of Name, Salary, and FinishPct
xx <- with(df, lapply(list(as.character(Name), Salary, FinishPct), combn, 5))
## convert the names to a string, 
## find the column sums of the others,
## set the names
yy <- setNames(
    lapply(xx, function(x) {
        if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
    }),
    names(df)[c(1, 2, 4)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)

which results in

#                                                                               Name Salary FinishPct
# 1      Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya  44200     87.71
# 2   Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Katsunori Kikuno  44500     88.43
# 3    Keita Nakamura, George Roop, Teruto Ishihara, Yusuke Kasuya, Katsunori Kikuno  44300     91.24
# 4     Keita Nakamura, George Roop, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  44000     92.51
# 5 Keita Nakamura, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  44100     93.83
# 6    George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno  43900     94.23

No check has been performed to determine whether the salary sums are within the 50k cap. It just gives all the combinations of 5 fighters with their respective sums. You can subset to keep the lineups whose salary sum is 50k or less with

newdf[newdf$Salary <= 5e4, ]

Note that 5e4 is scientific-notation shorthand for 50,000.

Data:
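To get from there to the top X lineups by finish percentage, one option is to order the filtered rows and keep the first few. The following is a minimal sketch, not the code from the answer above: `top_lineups` and its arguments `k`, `cap`, and `n` are hypothetical names introduced here, and the data frame is rebuilt from the six sample rows in the question so the block runs on its own.

```r
# Sample data copied from the question (6 of the fighters)
df <- data.frame(
  Name = c("Keita Nakamura", "George Roop", "Teruto Ishihara",
           "Naoyuki Kotani", "Yusuke Kasuya", "Katsunori Kikuno"),
  Salary = c(9100, 8900, 9000, 8700, 8500, 8800),
  FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the top `n` lineups of `k` fighters within salary `cap`
top_lineups <- function(df, k = 5, cap = 5e4, n = 3) {
  idx <- combn(nrow(df), k)  # each column holds the row indices of one lineup
  out <- data.frame(
    Lineup       = apply(idx, 2, function(i) toString(df$Name[i])),
    SalarySum    = apply(idx, 2, function(i) sum(df$Salary[i])),
    FinishPctSum = apply(idx, 2, function(i) sum(df$FinishPct[i])),
    stringsAsFactors = FALSE
  )
  out <- out[out$SalarySum <= cap, ]                        # enforce the salary cap
  out <- out[order(out$FinishPctSum, decreasing = TRUE), ]  # highest sum first
  head(out, n)
}

top3 <- top_lineups(df)
print(top3)
```

With only the six sample rows every lineup is under the cap, so the filter removes nothing here; it should start to matter once higher-salary fighters from the full data are included.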

df <- structure(list(Name = structure(c(3L, 1L, 5L, 4L, 6L, 2L), .Label = c("George Roop", 
"Katsunori Kikuno", "Keita Nakamura", "Naoyuki Kotani", "Teruto Ishihara", 
"Yusuke Kasuya"), class = "factor"), Salary = c(9100L, 8900L, 
9000L, 8700L, 8500L, 8800L), WinPct = c(31, 33, 33, 30.5, 29.6, 
33.66), FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88
)), .Names = c("Name", "Salary", "WinPct", "FinishPct"), class = "data.frame", row.names = c(NA, 
-6L))
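
On the speed caveat above: the number of 5-fighter lineups is `choose(n, 5)`, so it grows quickly with the size of the card. A small sketch, assuming hypothetical cards of 6, 20, and 30 fighters (the question does not state the actual number of rows):

```r
# Hypothetical card sizes; only the 6-row sample is given in the question
n_fighters <- c(6, 20, 30)

# Number of distinct 5-fighter lineups for each card size
n_lineups <- choose(n_fighters, 5)

print(data.frame(n_fighters, n_lineups))
```

Each of those lineups becomes one column of the matrix returned by `combn`, which is why the approach gets slower and more memory-hungry as the card grows.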
