I want to generate combinations of 5 names from a column in an R data frame whose values in a different column add up to a certain number or less
I have a data frame (UFC) with 4 columns.
Column 1 (UFC$Name) is the names of the UFC fighters fighting this weekend.
Column 2 (UFC$Salary) is how much they "cost" in a fantasy sports contest.
Column 3 (UFC$WinPct) is how likely the fighter is to win the fight.
Column 4 (UFC$FinishPct) is how likely the fighter is to win the fight without it going to a decision.
I'd like to make a data frame that contains all the combinations of 5 fighters from column 1 whose column 2 values sum to $50,000 or less (or, more practically, the top X of them, ranked by the criterion in the next paragraph).
What I'm really interested in, then, is the combinations of 5 fighters whose column 4 sums are highest.
I'm getting pretty good at low-level tinkering with data frames, but this is a little too advanced for me to wrap my head around how to approach it.
Here is about 30% of the data frame:
Name              Salary  WinPct  FinishPct
Keita Nakamura      9100   31.00      15.36
George Roop         8900   33.00      15.76
Teruto Ishihara     9000   33.00      17.08
Naoyuki Kotani      8700   30.50      18.35
Yusuke Kasuya       8500   29.60      21.16
Katsunori Kikuno    8800   33.66      21.88
The desired output would look something like this:
Lineup: Roy Nelson,Gegard Mousasui,Yusuke Kasuya,George Roop,Diego Brandao
SalarySum: 47900
FinishPctSum: 148.99
It would return the top X of those lineups, ranked by highest FinishPctSum.
Well, this won't be terribly fast, but it's an idea...
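To put "not terribly fast" in numbers: combn() enumerates every 5-fighter subset, so the work grows as choose(n, 5) for n fighters. A quick check in R; only n = 6 comes from the sample data below, the other card sizes are hypothetical (the question says the six rows shown are about 30% of the data):

```r
# Number of 5-fighter lineups that combn() has to enumerate for n fighters.
# Only n = 6 comes from the sample data; the other card sizes are hypothetical.
n_fighters <- c(6, 12, 20, 30)
n_lineups  <- choose(n_fighters, 5)
print(data.frame(n_fighters, n_lineups))
```

This gives 6, 792, 15504 and 142506 lineups respectively, so a single fight card of roughly 20 fighters should still be well within reach of brute force.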
## make a list of all combinations of 5 of Name, Salary, and FinishPct
xx <- with(df, lapply(list(as.character(Name), Salary, FinishPct), combn, 5))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
lapply(xx, function(x) {
if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
}),
names(df)[c(1, 2, 4)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
which results in
# Name Salary FinishPct
# 1 Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya 44200 87.71
# 2 Keita Nakamura, George Roop, Teruto Ishihara, Naoyuki Kotani, Katsunori Kikuno 44500 88.43
# 3 Keita Nakamura, George Roop, Teruto Ishihara, Yusuke Kasuya, Katsunori Kikuno 44300 91.24
# 4 Keita Nakamura, George Roop, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno 44000 92.51
# 5 Keita Nakamura, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno 44100 93.83
# 6 George Roop, Teruto Ishihara, Naoyuki Kotani, Yusuke Kasuya, Katsunori Kikuno 43900 94.23
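As an aside, for the numeric columns combn() can do the summing itself: its FUN argument is applied to each combination, so combn(x, 5, FUN = sum) returns one total per lineup instead of a 5-row matrix that then needs colSums(). A small sketch using the six salaries from the sample data:

```r
# Salaries of the six fighters in the sample data, in row order
salary <- c(9100, 8900, 9000, 8700, 8500, 8800)

# FUN is applied to every 5-element combination; the results are simplified to a vector
salary_sums <- combn(salary, 5, FUN = sum)
print(salary_sums)
```

The totals come out in the same order as the rows of newdf above: 44200, 44500, 44300, 44000, 44100, 43900.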
No check has been performed to determine whether the salaries total $50,000 or less; it just gives all the combinations of 5 fighters with their respective sums. You can subset to keep only the lineups whose salaries total 50k or less with
newdf[newdf$Salary <= 5e4, ]
Note that 5e4 is shorthand (scientific notation) for 50,000.
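The question also asks for only the top X lineups ranked by FinishPct. One way to do that, sketched here assuming the column names match the sample data, is to enumerate combinations of row indices once with combn(), compute both sums from those indices, drop lineups over the cap, then order() by the finish total and keep head(). The helper name top_lineups() and its arguments are hypothetical, not part of the original answer:

```r
# Sample data: the six fighters shown in the question
df <- data.frame(
  Name      = c("Keita Nakamura", "George Roop", "Teruto Ishihara",
                "Naoyuki Kotani", "Yusuke Kasuya", "Katsunori Kikuno"),
  Salary    = c(9100, 8900, 9000, 8700, 8500, 8800),
  FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88),
  stringsAsFactors = FALSE
)

# Hypothetical helper: best `top_x` lineups of `size` fighters under a salary `cap`
top_lineups <- function(df, size = 5, cap = 5e4, top_x = 10) {
  idx <- combn(nrow(df), size)  # one column of row indices per lineup
  # Indexing a vector by idx flattens it; matrix() restores one column per lineup
  salary <- colSums(matrix(df$Salary[idx], nrow = size))
  finish <- colSums(matrix(df$FinishPct[idx], nrow = size))
  lineup <- apply(matrix(df$Name[idx], nrow = size), 2, toString)
  out <- data.frame(Lineup = lineup, SalarySum = salary,
                    FinishPctSum = finish, stringsAsFactors = FALSE)
  out <- out[out$SalarySum <= cap, ]                        # enforce the salary cap
  out <- out[order(out$FinishPctSum, decreasing = TRUE), ]  # best finish total first
  rownames(out) <- NULL
  head(out, top_x)
}

res <- top_lineups(df, top_x = 3)
print(res)
```

Because the names and both sums are derived from the same index matrix, they line up by construction. With the sample data every lineup is under 50k, so the cap only matters with a tighter value: top_lineups(df, cap = 44000) keeps just the two lineups costing 44000 and 43900.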
Data:
df <- structure(list(Name = structure(c(3L, 1L, 5L, 4L, 6L, 2L), .Label = c("George Roop",
"Katsunori Kikuno", "Keita Nakamura", "Naoyuki Kotani", "Teruto Ishihara",
"Yusuke Kasuya"), class = "factor"), Salary = c(9100L, 8900L,
9000L, 8700L, 8500L, 8800L), WinPct = c(31, 33, 33, 30.5, 29.6,
33.66), FinishPct = c(15.36, 15.76, 17.08, 18.35, 21.16, 21.88
)), .Names = c("Name", "Salary", "WinPct", "FinishPct"), class = "data.frame", row.names = c(NA,
-6L))