R: which row value has the most occurrences of a given column value
Hello, I have a data set like this:
Age Sallary
24 >50k
17 <=50k
31 >50k
24 >50k
I need to find the age that has the most >50k salaries.
Going with akrun's table comment:
names(which.max(table(df)[, ">50k"]))
[1] "24"
table calculates the cross-tab of these two columns. [, ">50k"] subsets to the column of salaries you are looking for, then which.max pulls out the first element of this column that contains the maximum count. Finally, because both the column subset and which.max keep the row names (the ages), we can extract the age with names.
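A minimal sketch that prints each intermediate step of the one-liner above. It rebuilds the question's data with data.frame() instead of the dput() output (assumed to hold the same values and factor levels). The intermediate names tab, high, pos, best_age and tied are made up for illustration, and the tie check with max() is an addition that is not part of the original answer.

```r
# Example data from the question (same values as the dput() output)
df <- data.frame(
  Age = c(24L, 17L, 31L, 24L),
  Sallary = factor(c(">50k", "<=50k", ">50k", ">50k"),
                   levels = c("<=50k", ">50k"))
)

# Step 1: cross-tab of Age (rows) by Sallary (columns)
tab <- table(df)
print(tab)

# Step 2: keep only the ">50k" column; the result is an integer vector named by age
high <- tab[, ">50k"]
print(high)  # counts for ages 17, 24, 31: 0, 2, 1

# Step 3: which.max() gives the position of the first maximum and keeps its name
pos <- which.max(high)
print(pos)

# Step 4: the name of that position is the age
best_age <- names(pos)
print(best_age)  # "24"

# which.max() reports only the first maximum; to list every tied age,
# compare each count against max() instead
tied <- names(high)[high == max(high)]
print(tied)
```

With this data there is no tie, so tied is just "24"; if two ages shared the top count, which.max would return only the one listed first (the smaller age, since table() sorts the rows).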
With a data.frame that has additional columns, you could replace table(df) with table(df$Age, df$Sallary) to select just these two variables from the data.frame.
So
names(which.max(table(df$Age, df$Sallary)[, ">50k"]))
[1] "24"
also works for the example dataset.
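A sketch of why the explicit two-column form matters, assuming a hypothetical extra column Name (made up for illustration). With three columns, table(df2) becomes a 3-way table, so the two-index subset [, ">50k"] no longer fits; passing only Age and Sallary keeps the cross-tab 2-way. The variant that subsets the data.frame before calling table() is an alternative added here, not part of the original answer.

```r
# Hypothetical data frame with an extra column (Name is invented for this example)
df2 <- data.frame(
  Name = c("a", "b", "c", "d"),
  Age = c(24L, 17L, 31L, 24L),
  Sallary = factor(c(">50k", "<=50k", ">50k", ">50k"),
                   levels = c("<=50k", ">50k"))
)

# table(df2) would cross-tabulate all three columns (Name x Age x Sallary)
print(length(dim(table(df2))))  # 3

# Passing the two columns explicitly keeps a 2-way table: 3 ages x 2 salary levels
tab2 <- table(df2$Age, df2$Sallary)
print(dim(tab2))  # 3 2

best_age <- names(which.max(tab2[, ">50k"]))
print(best_age)  # "24"

# Alternative: subset the data.frame first; this also keeps the Age/Sallary labels
best_age_subset <- names(which.max(table(df2[, c("Age", "Sallary")])[, ">50k"]))
print(best_age_subset)  # "24"
```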
Data:
df <-
structure(list(Age = c(24L, 17L, 31L, 24L), Sallary = structure(c(2L,
1L, 2L, 2L), .Label = c("<=50k", ">50k"), class = "factor")), .Names = c("Age",
"Sallary"), class = "data.frame", row.names = c(NA, -4L))