Return name of column containing max value, from only certain selected columns in a data.frame
I would like to obtain, in a new column, the name of the column that contains the maximum value among only a few selected columns of a data.frame.
Here is an example data.frame:
# creating the vectors then the data frame ------
id = c("a", "b", "c", "d")
ignore = c(1000,1000, 1000, 1000)
s1 = c(0,0,0,100)
s2 = c(100,0,0,0)
s3 = c(0,0,50,0)
s4 = c(50,0,50,0)
df1 <- data.frame(id,ignore,s1,s2,s3,s4)
(1) I want to find, for each row, the name of the column holding the maximum value among columns s1-s4 (i.e. ignore the column called "ignore").
(2) If there is a tie for the maximum, I would like the last column name (e.g. s4) returned.
(3) As an extra favour: if all values are 0, I would ideally like NA returned.
Here is my best attempt so far:
df2 <- cbind(df1,do.call(rbind,apply(df1,1,function(x) {data.frame(max.col.name=names(df1)[which.max(x)],stringsAsFactors=FALSE)})))
This returns "ignore" in every row. It works (except for row b) if I remove that column and reorder the s1-s4 columns as s4-s1.
How would you approach this?
Many thanks indeed.
We use grep to create a column index ('i1') for the columns whose names start with 's' followed by digits. To get, for each row of the subset dataset ('df1[i1]'), the column index of the maximum value, we can use max.col with the option ties.method='last'. To convert the rows that contain only 0 values to NA, we take the rowSums, check whether it is 0 (==0), turn those rows into NA (NA^, since NA^TRUE is NA and NA^FALSE is 1) and multiply by the max.col output. The result can be used to extract the column names of the subset dataset.
i1 <- grep('^s\\d+', names(df1))
names(df1)[i1][max.col(df1[i1], 'last')*NA^(rowSums(df1[i1])==0)]
#[1] "s2" NA "s4" "s1"
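The one-liner above packs three steps into a single expression. As a minimal sketch, here it is unpacked step by step; the intermediate names idx and mask and the new column name max_col are illustrative choices, not part of the original answer. Note that the rowSums test assumes the s-columns are non-negative, as in the example; with mixed signs a row could sum to 0 without being all zeros.

```r
# Rebuild the example data so this block runs on its own.
df1 <- data.frame(
  id     = c("a", "b", "c", "d"),
  ignore = c(1000, 1000, 1000, 1000),
  s1     = c(0, 0, 0, 100),
  s2     = c(100, 0, 0, 0),
  s3     = c(0, 0, 50, 0),
  s4     = c(50, 0, 50, 0)
)

# Step 1: positions of the columns named "s" followed by digits.
i1 <- grep("^s\\d+", names(df1))

# Step 2: for each row, the position (within the subset) of the maximum;
# ties are resolved in favour of the right-most column.
idx <- max.col(df1[i1], ties.method = "last")

# Step 3: NA^TRUE is NA and NA^FALSE is 1, so all-zero rows become NA
# and every other row becomes 1 (assumes non-negative values).
mask <- NA^(rowSums(df1[i1]) == 0)

# Multiplying keeps idx where mask is 1 and yields NA elsewhere;
# indexing with NA returns NA. max_col is a hypothetical column name.
df1$max_col <- names(df1)[i1][idx * mask]
df1$max_col
#[1] "s2" NA   "s4" "s1"
```

Row b is all zeros, so max.col alone would report the last column (s4); the mask is what turns that row into NA.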
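# An alternative using dplyr and tidyr
library(dplyr)
library(tidyr)
# tibble() replaces the deprecated data_frame(); arguments need commas
df1 = tibble(
  id = c("a", "b", "c", "d"),
  ignore = c(1000, 1000, 1000, 1000),
  s1 = c(0, 0, 0, 100),
  s2 = c(100, 0, 0, 0),
  s3 = c(0, 0, 50, 0),
  s4 = c(50, 0, 50, 0))
result =
  df1 %>%
  # long format: one row per id and s-column
  gather(variable, value, -id, -ignore) %>%
  group_by(id) %>%
  # keep the last row among those tied for the group maximum
  slice(value %>%
          {. == max(.)} %>%
          which %>%
          last) %>%
  ungroup %>%
  # a maximum of 0 means the whole row was 0, so report NA
  mutate(variable_fix = ifelse(value == 0,
                               NA,
                               variable))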