How to get names of certain variables in a column in R?
I have a data frame that looks like:
ID CO1 CO2 ED1 ED2 max
1 1 2 1 3 3
2 1 3 3 2 3
3 4 2 2 1 4
4 3 3 4 4 4
...
10 1 1 1 1 1
How do I get R to give me the name(s) of the column(s) whose value equals the number in the max column, and assign them to a new column named "best"?
I want something like this:
ID CO1 CO2 ED1 ED2 max best
1 1 2 1 3 3 ED2
2 1 3 3 2 3 CO2
3 4 2 2 1 4 CO1
4 3 3 4 4 4 ED1
...
10 1 1 1 1 1 CO2
If more than one column equals the value in the max column (as in row 2 or row 10, for example), picking one at random is fine.
I have seen several solutions to similar problems, but none that works in my case.
You can use max.col:
cols <- grep('CO|ED', names(df), value = TRUE)
df$best <- cols[max.col(df[cols] == df$max)]
df
# ID CO1 CO2 ED1 ED2 max best
#1 1 1 2 1 3 3 ED2
#2 2 1 3 3 2 3 CO2
#3 3 4 2 2 1 4 CO1
#4 4 3 3 4 4 4 ED1
#5 10 1 1 1 1 1 ED2
By default max.col breaks ties at random. You can check ties.method in ?max.col to get the first/last match in each row instead.
data
df <- structure(list(ID = c(1L, 2L, 3L, 4L, 10L), CO1 = c(1L, 1L, 4L,
3L, 1L), CO2 = c(2L, 3L, 2L, 3L, 1L), ED1 = c(1L, 3L, 2L, 4L,
1L), ED2 = c(3L, 2L, 1L, 4L, 1L), max = c(3L, 3L, 4L, 4L, 1L)),
row.names = c(NA, -5L), class = "data.frame")
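To make the ties.method remark concrete, here is a minimal sketch that picks the first and the last matching column in each row, so the result is reproducible rather than random. It rebuilds the same example data; the column names best_first and best_last are only illustrative and are not part of the original answer.

```r
# Same example data as in this answer
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

cols <- grep("CO|ED", names(df), value = TRUE)
# Logical matrix: TRUE where a cell equals that row's value in max
hits <- df[cols] == df$max

# best_first / best_last are hypothetical names chosen for this sketch
df$best_first <- cols[max.col(hits, ties.method = "first")]
df$best_last  <- cols[max.col(hits, ties.method = "last")]

df[c("ID", "max", "best_first", "best_last")]
# best_first: ED2 CO2 CO1 ED1 CO1
# best_last:  ED2 ED1 CO1 ED2 ED2
```

Rows 2, 4 and 10 contain ties, which is why the two columns differ there.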
No need to be overly fancy:
d <- read.table(text=
" ID CO1 CO2 ED1 ED2 max
1 1 2 1 3 3
2 1 3 3 2 3
3 4 2 2 1 3
4 3 3 4 4 4
10 1 1 1 1 1
", header=TRUE )
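library(dplyr)  # provides %>%, select() and matches()
# For each row, the position of the first maximum among the CO*/ED* columns
max.columns <- d %>% select(matches("CO|ED")) %>%
apply( 1, which.max )
# +1 skips the leading ID column, which select() dropped
d$best <- colnames(d)[ max.columns+1 ]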
d
Output:
> d
ID CO1 CO2 ED1 ED2 max best
1 1 1 2 1 3 3 ED2
2 2 1 3 3 2 3 CO2
3 3 4 2 2 1 3 CO1
4 4 3 3 4 4 4 ED1
5 10 1 1 1 1 1 CO1
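The +1 offset above only works while ID is the single column in front of the value columns. A possible base R variant, sketched here under the assumption that the value columns are exactly those starting with CO or ED, indexes into the selected names directly and so does not depend on column order. The name value_cols is made up for this example. Like the code above, it ignores the max column and takes the first maximum in each row.

```r
# Same data as in this answer (the max column is not used here)
d <- data.frame(
  ID  = c(1, 2, 3, 4, 10),
  CO1 = c(1, 1, 4, 3, 1),
  CO2 = c(2, 3, 2, 3, 1),
  ED1 = c(1, 3, 2, 4, 1),
  ED2 = c(3, 2, 1, 4, 1),
  max = c(3, 3, 3, 4, 1)
)

# value_cols is a hypothetical helper name for this sketch
value_cols <- grep("^(CO|ED)", names(d), value = TRUE)

# which.max returns the position of the first maximum in each row;
# indexing value_cols with it needs no positional offset
d$best <- value_cols[apply(d[value_cols], 1, which.max)]
d$best
# "ED2" "CO2" "CO1" "ED1" "CO1"
```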
A long base R solution, with a "best" vector containing the names of all the columns that match the max value:
# Store as a variable the names of the raw data vectors:
# dvecs => character vector
dvecs <- setdiff(names(df), c("ID", "max"))
# Store a matrix of booleans denoting if the column contains the max value:
# bool_test => logical matrix
bool_test <- df$max == df[,dvecs]
# Store a vector containing the names of the columns with the max values:
# best => character vector
df$best <- apply(
data.frame(
vapply(
seq_along(dvecs),
function(i) {
ifelse(bool_test[, i], dvecs[i], NA_character_)
},
character(nrow(bool_test))
)
),
1,
function(x) {
paste0(na.omit(x), collapse = ", ")
}
)
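For comparison, the same idea can probably be written more compactly by subsetting the column names with each row of the logical matrix. This is only a sketch on the example data from the first answer, not the author's code, and best_all is an illustrative name.

```r
# Example data (same values as the df used above)
df <- data.frame(
  ID  = c(1L, 2L, 3L, 4L, 10L),
  CO1 = c(1L, 1L, 4L, 3L, 1L),
  CO2 = c(2L, 3L, 2L, 3L, 1L),
  ED1 = c(1L, 3L, 2L, 4L, 1L),
  ED2 = c(3L, 2L, 1L, 4L, 1L),
  max = c(3L, 3L, 4L, 4L, 1L)
)

dvecs <- setdiff(names(df), c("ID", "max"))
# Logical matrix: TRUE where a column holds the row's max value
bool_test <- df$max == df[, dvecs]

# For each row, keep the names of all matching columns and join them
best_all <- apply(bool_test, 1, function(is_best) {
  paste(dvecs[is_best], collapse = ", ")
})
unname(best_all)
# "ED2" "CO2, ED1" "CO1" "ED1, ED2" "CO1, CO2, ED1, ED2"
```

Unlike the max.col and which.max approaches, this keeps every tied column instead of choosing one.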