For each row, return the column index and name of the non-NA value

Question

I have a data frame where each row contains exactly one non-NA value.

ED1 ED2 ED3 ED4 ED5 
1   NA  NA  NA  NA 
NA  NA  1   NA  NA 
NA  1   NA  NA  NA 
NA  NA  NA  NA  1

For each row, I want to get the index and name of the column containing the non-NA value, i.e.:

Indices: c(1, 3, 2, 5), and their corresponding column names: c("ED1", "ED3", "ED2", "ED5")

Answer 1

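There is no need to use an apply() loop here. You could use max.col() in combination with a negated call to is.na(). !is.na(df) is a logical matrix that is TRUE where a value is present, and max.col() returns the position of the maximum in each row, which here is the single TRUE.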

max.col(!is.na(df))
# [1] 1 3 2 5

That gives us the column numbers where the 1s are. To get the column names, we can use that result to subset the names() of the data frame.

names(df)[max.col(!is.na(df))]
# [1] "ED1" "ED3" "ED2" "ED5"

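So we can get the desired data frame by doing the following. Note that the EDU column is a factor only in R versions before 4.0.0 (or when stringsAsFactors = TRUE is passed); from R 4.0.0 on it is a character column by default.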

data.frame(EDU = names(df)[max.col(!is.na(df))])
#   EDU
# 1 ED1
# 2 ED3
# 3 ED2
# 4 ED5

Data:

df <- structure(list(ED1 = c(1, NA, NA, NA), ED2 = c(NA, NA, 1, NA), 
    ED3 = c(NA, 1, NA, NA), ED4 = c(NA, NA, NA, NA), ED5 = c(NA, 
    NA, NA, 1)), .Names = c("ED1", "ED2", "ED3", "ED4", "ED5"
), row.names = c(NA, -4L), class = "data.frame")
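The max.col() approach above works cleanly because every row has exactly one non-NA value. If that is not guaranteed, ties are broken at random by default and an all-NA row still returns some column. A minimal sketch of a more defensive variant follows; the helper name first_non_na and the two extra rows are made up for illustration and are not part of the original answer.

```r
# Same data as above, written as a plain data.frame() call.
df <- data.frame(ED1 = c(1, NA, NA, NA), ED2 = c(NA, NA, 1, NA),
                 ED3 = c(NA, 1, NA, NA), ED4 = c(NA, NA, NA, NA),
                 ED5 = c(NA, NA, NA, 1))

# Hypothetical extra rows: row 5 has two non-NA values, row 6 has none.
extra <- data.frame(ED1 = c(NA, NA), ED2 = c(1, NA), ED3 = c(NA, NA),
                    ED4 = c(1, NA), ED5 = c(NA, NA))
df2 <- rbind(df, extra)

# Illustrative helper (not from the original answer).
first_non_na <- function(d) {
  present <- !is.na(d)                            # TRUE where a value is present
  idx <- max.col(present, ties.method = "first")  # first TRUE per row, deterministic
  idx[rowSums(present) == 0] <- NA                # rows without any value get NA
  idx
}

idx <- first_non_na(df2)
idx
# [1]  1  3  2  5  2 NA
names(df2)[idx]
# [1] "ED1" "ED3" "ED2" "ED5" "ED2" NA
```

With ties.method = "first" the result is reproducible, whereas the default "random" can return a different column on each call when a row has several non-NA values.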

Answer 2

df <- data.frame( ED1 = c(  1, NA, NA, NA),
                  ED2 = c( NA, NA, 1 , NA),
                  ED3 = c( NA,  1, NA, NA),
                  ED4 = c( NA, NA, NA, NA),
                  ED5 = c( NA, NA, NA,  1)  )

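# which.min() skips NA values, so with one non-NA value per row it returns that column's index
df_new <- data.frame( EDU = as.factor(apply(df,1,which.min)) )
# the factor levels are the indices as text ("1", "2", ...); prefix them to rebuild the column names
levels(df_new$EDU) <- paste0("ED",levels(df_new$EDU))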

> df
  ED1 ED2 ED3 ED4 ED5
1   1  NA  NA  NA  NA
2  NA  NA   1  NA  NA
3  NA   1  NA  NA  NA
4  NA  NA  NA  NA   1
> df_new
  EDU
1 ED1
2 ED3
3 ED2
4 ED5
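The apply() version above depends on which.min() ignoring NA values and on the column names being "ED" followed by the column index. As a rough sketch, assuming the same df, the variant below looks up the names directly, so it should also work when the columns are named differently. Keeping all column names as factor levels is a choice made here, not something the original answer does.

```r
df <- data.frame(ED1 = c(1, NA, NA, NA), ED2 = c(NA, NA, 1, NA),
                 ED3 = c(NA, 1, NA, NA), ED4 = c(NA, NA, NA, NA),
                 ED5 = c(NA, NA, NA, 1))

# Position of the first non-NA entry in each row (NA if the row is all NA).
idx <- apply(df, 1, function(row) which(!is.na(row))[1])

# Index into the real column names instead of pasting a prefix onto the index.
df_new <- data.frame(EDU = factor(names(df)[idx], levels = names(df)))
df_new
```

Because the levels are set to names(df), ED4 stays available as an (unused) level even though no row selects it.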

Answer 3

Another option is to multiply the 0/1 indicator matrix of non-NA values by the column positions:

 v1 <- names(df)[+(!is.na(df)) %*% seq_along(df)]
 v1
 #[1] "ED1" "ED3" "ED2" "ED5"

 data.frame(EDU=v1)
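To see why the matrix product above yields column indices, it can help to look at the intermediate pieces. The sketch below uses the same df; the names ind and w are introduced here only for illustration. Each row of the product is the sum of the positions of the non-NA entries, so it is a valid index only when a row has exactly one non-NA value, which the guard checks.

```r
df <- data.frame(ED1 = c(1, NA, NA, NA), ED2 = c(NA, NA, 1, NA),
                 ED3 = c(NA, 1, NA, NA), ED4 = c(NA, NA, NA, NA),
                 ED5 = c(NA, NA, NA, 1))

ind <- +(!is.na(df))   # 4 x 5 matrix: 1 where a value is present, 0 where NA
w <- seq_along(df)     # column positions 1, 2, 3, 4, 5

# The product sums the positions of the 1s in each row, so guard the assumption
# that every row contains exactly one non-NA value.
stopifnot(all(rowSums(ind) == 1))

idx <- as.vector(ind %*% w)   # drop the 4 x 1 matrix shape
idx
# [1] 1 3 2 5
names(df)[idx]
# [1] "ED1" "ED3" "ED2" "ED5"
```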

Or using pmax():

names(df)[do.call(pmax, c(df *col(df), list(na.rm=TRUE)))]
#[1] "ED1" "ED3" "ED2" "ED5"
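The pmax() line above multiplies the values by their column index, so it gives the index only because every non-NA value is 1. A possible generalization, sketched here with made-up values in a hypothetical df_vals, masks the column indices instead of multiplying the data:

```r
# Hypothetical data: same NA pattern as df, but the non-NA values are not 1.
df_vals <- data.frame(ED1 = c(7, NA, NA, NA), ED2 = c(NA, NA, -3, NA),
                      ED3 = c(NA, 0.5, NA, NA), ED4 = c(NA, NA, NA, NA),
                      ED5 = c(NA, NA, NA, 42))

is_missing <- is.na(df_vals)   # logical matrix, TRUE where the value is NA
masked <- col(is_missing)      # matrix holding each cell's column index
masked[is_missing] <- NA       # keep the index only where a value is present

# Row-wise maximum over the columns, ignoring the NA cells.
idx <- do.call(pmax, c(as.data.frame(masked), list(na.rm = TRUE)))
idx
# [1] 1 3 2 5
names(df_vals)[idx]
# [1] "ED1" "ED3" "ED2" "ED5"
```

If a row had several non-NA values, this would return the last such column, since pmax() takes the largest index.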

Answer 1: 7 votes, accepted, 2015-12-24 22:44:09
Answer 2: 1 vote, 2015-12-24 22:04:48
Answer 3: 1 vote, 2015-12-25 06:30:49
