
Finding the index of the first non-NA value for a specific column in a data frame

I have a data frame with multiple columns. Some of the data is missing (NA). I sorted the data frame by one column, and now the data is sorted properly, but the NAs are sorted last. I want to get the index of the last non-NA value.

column1 column2
1       2
2       na
3       some data
4       some data
na      some data
na      some data
na      some data

So I want to get the index of the row containing 4. I tried:

which(is.na(DF))

but it doesn't seem to return the NA values.

I came to this thread because I needed to find the first non-NA value in each column of a data frame. Although the original question is actually about finding the last non-NA value in a column, the other answers helped me work out how to find the first one. I list both below in case someone else is wondering about the same thing.

Here is some sample data. Note that the columns should already be sorted, with the NAs at the beginning or end of each column.

(df <- data.frame(c=c(NA,NA,13,14,15), 
             d=c(16,17,NA,NA,NA), 
             e=c(NA,NA,NA,NA,NA), 
             f=c(18,19,20,21,22)))
   c  d  e  f
1 NA 16 NA 18
2 NA 17 NA 19
3 13 NA NA 20
4 14 NA NA 21
5 15 NA NA 22

There are two ways to find the first non-NA value in each column. The first is to use a for loop:

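x1 <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {
  # For the all-NA column e, which() is empty, min() returns Inf with a
  # warning, and indexing by Inf yields NA.
  x1[j] <- df[, j][min(which(!is.na(df[, j])))]
}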

> x1
[1] 13 16 NA 18

The second is to use sapply. On vectors, complete.cases does the same thing as !is.na.

(x2 <- sapply(seq_len(ncol(df)), function(x) df[,x] [min(which(!is.na(df[,x])))]))
[1] 13 16 NA 18
(x3 <- sapply(seq_len(ncol(df)), function(x) df[,x] [min(which(complete.cases(df[,x])))]))
[1] 13 16 NA 18
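
The calls above return the first non-NA value, and they emit a warning for the all-NA column e because min() receives an empty vector. Since the question asks for an index, here is a minimal sketch that returns the row index instead and avoids the warning. The name first_non_na is a hypothetical helper, not a base R function.

```r
# Sample data from above.
df <- data.frame(c = c(NA, NA, 13, 14, 15),
                 d = c(16, 17, NA, NA, NA),
                 e = c(NA, NA, NA, NA, NA),
                 f = c(18, 19, 20, 21, 22))

# Hypothetical helper: row index of the first non-NA entry of a vector,
# or NA_integer_ when every entry is NA.
first_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

# sapply() over a data frame iterates over its columns and keeps their names.
first_idx <- sapply(df, first_non_na)
first_idx  # c = 3, d = 1, e = NA, f = 1
```

Passing df directly to sapply, rather than seq_len(ncol(df)), keeps the column names on the result.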

Similarly, there are two ways to find the last non-NA value.

y1 <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {
  # max() of an empty which() returns -Inf with a warning; indexing yields NA.
  y1[j] <- df[, j][max(which(!is.na(df[, j])))]
}

> y1
[1] 15 17 NA 22

(y2 <- sapply(seq_len(ncol(df)), function(x) df[,x] [max(which(!is.na(df[,x])))]))
[1] 15 17 NA 22
(y3 <- sapply(seq_len(ncol(df)), function(x) df[,x] [max(which(complete.cases(df[,x])))]))
[1] 15 17 NA 22
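
The same idea can be sketched for the last non-NA entry, returning the row index and then looking up the value. The name last_non_na is a hypothetical helper, and the sketch assumes the sample data shown earlier.

```r
# Sample data from above.
df <- data.frame(c = c(NA, NA, 13, 14, 15),
                 d = c(16, 17, NA, NA, NA),
                 e = c(NA, NA, NA, NA, NA),
                 f = c(18, 19, 20, 21, 22))

# Hypothetical helper: row index of the last non-NA entry of a vector,
# or NA_integer_ when every entry is NA.
last_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0L) NA_integer_ else idx[length(idx)]
}

# Row index of the last non-NA entry in each column.
last_idx <- sapply(df, last_non_na)
last_idx  # c = 5, d = 2, e = NA, f = 5

# Value at that index in each column; indexing by NA gives NA.
last_val <- mapply(function(col, i) col[i], df, last_idx)
last_val  # c = 15, d = 17, e = NA, f = 22
```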

In my testing, the for loop and sapply versions ran at similar speed.

It seems you need this expression:

max(which(complete.cases(DF$column1)))
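
As a rough check, the expression can be run on data shaped like the example in the question. This sketch assumes the "na" entries there are real NA values rather than the string "na". It also shows why which(is.na(DF)) did not help: it reports the positions of the missing entries across the whole data frame, not the last present one in a single column.

```r
# Data shaped like the question's example (assumption: "na" means NA).
DF <- data.frame(column1 = c(1, 2, 3, 4, NA, NA, NA),
                 column2 = c("2", NA, "some data", "some data",
                             "some data", "some data", "some data"))

# is.na() on a data frame returns a logical matrix, so which() counts
# positions column by column over the whole data frame.
na_pos <- which(is.na(DF))
na_pos  # 5 6 7 9

# Row index of the last non-NA value in column1.
last_row <- max(which(complete.cases(DF$column1)))
last_row  # 4

# The corresponding row of the data frame.
DF[last_row, ]
```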
