How to select columns if there are no NAs in the last n observations? How to drop columns if there are more than x adjacent NA observations?
I need the following:
1) Keep the columns if: i) the last n observations (n = 3) aren't NAs, ii) there are no NAs at all, iii) counting backwards from the last NA, there are no more than 3 adjacent NA observations
2) Drop the columns if: i) there are 3 or more adjacent NA observations
I'd prefer an answer that uses dplyr.
An example:
data = data.frame(
A = c(3,3,3,3,4, rep(NA,5)),
B = c(rnorm(10)),
C = c(rep(NA,3), rnorm(7)),
D = c(rnorm(8), NA, NA)
)
I've tried:
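# Attempt 1: keep columns with at least 3 non-NA values
data %>%
select_if(~sum(!is.na(.)) >= 3)
# Attempt 2: keep columns that contain any NA
data %>%
select_if(~sum(is.na(.)) > 0)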
In my example, I'd only keep B, C and D.
We can use tail to get the last n entries and drop the columns where all of them are NA.
n <- 3
library(dplyr)
data %>% select_if(~!all(is.na(tail(., n))))
# B C D
#1 0.5697 NA 0.29145
#2 -0.1351 NA -0.44329
#3 2.4016 NA 0.00111
#4 -0.0392 0.153 0.07434
#5 0.6897 2.173 -0.58952
#6 0.0280 0.476 -0.56867
#7 -0.7433 -0.710 -0.13518
#8 0.1888 0.611 1.17809
#9 -1.8050 -0.934 NA
#10 1.4656 -1.254 NA
Or, equivalently, with the logic inverted (keep a column if any of its last n entries is not NA):
data %>% select_if(~any(!is.na(tail(., n))))
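To see why the predicate works, it can help to run it on single vectors first. The following is a minimal sketch in base R; the vectors a and d are hypothetical stand-ins that mirror the NA pattern of columns A and D from the question, not the actual data.

```r
n <- 3

# Hypothetical vectors mirroring the NA pattern of columns A and D
a <- c(3, 3, 3, 3, 4, rep(NA, 5))   # ends in five NAs
d <- c(1:8, NA, NA)                 # ends in two NAs

# tail() returns the last n elements of the vector
last_a <- tail(a, n)
last_d <- tail(d, n)

# The column is dropped only when every one of those elements is NA
drop_a <- all(is.na(last_a))   # TRUE: all of the last 3 values are NA
drop_d <- all(is.na(last_d))   # FALSE: the third-from-last value is not NA

print(c(A = drop_a, D = drop_d))
```

Note that this keeps a column as long as at least one of its last n values is present, which is what produces B, C and D for the example data.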
For the second condition,
Drop the columns if: i) There are 3 or more adjacent NA observations
we can use rle to get the lengths of runs of adjacent equal values.
data %>% select_if(~!any(with(rle(is.na(.)), lengths[values]) >= n))
# B D
#1 0.5697 0.29145
#2 -0.1351 -0.44329
#3 2.4016 0.00111
#4 -0.0392 0.07434
#5 0.6897 -0.58952
#6 0.0280 -0.56867
#7 -0.7433 -0.13518
#8 0.1888 1.17809
#9 -1.8050 NA
#10 1.4656 NA
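The rle step is easier to follow on a single vector. This is a small sketch in base R; the vector x is a hypothetical column made up for illustration.

```r
n <- 3

# Hypothetical column: three leading NAs, then one isolated NA
x <- c(NA, NA, NA, 0.2, 1.5, NA, 0.7)

# rle() compresses the logical NA mask into runs of equal values
runs <- rle(is.na(x))
print(runs$values)    # TRUE FALSE TRUE FALSE
print(runs$lengths)   # 3 2 1 1

# Indexing lengths by values keeps only the runs that are NA runs
na_runs <- runs$lengths[runs$values]   # 3 1

# The column is dropped if any NA run is at least n long
drop_x <- any(na_runs >= n)            # TRUE
print(drop_x)
```

Because C in the example data starts with three NAs, this test drops it, which is why only B and D remain under the second condition.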
Since we already have the predicate functions, we can use the same logic in base R with sapply:
#Condition 1
data[!sapply(data, function(x) all(is.na(tail(x, n))))]
#Condition 2
data[!sapply(data, function(x) any(with(rle(is.na(x)), lengths[values]) >= n))]
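As a side note, select_if is superseded in dplyr 1.0.0 and later in favour of select() combined with where(). The sketch below assumes dplyr >= 1.0.0 is installed; the helper names last_n_all_na and has_na_run are hypothetical, introduced here only for readability, and set.seed is added only to make the random columns reproducible.

```r
library(dplyr)

n <- 3
set.seed(1)  # assumption: added only for reproducible random columns
data <- data.frame(
  A = c(3, 3, 3, 3, 4, rep(NA, 5)),
  B = rnorm(10),
  C = c(rep(NA, 3), rnorm(7)),
  D = c(rnorm(8), NA, NA)
)

# Hypothetical helper: TRUE when the last n values are all NA
last_n_all_na <- function(x, n) all(is.na(tail(x, n)))

# Hypothetical helper: TRUE when there is a run of n or more adjacent NAs
has_na_run <- function(x, n) {
  r <- rle(is.na(x))
  any(r$lengths[r$values] >= n)
}

# Condition 1: keeps B, C, D
cond1 <- data %>% select(where(~ !last_n_all_na(.x, n)))

# Condition 2: keeps B, D
cond2 <- data %>% select(where(~ !has_na_run(.x, n)))

print(names(cond1))
print(names(cond2))
```

Wrapping each predicate in a named function also makes it straightforward to combine the two conditions in one where() call if both should apply.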