R subset dataframe - max values and NA
I have the following test dataframe:
df <- data.frame(V1 = c(1, 2, 3), V2 = c(0, 5, NA), V3=c(NA, 10, NA), V4=c(2, 2, NA))
> df
V1 V2 V3 V4
1 1 0 NA 2
2 2 5 10 2
3 3 NA NA NA
Now I want to subset this dataframe, keeping the rows where:
V2, V3 and V4 are all NA
OR
every non-NA value among V2, V3 and V4 is less than 3
So the result should look like this:
> df_new
V1 V2 V3 V4
1 1 0 NA 2
3 3 NA NA NA
Only the first and third rows of the original dataframe are kept.
I could use the following command:
subset(df, (is.na(V2) & is.na(V3) & is.na(V4)) | ((V2 < 3 | is.na(V2)) & (V3 < 3 | is.na(V3)) & (V4 < 3 | is.na(V4))))
to do this. But it's quite tedious, and my real-life data frame has more than 30 columns to check, so there must be a better way of doing this.
You can do:
df[rowSums(df[, 2:4] >= 3, na.rm = TRUE) == 0, ]
V1 V2 V3 V4
1 1 0 NA 2
3 3 NA NA NA
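The one-liner above works because the comparison yields a logical matrix with NA wherever a value is missing, and rowSums(..., na.rm = TRUE) then counts only the values that are actually 3 or more. A minimal sketch of the same idea spelled out step by step, assuming V1 is the only column to leave unchecked (check_cols, too_big and n_bad are illustrative names, not from the original post):

```r
df <- data.frame(V1 = c(1, 2, 3), V2 = c(0, 5, NA),
                 V3 = c(NA, 10, NA), V4 = c(2, 2, NA))

# Columns to check: everything except the id-like column V1 (assumption).
check_cols <- setdiff(names(df), "V1")

# Logical matrix: TRUE where a value is >= 3, NA where the value is missing.
too_big <- df[check_cols] >= 3

# Per-row count of offending values; missing values are ignored.
# The counts are 0, 2, 0 for the three rows.
n_bad <- rowSums(too_big, na.rm = TRUE)

# Keep rows with no offending value (this includes all-NA rows).
df_new <- df[n_bad == 0, ]
```

Selecting the columns by name rather than by position (2:4) may scale more comfortably to a data frame with 30 or more columns.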
In dplyr, we can use filter_at to select the specific columns to check, replace NA values with 0, and keep the rows where all values are less than 3.
library(dplyr)
df %>% filter_at(vars(V2:V4), all_vars(replace(., is.na(.), 0) < 3))
# V1 V2 V3 V4
#1 1 0 NA 2
#2 3 NA NA NA
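The scoped verbs such as filter_at are superseded in recent dplyr releases. A sketch of the same filter with if_all(), assuming dplyr 1.0.4 or later is installed; here the NA case is tested directly instead of replacing NA with 0:

```r
library(dplyr)

df <- data.frame(V1 = c(1, 2, 3), V2 = c(0, 5, NA),
                 V3 = c(NA, 10, NA), V4 = c(2, 2, NA))

# Keep a row only if every checked column is either NA or below 3.
df_new <- df %>%
  filter(if_all(V2:V4, ~ is.na(.x) | .x < 3))
```

For a wider data frame, the V2:V4 range could be swapped for any tidyselect expression, for example -V1 to check every column except V1.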