
Subset dataframe in R, dplyr filter row values of column A not NA in row of column B

I have a dataset from a time series study. Since some participants didn't show up on certain days, they have NA values in the rest of the data frame. Certain study days were crucial, so I am trying to subset my data to participants who are not missing these crucial days. My dataset is actually very large, but here's the general structure:

fakedat <- data.frame(
  ID = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C",
         "D", "D", "D", "D", "E", "E", "E", "E", "F", "F", "F", "F"),
  StudyDay = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,
               1, 2, 3, 4),
  Ab = c(10, NA, 15, 10, 10, 20, 10, NA, 10, 10, NA, 30, NA, NA, 15, NA, 10, 20,
         10, 30, NA, 10, NA, 20))

Now let's say it was crucial that they show up on days 2 and 4. I tried subsetting with dplyr filtering like this:

library(dplyr)  # attaches the %>% pipe

fakedat2 <- fakedat %>%
  dplyr::group_by(ID) %>%
  dplyr::filter(StudyDay %in% c(2, 4) & !is.na(Ab)) %>%
  dplyr::ungroup()

EDIT: But the output of this is only the list of IDs that have a day 2 or day 4 value that is not NA. I need to find (in my real data) subjects who have NA Ab values at 4 specific study days. The answer I accepted below works, but I'm still curious about performing conditional filtering. In SAS you could code something like "IF Ab.=NA at (StudyDay=2 AND StudyDay=4) THEN ID...." or similar.

Maybe this will achieve your goal. If all participants have all StudyDay timepoints, and you just want to check that Ab is not missing on day 2 or day 4, you can check the Ab values at those time points in your filter. In this case, an ID is omitted only if Ab is NA on both days 2 and 4 (in this example, "D").

Alternatively, if you want to require that values are available for both days 2 and 4, you can use & (AND) instead of | (OR).

library(dplyr)

fakedat %>%
  group_by(ID) %>%
  filter(!is.na(Ab[StudyDay == 2]) | !is.na(Ab[StudyDay == 4]))
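
For comparison, here is a minimal sketch of the & (AND) variant mentioned above, which keeps an ID only when Ab is present on both day 2 and day 4. It rebuilds the question's fakedat compactly with rep() so that it runs on its own, and it assumes each ID has exactly one row per study day. The name kept_and is only illustrative.

```r
library(dplyr)

# Same values as the question's fakedat, built compactly.
fakedat <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(c(1, 2, 3, 4), times = 6),
  Ab = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
         NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

# Keep an ID only if Ab is non-missing on day 2 AND on day 4.
# Each condition is a single TRUE/FALSE per group, which filter()
# recycles across all rows of that group.
kept_and <- fakedat %>%
  group_by(ID) %>%
  filter(!is.na(Ab[StudyDay == 2]) & !is.na(Ab[StudyDay == 4])) %>%
  ungroup()

unique(kept_and$ID)
# "C" "E" "F"
```

With this data, A and B drop out as well as D, because each of them is missing one of the two required days.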

If you have multiple days that must not be missing, you can use all() and check for NA in the Ab values where StudyDay is %in% a vector of required days, as follows:

required_vals <- c(2, 4)

fakedat %>%
  group_by(ID) %>%
  filter(all(!is.na(Ab[StudyDay %in% required_vals])))

Output (from the first filter, using |; the all() version with days 2 and 4 keeps only IDs C, E and F)

   ID    StudyDay    Ab
   <chr>    <dbl> <dbl>
 1 A            1    10
 2 A            2    NA
 3 A            3    15
 4 A            4    10
 5 B            1    10
 6 B            2    20
 7 B            3    10
 8 B            4    NA
 9 C            1    10
10 C            2    10
11 C            3    NA
12 C            4    30
13 E            1    10
14 E            2    20
15 E            3    10
16 E            4    30
17 F            1    NA
18 F            2    10
19 F            3    NA
20 F            4    20
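
The all() check above only looks at rows that exist, so an ID with no row at all for a required day would still pass (all() of an empty vector is TRUE). The following is a hedged sketch of a stricter check for that case. The data frame fakedat_gap, in which participant E has no day 4 row, is a hypothetical variation that is not part of the question, and rows_only and days_present are illustrative names.

```r
library(dplyr)

# Same values as the question's fakedat, built compactly.
fakedat <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(c(1, 2, 3, 4), times = 6),
  Ab = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
         NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

# Hypothetical gap: participant E has no row at all for day 4.
fakedat_gap <- fakedat[!(fakedat$ID == "E" & fakedat$StudyDay == 4), ]

required_vals <- c(2, 4)

# Checks only the rows that exist, so E still passes.
rows_only <- fakedat_gap %>%
  group_by(ID) %>%
  filter(all(!is.na(Ab[StudyDay %in% required_vals]))) %>%
  ungroup()

# Requires every required day to appear with a non-missing Ab,
# so E is dropped because day 4 is absent.
days_present <- fakedat_gap %>%
  group_by(ID) %>%
  filter(all(required_vals %in% StudyDay[!is.na(Ab)])) %>%
  ungroup()

unique(rows_only$ID)     # "C" "E" "F"
unique(days_present$ID)  # "C" "F"
```

If every participant is guaranteed to have a row for every study day, as the answer assumes, the two checks give the same result.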

In base R, we can do

subset(fakedat, ID %in% ID[StudyDay %in% c(2, 4) & !is.na(Ab)])

-output

#    ID StudyDay Ab
#1   A        1 10
#2   A        2 NA
#3   A        3 15
#4   A        4 10
#5   B        1 10
#6   B        2 20
#7   B        3 10
#8   B        4 NA
#9   C        1 10
#10  C        2 10
#11  C        3 NA
#12  C        4 30
#17  E        1 10
#18  E        2 20
#19  E        3 10
#20  E        4 30
#21  F        1 NA
#22  F        2 10
#23  F        3 NA
#24  F        4 20

Or a similar option in dplyr

library(dplyr)
fakedat %>%
     filter(ID %in% ID[StudyDay %in% c(2, 4) & !is.na(Ab)])
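
Both one-liners above keep an ID when at least one of days 2 and 4 has a non-missing Ab, which matches the output shown. If every required day must be non-missing, one possible base R sketch is to collect the IDs that have an NA on any required day and exclude them. It assumes each ID has a row for every required day, and bad_ids, kept_all and required_days are illustrative names.

```r
# Same values as the question's fakedat, built compactly.
fakedat <- data.frame(
  ID = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  StudyDay = rep(c(1, 2, 3, 4), times = 6),
  Ab = c(10, NA, 15, 10,  10, 20, 10, NA,  10, 10, NA, 30,
         NA, NA, 15, NA,  10, 20, 10, 30,  NA, 10, NA, 20)
)

required_days <- c(2, 4)

# IDs with a missing Ab on at least one required day.
bad_ids <- unique(
  fakedat$ID[fakedat$StudyDay %in% required_days & is.na(fakedat$Ab)]
)

# Keep every row of the remaining IDs.
kept_all <- subset(fakedat, !(ID %in% bad_ids))

bad_ids               # "A" "B" "D"
unique(kept_all$ID)   # "C" "E" "F"
```

The bad_ids vector also answers the inverse question from the EDIT: it lists the subjects who have an NA Ab on any of the specified study days.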

