
How to use anti_join with different levels of two variables?


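I have been trying this for hours, but I can't figure it out. I have a data frame df1 with subjects and conditions, from which I want to exclude observations with specific values (less than 3 in the variable "value" of df2). I can't get it to work because the rows I need to remove from df1 are defined by the combination of levels of two variables.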

This is df1:

df1 <- structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L, 
                                  2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 
                      condition = c("A", "A", "A", "B", "B", "B", "C", "C","C", "A", "A", 
                                    "A", "B", "B", "B", "C", "C", "C", "A", "A", "A","B", "B", "B", "C", "C", "C")), 
                 row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"))

This is df2:

df2 <- structure(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,4L, 4L, 4L, 5L, 5L, 5L), 
                      condition = c("A", "B", "C", "A", "B","C", "A", "B", "C", "A", "B", "C", "A", "B", "C"), 
                      value = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L)), 
                 row.names = c(NA,-15L), class = c("tbl_df", "tbl", "data.frame"))

I want to remove from df1 all combinations of subject and condition whose value is less than 3, so this would be the final df:

df3 <- structure(list(subject = c(2L, 3L, 3L, 5L), 
                      condition = c("A","A", "B", "C")), 
                 row.names = c(NA, -4L), 
                 class = c("tbl_df","tbl", "data.frame"))

So far I have been doing it like this, but I can't anymore because I have hundreds of rows...

df3 <- df1 %>% filter(!(subject==2 & condition=="A" |
                        subject==3 & (condition=="A" | condition=="B") |
                        subject==5 & condition=="C"))

Your example result for df3 conflicts with the code you used to derive it, so here is a dplyr solution for each interpretation of the df3 you want.

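Note: both of these results are only possible if you

...exclude observations with specific values (less than [or equal to] 3 in the variable "value" from df2).

So I implemented these solutions with the inequality <= 3 rather than < 3.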
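To make the difference concrete, here is a small base-R sketch (not part of the original answer; it uses base subsetting instead of dplyr so it runs without extra packages). It rebuilds df2 as a plain data.frame with the same values as in the question and compares the strict and inclusive inequalities:

```r
# df2 rebuilt as a plain data.frame with the same contents as in the question
df2 <- data.frame(
  subject   = rep(1:5, each = 3),
  condition = rep(c("A", "B", "C"), times = 5),
  value     = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L),
  stringsAsFactors = FALSE
)

# Combinations selected by each inequality
strict    <- df2[df2$value <  3, c("subject", "condition")]
inclusive <- df2[df2$value <= 3, c("subject", "condition")]

nrow(strict)     # 1: only subject 5 / condition C (value 2)
nrow(inclusive)  # 4: 2/A, 3/A, 3/B (value 3) plus 5/C (value 2)
```

Only the inclusive version reproduces the four combinations listed in the question's expected df3, which is why the answer uses <= 3.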

First interpretation of df3

To get this version of df3

# A tibble: 4 x 2
  subject condition
    <int> <chr>    
1       2 A        
2       3 A        
3       3 B        
4       5 C        

which is the example result you provided here:

I want to remove from df1 all combinations of subject and condition whose value is below 3, so this would be the final df

 df3 <- structure(list(subject = c(2L, 3L, 3L, 5L), condition = c("A","A", "B", "C")), row.names = c(NA, -4L), class = c("tbl_df","tbl", "data.frame"))

simply use filter() on df2:

library(dplyr)


# ...
# Code to generate 'df1' and 'df2'.
# ...

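# Keep the rows of df2 with value <= 3, then drop the 'value' column
# so the result has only 'subject' and 'condition' (4 x 2, as shown above).
df3 <- df2 %>% filter(value <= 3) %>% select(subject, condition)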

Second interpretation of df3

However, it seems to me that you actually want this version of df3:

# A tibble: 18 x 2
   subject condition
     <int> <chr>    
 1       1 A        
 2       1 A        
 3       1 A        
 4       1 B        
 5       1 B        
 6       1 B        
 7       1 C        
 8       1 C        
 9       1 C        
10       2 B        
11       2 B        
12       2 B        
13       2 C        
14       2 C        
15       2 C        
16       3 C        
17       3 C        
18       3 C        

which is what you derived here:

df3 <- df1 %>% filter(!(subject==2 & condition=="A" |
                        subject==3 & (condition=="A" |condition=="B") |
                        subject==5 & condition=="C"))

In that case, you should anti_join() your df1 with a filter()ed version of df2:

library(dplyr)


# ...
# Code to generate 'df1' and 'df2'.
# ...


df3 <- df1 %>%
  anti_join(df2 %>% filter(value <= 3), by = c("subject", "condition"))
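If you would rather not depend on dplyr, the same anti-join can be sketched in base R (an alternative technique, not the method used above): build one composite subject/condition key per row and keep the df1 rows whose key does not appear among the excluded df2 rows. The helper `key()` and the `"_"` separator are hypothetical, illustrative choices, and df1/df2 are rebuilt with `rep()` to match the structures in the question:

```r
# df1 and df2 rebuilt as plain data.frames with the same contents as in the question
df1 <- data.frame(
  subject   = rep(1:3, each = 9),
  condition = rep(rep(c("A", "B", "C"), each = 3), times = 3),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  subject   = rep(1:5, each = 3),
  condition = rep(c("A", "B", "C"), times = 5),
  value     = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L),
  stringsAsFactors = FALSE
)

# Subject/condition combinations to exclude (value <= 3, as in the answer)
drop <- df2[df2$value <= 3, c("subject", "condition")]

# Hypothetical helper: one string key per row, e.g. "2_A".
# The separator only needs to be a character that never occurs in the data.
key <- function(d) paste(d$subject, d$condition, sep = "_")

# Anti-join: keep the df1 rows whose key is NOT among the excluded keys
df3 <- df1[!(key(df1) %in% key(drop)), ]
rownames(df3) <- NULL

nrow(df3)  # 18
table(df3$subject, df3$condition)
```

Subject 5 / condition C is in the exclusion set but has no rows in df1, so it removes nothing; anti_join() treats unmatched rows of its second argument the same way.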
