Subsetting a data frame with dplyr in R

Question

Using R, I am trying to filter my data frame based on a few conditions.

Here is the data frame:

Groups_name  col1   col2
group1       3       4
group1       1       1
group1       1       1
group2       1       1
group3       3       7
group3       1       1
group4       3       3
group4       1       1

Per group, I want to keep only the groups that contain at least one row where col1 > 1 and either col1 == col2 or col1 is within ±2 of col2.

Here I should get:

Groups_name  col1   col2
group1       3      4
group1       1      1
group1       1      1
group4       3      3
group4       1      1

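As you can see, I kept group1 because in its first row col1 > 1 and col1 (3) is within 2 of col2 (4). I also kept group4 because col1 > 1 and col1 (3) == col2 (3).

But group2 was removed because its col1 is not > 1.

I also removed group3 because, even though col1 (3) > 1, col1 (3) is not within 7 ± 2 (that is, it is not 5, 6, 7, 8 or 9).

So far I have tried: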

tab %>%
  group_by(Groups_name) %>%
  filter(all(col1 == col2,col2-2,col2+2))  %>%
  filter(any(col1 > 1))

Thanks for your help.

Answer 1

We can use any and all as follows:

library(dplyr)
df %>%
  group_by(Groups_name) %>%
  filter(any(col1 > 1) & all(abs(col1 - col2) %in% 0:2))

#  Groups_name  col1  col2
#  <fct>       <int> <int>
#1 group1          3     4
#2 group1          1     1
#3 group1          1     1
#4 group4          3     3
#5 group4          1     1

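This selects the groups in which at least one value of col1 is greater than 1 and the absolute difference between col1 and col2 is 0, 1 or 2 in every row.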
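
The condition above checks the two requirements across the whole group. If the question is instead read literally (at least one single row must satisfy both col1 > 1 and a difference of at most 2), a minimal sketch could look like the following. This is an alternative reading, not the answer's method; the name result is only illustrative, and the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt so the example is self-contained
df <- data.frame(
  Groups_name = c("group1", "group1", "group1", "group2",
                  "group3", "group3", "group4", "group4"),
  col1 = c(3L, 1L, 1L, 1L, 3L, 1L, 3L, 1L),
  col2 = c(4L, 1L, 1L, 1L, 7L, 1L, 3L, 1L)
)

# Keep a group if ANY single row has col1 > 1 AND |col1 - col2| <= 2
# (assumption: this row-wise reading is what the question intends)
result <- df %>%
  group_by(Groups_name) %>%
  filter(any(col1 > 1 & abs(col1 - col2) <= 2)) %>%
  ungroup()

print(result)
```

On this data both readings keep group1 and group4. They can differ in general: a group with one qualifying row plus another row whose difference exceeds 2 would be kept by this version but dropped by the all() version.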

Answer 2

We can do the same with data.table:

library(data.table)
setDT(df)[, .SD[any(col1 >1) & all(abs(col1 - col2) %in% 0:2)], .(Groups_name)]
#   Groups_name col1 col2
#1:      group1    3    4
#2:      group1    1    1
#3:      group1    1    1
#4:      group4    3    3
#5:      group4    1    1

Data

df <- structure(list(Groups_name = c("group1", "group1", "group1", 
"group2", "group3", "group3", "group4", "group4"), col1 = c(3L, 
1L, 1L, 1L, 3L, 1L, 3L, 1L), col2 = c(4L, 1L, 1L, 1L, 7L, 1L, 
3L, 1L)), class = "data.frame", row.names = c(NA, -8L))

Answer 1: 2 votes, accepted, 2019-05-01 09:28:05
Answer 2: 1 vote, 2019-05-01 14:03:19
