Exclude rows where value used in another row
Suppose you have the following data set:
df = data.frame(ID = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20), gender= c(1,2,1,2,2,2,2,1,1,2,1,2,1,2,2,2,2,1,1,2),
PID = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10))
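How can I write code that removes the rows in df where both gender and PID are the same as in another row (see picture)? Please imagine the data has more than 1000 rows, so the solution should find the values to exclude automatically.
df[ave(rep(TRUE, nrow(df)), df[,c("gender","PID")], FUN = function(z) !any(duplicated(z))),]
# ID gender PID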
# 1 1 1 1
# 2 2 2 1
# 3 3 1 2
# 4 4 2 2
# 7 7 2 4
# 8 8 1 4
# 9 9 1 5
# 10 10 2 5
# 11 11 1 6
# 12 12 2 6
# 13 13 1 7
# 14 14 2 7
# 17 17 2 9
# 18 18 1 9
# 19 19 1 10
# 20 20 2 10
library(dplyr)
df %>%
  group_by(gender, PID) %>%
  filter(!any(duplicated(cbind(gender, PID)))) %>%
  ungroup()
In base R, we can use subset to keep only the observations whose 'gender' and 'PID' group has a count of 1
subset(df, ave(seq_along(gender), gender, PID, FUN = length) == 1)
Or with duplicated
df[!(duplicated(df[-1])|duplicated(df[-1], fromLast = TRUE)),]
-output
ID gender PID
1 1 1 1
2 2 2 1
3 3 1 2
4 4 2 2
7 7 2 4
8 8 1 4
9 9 1 5
10 10 2 5
11 11 1 6
12 12 2 6
13 13 1 7
14 14 2 7
17 17 2 9
18 18 1 9
19 19 1 10
20 20 2 10
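The duplicated call above selects the key columns by position (df[-1] drops ID). A minimal sketch of the same idea with the key columns chosen by name is given here; the helper name keep_unique_combos is hypothetical and not part of the original answer, and the PID column is written with rep() only for brevity.

```r
# Hypothetical helper: keep rows whose combination of `cols` occurs exactly once.
keep_unique_combos <- function(data, cols) {
  key <- data[, cols, drop = FALSE]
  # A row is flagged if an identical key appears earlier OR later in the data.
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!dup, , drop = FALSE]
}

# Same data as in the question.
df <- data.frame(
  ID     = 1:20,
  gender = c(1,2,1,2,2,2,2,1,1,2,1,2,1,2,2,2,2,1,1,2),
  PID    = rep(1:10, each = 2)
)

res <- keep_unique_combos(df, c("gender", "PID"))
```

Selecting by name should keep working if more columns are added to df later, whereas df[-1] would then include them in the comparison.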
Using aggregate
na.omit(aggregate(. ~ gender + PID, df, function(x)
ifelse(length(x) == 1, x, NA)))
gender PID ID
1 1 1 1
2 2 1 2
3 1 2 3
4 2 2 4
6 1 4 8
7 2 4 7
8 1 5 9
9 2 5 10
10 1 6 11
11 2 6 12
12 1 7 13
13 2 7 14
15 1 9 18
16 2 9 17
17 1 10 19
18 2 10 20
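Note that the aggregate result comes back with the grouping columns first and with rows in group order rather than the original row order. If the original layout matters, one possible follow-up step (a sketch, not part of the original answer) is to reorder by ID and restore the column order:

```r
# Same data as in the question.
df <- data.frame(
  ID     = 1:20,
  gender = c(1,2,1,2,2,2,2,1,1,2,1,2,1,2,2,2,2,1,1,2),
  PID    = rep(1:10, each = 2)
)

# Groups with more than one row get NA and are then dropped by na.omit().
agg <- na.omit(aggregate(. ~ gender + PID, df, function(x)
  ifelse(length(x) == 1, x, NA)))

# Restore the original row order (by ID) and column order (ID, gender, PID).
res <- agg[order(agg$ID), names(df)]
rownames(res) <- NULL
```

This relies on ID being unique and increasing in the original data, which holds for the example but is an assumption for other data sets.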
With dplyr
library(dplyr)
df %>%
group_by(gender, PID) %>%
filter(n() == 1) %>%
ungroup()
# A tibble: 16 × 3
ID gender PID
<dbl> <dbl> <dbl>
1 1 1 1
2 2 2 1
3 3 1 2
4 4 2 2
5 7 2 4
6 8 1 4
7 9 1 5
8 10 2 5
9 11 1 6
10 12 2 6
11 13 1 7
12 14 2 7
13 17 2 9
14 18 1 9
15 19 1 10
16 20 2 10
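Another dplyr option could be:
df %>%
  # sep = "_" keeps keys unambiguous (paste0 would merge e.g. 1 and 12 with 11 and 2)
  filter(with(rle(paste(gender, PID, sep = "_")), rep(lengths == 1, lengths)))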
ID gender PID
1 1 1 1
2 2 2 1
3 3 1 2
4 4 2 2
5 7 2 4
6 8 1 4
7 9 1 5
8 10 2 5
9 11 1 6
10 12 2 6
11 13 1 7
12 14 2 7
13 17 2 9
14 18 1 9
15 19 1 10
16 20 2 10
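If the duplicated values can also occur between non-consecutive rows:
df %>%
  arrange(gender, PID) %>%
  # sep = "_" keeps keys unambiguous (paste0 would merge e.g. 1 and 12 with 11 and 2)
  filter(with(rle(paste(gender, PID, sep = "_")), rep(lengths == 1, lengths)))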
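To see why the sorting step matters, here is a small base R sketch on made-up data (the toy data frame and the single_run helper are hypothetical, introduced only for illustration). rle only detects runs of adjacent equal values, so a repeated pair that is separated by other rows goes unnoticed unless the data is sorted first.

```r
# Hypothetical data: rows 1 and 3 share gender = 1, PID = 1 but are not adjacent.
toy <- data.frame(ID = 1:4, gender = c(1, 2, 1, 2), PID = c(1, 1, 1, 2))

# Illustrative helper: TRUE for elements that form a run of length 1.
single_run <- function(key) with(rle(key), rep(lengths == 1, lengths))

key <- paste(toy$gender, toy$PID, sep = "_")

# Without sorting, every run has length 1, so no row is dropped.
unsorted_res <- toy[single_run(key), ]

# Sorting first makes equal keys adjacent, so the repeated pair is dropped.
ord <- order(toy$gender, toy$PID)
sorted_res <- toy[ord, ][single_run(key[ord]), ]
```

Here unsorted_res still contains all four rows, while sorted_res keeps only the rows with ID 2 and 4.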
Here is another one :-)
library(dplyr)
df %>%
group_by(gender, PID) %>%
filter(is.na(ifelse(n()>1, 1, NA)))
ID gender PID
<dbl> <dbl> <dbl>
1 1 1 1
2 2 2 1
3 3 1 2
4 4 2 2
5 7 2 4
6 8 1 4
7 9 1 5
8 10 2 5
9 11 1 6
10 12 2 6
11 13 1 7
12 14 2 7
13 17 2 9
14 18 1 9
15 19 1 10
16 20 2 10