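How to identify an observation exists once in a column without using duplicated() or unique()?
Overall, I am interested in the order in which observations use different services. In particular, I want to identify any observation (by Id) that appears only once in the dataset, so that I can pick out those that used a service only once. Eventually, I would also like to identify observations that used services exactly twice. In the dataset provided, only one observation (Id = 3370) used a service once, and one observation (Id = 3360) used services twice.
I have already tried duplicated() and unique().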
df=data.frame(Id=c(6431,6431,6431,6431,3066,3066,
3066,3371,3371,3371,3370,3360,3360),
Order=c(1,2,3,4,3,2,1,2,1,3,1,1,2),
Service=c("Coaching","Events","Fairs","Coaching",
"Coaching","Events","Fairs","Coaching",
"Events","Fairs","Coaching","Events","Coaching"))
> df
Id Order Service
1 6431 1 Coaching
2 6431 2 Events
3 6431 3 Fairs
4 6431 4 Coaching
5 3066 3 Coaching
6 3066 2 Events
7 3066 1 Fairs
8 3371 2 Coaching
9 3371 1 Events
10 3371 3 Fairs
11 3370 1 Coaching
12 3360 1 Events
13 3360 2 Coaching
When I run !duplicated(), it does not single out Id = 3370 as I expected, even though that is the only observation with a unique Id.
!duplicated(df$Id)
> !duplicated(df$Id)
[1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
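So, what code would identify the observations that appear only once in this dataset? And how can that be extended to observations that appear exactly twice?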
It sounds like you don't just want to know which Ids are unique; the counts matter too. You can use add_count from dplyr:
library(tidyverse)
df <- data.frame(Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360), Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2), Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs", "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching"))
df %>%
add_count(Id)
#> # A tibble: 13 x 4
#> Id Order Service n
#> <dbl> <dbl> <fct> <int>
#> 1 6431 1 Coaching 4
#> 2 6431 2 Events 4
#> 3 6431 3 Fairs 4
#> 4 6431 4 Coaching 4
#> 5 3066 3 Coaching 3
#> 6 3066 2 Events 3
#> 7 3066 1 Fairs 3
#> 8 3371 2 Coaching 3
#> 9 3371 1 Events 3
#> 10 3371 3 Fairs 3
#> 11 3370 1 Coaching 1
#> 12 3360 1 Events 2
#> 13 3360 2 Coaching 2
Created on 2019-05-23 by the reprex package (v0.3.0)
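To get from the counts to the rows the question asks about, one option is to filter on the n column that add_count creates. The following is a minimal sketch under that assumption; the names once and twice are illustrative only and do not come from the answer above.

```r
library(dplyr)

df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# add_count(Id) appends a column n holding the number of rows per Id
counted <- df %>% add_count(Id)

# Rows whose Id appears exactly once, and exactly twice
once  <- counted %>% filter(n == 1)
twice <- counted %>% filter(n == 2)
```

Changing the number in the filter condition extends the same idea to any other count.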
Are you looking for table?
> table(df$Id)
3066 3360 3370 3371 6431
3 2 1 3 4
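The table result can also be used to pull out the Ids with a given count. This is a rough base R sketch; once_ids and twice_ids are illustrative names, and the conversion back with as.numeric assumes the Ids are numeric, as they are in the example data.

```r
ids <- c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360)

# table() counts how often each Id occurs; the Ids become the names
counts <- table(ids)

# Keep the names whose count matches, then convert back to numbers
once_ids  <- as.numeric(names(counts)[counts == 1])
twice_ids <- as.numeric(names(counts)[counts == 2])

once_ids   # 3370
twice_ids  # 3360
```

With the data frame from the question, df[df$Id %in% once_ids, ] would then return the matching rows.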
You can use ave to add the result to the existing data.frame:
> df$n <- with(df, ave(Id, Id, FUN=length))
> df
Id Order Service n
1 6431 1 Coaching 4
2 6431 2 Events 4
3 6431 3 Fairs 4
4 6431 4 Coaching 4
5 3066 3 Coaching 3
6 3066 2 Events 3
7 3066 1 Fairs 3
8 3371 2 Coaching 3
9 3371 1 Events 3
10 3371 3 Fairs 3
11 3370 1 Coaching 1
12 3360 1 Events 2
13 3360 2 Coaching 2
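The same ave call can serve directly as a row filter, without storing the count first. A minimal sketch, assuming the data from the question; used_once and used_twice are illustrative names.

```r
df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# ave() returns, for every row, the size of that row's Id group
group_size <- ave(df$Id, df$Id, FUN = length)

# Logical indexing keeps only the rows whose group has the wanted size
used_once  <- df[group_size == 1, ]
used_twice <- df[group_size == 2, ]
```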
Clearly there are many ways to do this. :) This may not be the fastest approach, but it is intuitive:
library(dplyr)
df %>%
group_by(Id) %>%
summarise(observations = n()) %>%
filter(observations == 1) %>%
select(-observations)
Whether you check observations == 1, observations == 2, or observations == whatever, this produces one record per Id. add_count, by contrast, always returns one record for each original observation.
P.S. If you want the result as a vector, you can use pull(Id) as the last line instead of select(-observations).
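The pipeline could be wrapped in a small helper so that the count to look for becomes a parameter. This is only a sketch: ids_with_count is a hypothetical name, not a dplyr function, and it uses count(Id) as a shorthand for the group_by and summarise steps.

```r
library(dplyr)

df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2)
)

# Hypothetical helper: returns the Ids that occur exactly k times, as a vector
ids_with_count <- function(data, k) {
  data %>%
    count(Id) %>%        # one row per Id, with the count in column n
    filter(n == k) %>%   # keep Ids with the requested count
    pull(Id)             # return a plain vector instead of a data frame
}

ids_with_count(df, 1)  # 3370
ids_with_count(df, 2)  # 3360
```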
Using library(data.table), we can do
setDT(df)[, .N, Id]
# Id N
# 1: 6431 4
# 2: 3066 3
# 3: 3371 3
# 4: 3370 1
# 5: 3360 2
Or add the count as a column
df[, n := .N, Id]
# Id Order Service n
# 1: 6431 1 Coaching 4
# 2: 6431 2 Events 4
# 3: 6431 3 Fairs 4
# 4: 6431 4 Coaching 4
# 5: 3066 3 Coaching 3
# 6: 3066 2 Events 3
# 7: 3066 1 Fairs 3
# 8: 3371 2 Coaching 3
# 9: 3371 1 Events 3
# 10: 3371 3 Fairs 3
# 11: 3370 1 Coaching 1
# 12: 3360 1 Events 2
# 13: 3360 2 Coaching 2
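Once the count column exists, selecting the Ids that occur once or twice is an ordinary row filter. A minimal sketch, assuming the data from the question; dt, once and twice are illustrative names, and as.data.table is used here so the original data frame is left unchanged.

```r
library(data.table)

dt <- as.data.table(data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
))

# Add the per-Id row count as column n (modifies dt by reference)
dt[, n := .N, by = Id]

# Filter on the count column
once  <- dt[n == 1]
twice <- dt[n == 2]
```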