How to identify an observation that exists only once in a column without using duplicated() or unique()?

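Overall, I am interested in the order in which observations use different services. In particular, I want to identify any observation (by Id) that appears only once in the dataset, so that I can identify the observations that used a service only once. Eventually, I would also like to identify the observations that used services exactly twice. In the dataset provided, only one observation (Id = 3370) used a service once, and one observation (Id = 3360) used services exactly twice.

I have tried duplicated() and unique().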

df=data.frame(Id=c(6431,6431,6431,6431,3066,3066,
                   3066,3371,3371,3371,3370,3360,3360),
            Order=c(1,2,3,4,3,2,1,2,1,3,1,1,2),
            Service=c("Coaching","Events","Fairs","Coaching",
                       "Coaching","Events","Fairs","Coaching",
                       "Events","Fairs","Coaching","Events","Coaching"))

> df
     Id Order  Service
1  6431     1 Coaching
2  6431     2   Events
3  6431     3    Fairs
4  6431     4 Coaching
5  3066     3 Coaching
6  3066     2   Events
7  3066     1    Fairs
8  3371     2 Coaching
9  3371     1   Events
10 3371     3    Fairs
11 3370     1 Coaching
12 3360     1   Events
13 3360     2 Coaching

When I run !duplicated(), it does not single out Id = 3370 as I expected, even though that is the only observation with a unique Id.

> !duplicated(df$Id)
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE

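So, what code would identify the observations that appear only once in this dataset? And how can it be extended to observations that appear exactly twice?

It sounds like you are not just interested in the unique Ids, but also in how often each one occurs. You can use add_count from dplyr: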

library(tidyverse)
df <- data.frame(Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360), Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2), Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs", "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching"))  
df %>%
  add_count(Id)
#> # A tibble: 13 x 4
#>       Id Order Service      n
#>    <dbl> <dbl> <fct>    <int>
#>  1  6431     1 Coaching     4
#>  2  6431     2 Events       4
#>  3  6431     3 Fairs        4
#>  4  6431     4 Coaching     4
#>  5  3066     3 Coaching     3
#>  6  3066     2 Events       3
#>  7  3066     1 Fairs        3
#>  8  3371     2 Coaching     3
#>  9  3371     1 Events       3
#> 10  3371     3 Fairs        3
#> 11  3370     1 Coaching     1
#> 12  3360     1 Events       2
#> 13  3360     2 Coaching     2

Created on 2019-05-23 by the reprex package (v0.3.0)

You are looking for table:
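
To get from the counts to the rows the question asks about, the n column can be filtered directly. This is a minimal sketch that assumes the same df as above; the names once and twice are only illustrative.

```r
library(dplyr)

df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# Rows whose Id appears exactly once in the data (here: Id 3370)
once <- df %>%
  add_count(Id) %>%
  filter(n == 1)

# Rows whose Id appears exactly twice (here: both rows of Id 3360)
twice <- df %>%
  add_count(Id) %>%
  filter(n == 2)
```

Because add_count keeps every original row, the Order and Service columns stay available for the matching Ids.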

> table(df$Id)

3066 3360 3370 3371 6431 
   3    2    1    3    4 
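
The table can also be used to pull out the Ids with a given count. The following is a small sketch, not necessarily what the answer had in mind; counts, ids_once and ids_twice are illustrative names.

```r
df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

counts <- table(df$Id)

# table() stores the Ids as names (character), so convert them back to numbers
ids_once  <- as.numeric(names(counts)[counts == 1])  # 3370
ids_twice <- as.numeric(names(counts)[counts == 2])  # 3360

# Keep only the rows that belong to Ids seen exactly once
rows_once <- df[df$Id %in% ids_once, ]
```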

You can use ave to add the result to the existing data.frame:

> df$n <- with(df, ave(Id, Id, FUN=length))
> df
     Id Order  Service n
1  6431     1 Coaching 4
2  6431     2   Events 4
3  6431     3    Fairs 4
4  6431     4 Coaching 4
5  3066     3 Coaching 3
6  3066     2   Events 3
7  3066     1    Fairs 3
8  3371     2 Coaching 3
9  3371     1   Events 3
10 3371     3    Fairs 3
11 3370     1 Coaching 1
12 3360     1   Events 2
13 3360     2 Coaching 2
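
Once the n column exists, ordinary logical subsetting selects the observations of interest. A brief sketch, assuming the same data as above:

```r
df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# Per-row count of how often that row's Id occurs
df$n <- with(df, ave(Id, Id, FUN = length))

rows_once  <- df[df$n == 1, ]  # the single row for Id 3370
rows_twice <- df[df$n == 2, ]  # the two rows for Id 3360
```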

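Obviously there are many ways to do this. :) This may not be the fastest approach, but it is intuitive:

  • Group by Id
  • Count the observations
  • Filter for singletons... or doubles, etc.
  • Drop the dummy field we just created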
library(dplyr)
df %>% 
  group_by(Id) %>% 
  summarise(observations = n()) %>% 
  filter(observations == 1) %>% 
  select(-observations)

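Whether you check observations == 1, observations == 2, or observations == whatever, this yields one record per Id. add_count, by contrast, always returns one record per original observation.

P.S. If you would rather have the result as a vector, you can use pull(Id) as the last line instead of select(-observations).

Using library(data.table), we can do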
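
The pull(Id) variant can be wrapped so the required count is a parameter. This is only a sketch: ids_with_k_uses is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)

df <- data.frame(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# Hypothetical helper: return the Ids that occur exactly k times, as a vector
ids_with_k_uses <- function(data, k) {
  data %>%
    group_by(Id) %>%
    summarise(observations = n()) %>%
    filter(observations == k) %>%
    pull(Id)
}

used_once  <- ids_with_k_uses(df, 1)  # 3370
used_twice <- ids_with_k_uses(df, 2)  # 3360
```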

setDT(df)[, .N, Id]
#      Id N
# 1: 6431 4
# 2: 3066 3
# 3: 3371 3
# 4: 3370 1
# 5: 3360 2

Or add the count as a column

df[, n := .N, Id]
#       Id Order  Service n
#  1: 6431     1 Coaching 4
#  2: 6431     2   Events 4
#  3: 6431     3    Fairs 4
#  4: 6431     4 Coaching 4
#  5: 3066     3 Coaching 3
#  6: 3066     2   Events 3
#  7: 3066     1    Fairs 3
#  8: 3371     2 Coaching 3
#  9: 3371     1   Events 3
# 10: 3371     3    Fairs 3
# 11: 3370     1 Coaching 1
# 12: 3360     1   Events 2
# 13: 3360     2 Coaching 2
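
To go from the counts to the Ids or rows themselves, the grouped result can be filtered in a chained call. A minimal sketch, which builds a fresh data.table named dt (an illustrative name) instead of converting df in place:

```r
library(data.table)

dt <- data.table(
  Id = c(6431, 6431, 6431, 6431, 3066, 3066, 3066, 3371, 3371, 3371, 3370, 3360, 3360),
  Order = c(1, 2, 3, 4, 3, 2, 1, 2, 1, 3, 1, 1, 2),
  Service = c("Coaching", "Events", "Fairs", "Coaching", "Coaching", "Events", "Fairs",
              "Coaching", "Events", "Fairs", "Coaching", "Events", "Coaching")
)

# Ids that occur exactly once: count per Id, then filter on N
ids_once <- dt[, .N, by = Id][N == 1, Id]  # 3370

# Full rows for Ids that occur exactly twice: return .SD only for groups of size 2
rows_twice <- dt[, if (.N == 2) .SD, by = Id]
```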
