In R, how do I count the unique occurrences of repeated values within a year?

Question

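For each year in my data frame, I want to calculate the percentage of birds with (face.data=="yes") out of the total number of birds observed that year. One problem is that I have multiple observations of the same bird within the same year.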

Here is my dataset:

df <- data.frame(
  bird.ID = c(001, 001, 001, 002, 002, 002, 006, 006, 007, 007, 007, 007),
  # Dates must be quoted and converted; unquoted 2010-04-09 is evaluated as arithmetic
  date = as.Date(c("2010-04-09", "2013-04-14", "2013-09-14", "2013-05-08",
                   "2013-06-08", "2013-08-08", "2013-04-08", "2013-06-08",
                   "2014-06-08", "2016-06-08", "2017-06-08", "2017-08-08")),
  face.data = c("yes", "yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes", "no")
)

To get the number of "yes" values per year, I tried:

aggregate(face.data=="yes" ~ cut(date, "1 year"), data = df, sum)

However, this counts every row with "yes", even when several rows belong to the same bird.
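To make the overcounting concrete, here is a minimal base R sketch on the sample data. The helper names yes_rows, rows_per_year and birds_per_year are made up for this illustration. It compares the number of "yes" rows per year with the number of distinct birds behind those rows.

```r
# Sample data from the question, with dates stored as Date objects
df <- data.frame(
  bird.ID = c(1, 1, 1, 2, 2, 2, 6, 6, 7, 7, 7, 7),
  date = as.Date(c("2010-04-09", "2013-04-14", "2013-09-14", "2013-05-08",
                   "2013-06-08", "2013-08-08", "2013-04-08", "2013-06-08",
                   "2014-06-08", "2016-06-08", "2017-06-08", "2017-08-08")),
  face.data = c("yes", "yes", "no", "yes", "yes", "no",
                "yes", "yes", "no", "yes", "yes", "no")
)

df$year <- format(df$date, "%Y")          # year as a character string
yes_rows <- df[df$face.data == "yes", ]   # keep only the "yes" observations

# Counts every "yes" row, so a bird seen twice in a year is counted twice
rows_per_year <- table(yes_rows$year)

# Counts each bird at most once per year
birds_per_year <- tapply(yes_rows$bird.ID, yes_rows$year,
                         function(ids) length(unique(ids)))
```

For 2013 the row count is 5 while the distinct-bird count is 3, which is the gap described above. Years with no "yes" rows at all (2014 here) do not appear in either result.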

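Ideally, the final result would be a data frame with three columns: (i) the year (e.g. 2013); (ii) the total number of unique bird.IDs observed that year; (iii) the number of unique bird.IDs observed with face.data=="yes" during that year.

Something like this: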

   year  number of bird.ID  number of face.data
1  2013                 10                    3
2  2014                 15                    6
3  2015                 20                    9

Answer 1

A dplyr solution:

library(dplyr)
library(lubridate)  # provides ymd() and year()

df %>% 
  mutate(date = ymd(date),
         Year = year(date)) %>%
  group_by(Year) %>% 
  summarise(total_birds = length(unique(bird.ID)),
            yes_birds = length(unique(bird.ID[face.data == 'yes'])))

Output:

# A tibble: 5 x 3
   Year total_birds yes_birds
  <dbl>       <int>     <int>
1  2010           1         1
2  2013           3         3
3  2014           1         0
4  2016           1         1
5  2017           1         1

Or using n_distinct():

df %>% 
  mutate(date = ymd(date),
         Year= year(date)) %>%
  group_by(Year) %>% 
  summarise(total_birds = n_distinct(bird.ID),
            yes_birds = n_distinct(bird.ID[face.data=='yes']))
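The question ultimately asks for a percentage, which the summaries above stop short of. A minimal sketch of one way to add it, assuming dplyr and lubridate are installed; the column name pct_yes is invented for this example:

```r
library(dplyr)
library(lubridate)

# Sample data from the question, with dates stored as character strings
df <- data.frame(
  bird.ID = c(1, 1, 1, 2, 2, 2, 6, 6, 7, 7, 7, 7),
  date = c("2010-04-09", "2013-04-14", "2013-09-14", "2013-05-08",
           "2013-06-08", "2013-08-08", "2013-04-08", "2013-06-08",
           "2014-06-08", "2016-06-08", "2017-06-08", "2017-08-08"),
  face.data = c("yes", "yes", "no", "yes", "yes", "no",
                "yes", "yes", "no", "yes", "yes", "no")
)

res <- df %>%
  mutate(Year = year(ymd(date))) %>%                  # parse the date, extract the year
  group_by(Year) %>%
  summarise(total_birds = n_distinct(bird.ID),
            yes_birds = n_distinct(bird.ID[face.data == "yes"])) %>%
  mutate(pct_yes = 100 * yes_birds / total_birds)     # pct_yes: hypothetical column name
```

Because both counts are taken over distinct bird.IDs, a bird observed several times in one year contributes once to the numerator and once to the denominator.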

Answer 2

Using data.table:

library(data.table)

dt <- data.table(df)
unique(dt[, .(bird.ID, year = year(date), face.data)])[
  , .(`number of bird.ID` = length(unique(bird.ID)), 
      `number of face.data` = sum(face.data=="yes")), 
  by=.(year)]

   year number of bird.ID number of face.data
1: 2010                 1                   1
2: 2013                 3                   3
3: 2014                 1                   0
4: 2016                 1                   1
5: 2017                 1                   1
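As a possible variation, data.table also ships uniqueN(), which counts distinct values directly and avoids the intermediate unique() step. This is only a sketch, assuming the date column is of class Date; the column names total_birds and yes_birds are chosen for illustration:

```r
library(data.table)

# Sample data from the question, with dates stored as Date objects
df <- data.frame(
  bird.ID = c(1, 1, 1, 2, 2, 2, 6, 6, 7, 7, 7, 7),
  date = as.Date(c("2010-04-09", "2013-04-14", "2013-09-14", "2013-05-08",
                   "2013-06-08", "2013-08-08", "2013-04-08", "2013-06-08",
                   "2014-06-08", "2016-06-08", "2017-06-08", "2017-08-08")),
  face.data = c("yes", "yes", "no", "yes", "yes", "no",
                "yes", "yes", "no", "yes", "yes", "no")
)

dt <- as.data.table(df)

# Group by calendar year and count distinct birds overall and among "yes" rows
res <- dt[, .(total_birds = uniqueN(bird.ID),
              yes_birds = uniqueN(bird.ID[face.data == "yes"])),
          by = .(year = year(date))]
```

On the sample data this should give the same counts as the table above.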

Answer 3

You can solve this quickly with a small function:

yes_prop <- function(x)
{
  # number of unique bird.IDs in this subset
  number_of_bird.ID <- length(unique(x$bird.ID))
  # number of unique bird.IDs that have at least one "yes" row
  number_of_face.data <- length(unique(x$bird.ID[x$face.data == "yes"]))
  data.frame(number_of_bird.ID, number_of_face.data)
}

For a data.frame with the dates simplified to years:

df <- data.frame(
  bird.ID = c(001, 001, 001, 002, 002, 002, 006 ,006, 007, 007, 007, 007), 
  date = c(2010, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2014, 2016, 2017, 2017), 
  face.data = c("yes", "yes", "no","yes", "yes", "no","yes", "yes", "no","yes", "yes", "no")
)

do.call(rbind,by(df,df$date, yes_prop)) # applying function year by year
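The simplified data frame stores only the year. If the date column holds full dates, as in the question, one option is to derive a year column first and split on that. A minimal sketch, assuming the dates are of class Date; the year column is added here purely for illustration:

```r
# Sample data from the question, with dates stored as Date objects
df <- data.frame(
  bird.ID = c(1, 1, 1, 2, 2, 2, 6, 6, 7, 7, 7, 7),
  date = as.Date(c("2010-04-09", "2013-04-14", "2013-09-14", "2013-05-08",
                   "2013-06-08", "2013-08-08", "2013-04-08", "2013-06-08",
                   "2014-06-08", "2016-06-08", "2017-06-08", "2017-08-08")),
  face.data = c("yes", "yes", "no", "yes", "yes", "no",
                "yes", "yes", "no", "yes", "yes", "no")
)

# Same per-group summary as in the answer: distinct birds, and distinct birds with "yes"
yes_prop <- function(x) {
  number_of_bird.ID <- length(unique(x$bird.ID))
  number_of_face.data <- length(unique(x$bird.ID[x$face.data == "yes"]))
  data.frame(number_of_bird.ID, number_of_face.data)
}

df$year <- as.integer(format(df$date, "%Y"))      # extract the calendar year
res <- do.call(rbind, by(df, df$year, yes_prop))  # one summary row per year
```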

In any case, I have no doubt that other users can offer a smarter solution.

Answer 4

Compute the respective lengths inside by.

First, some fresh sample data.

#    bird.ID       date face.data
# 1        4 2008-01-24        no
# 2        5 2008-05-25        no
# 3        4 2008-07-15        no
# 4        2 2008-08-13       yes
# 5        1 2008-09-15        no
# 6        2 2008-10-25       yes
# 7        1 2008-11-09       yes
# 8        2 2009-02-09        no
# 9        2 2009-04-25       yes
# 10       2 2009-05-18       yes
# 11       5 2009-09-12        no
# 12       4 2009-09-17        no
# 13       1 2009-12-27       yes
# 14       4 2010-04-15        no
# 15       1 2010-05-09        no
# 16       3 2010-07-10       yes
# 17       1 2010-08-02        no
# 18       1 2010-09-08        no
# 19       3 2010-09-10       yes
# 20       1 2010-09-23        no

by(dat, cut(dat$date, "1 year"), \(x)
   with(x, c(year=as.integer(strftime(date[[1]], '%Y')), 
             `number of bird.ID`=length(unique(bird.ID)), 
             `number of face.data`=length(unique(bird.ID[face.data == 'yes']))))) |>
  do.call(what=rbind) |> `rownames<-`(NULL) |> as.data.frame()

#   year number of bird.ID number of face.data
# 1 2008                 4                   2
# 2 2009                 4                   2
# 3 2010                 3                   1

Data:

n <- 20
set.seed(42)
dat <- data.frame(bird.ID=sample(1:5, n, replace=TRUE),
                  date=sample(seq.Date(as.Date('2008-01-01'), as.Date('2011-01-01'), 'day'), n, replace=TRUE),
                  face.data=sample(c('yes', 'no'), n, replace=TRUE))
