Number of observations as a share of total observations per year

I have the following data frame in R:

    Year   ID
1   2018   x
2   2018   x
3   2018   y
4   2018   z
5   2019   x
6   2019   x
7   2019   z     

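For each year separately, I want to calculate the share of "x" in the "ID" column out of all observations in that year.

The result should look like this: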

Year   Share of x
2018   50 %
2019   67 %

Is it possible to do this with aggregate, something like this:

aggregate(length(which(df$ID == x)) / length(df$ID), by=Year)

Or with any other function?

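Assuming the data shown reproducibly in the Note at the end, use table to compute the counts and then prop.table to express each count as a proportion of its row.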

prop.table(table(dat), 1)
##       ID
## Year           x         y         z
##   2018 0.5000000 0.2500000 0.2500000
##   2019 0.6666667 0.0000000 0.3333333
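
To get only the "x" column in the percentage layout the question asks for, the row-wise table can be subset and formatted. This is a minimal sketch: the data frame is rebuilt inline so the snippet runs on its own, and the names props and share_x are illustrative only.

```r
# Example data, same values as in the question
dat <- data.frame(
  Year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  ID   = c("x", "x", "y", "z", "x", "x", "z")
)

# Row-wise proportions: each Year row sums to 1
props <- prop.table(table(dat), 1)

# Keep only the "x" column and express it as a rounded percentage
share_x <- data.frame(
  Year = as.integer(rownames(props)),
  `Share of x` = paste(round(100 * props[, "x"]), "%"),
  check.names = FALSE  # keep the column name with spaces as written
)
share_x
```

Rounding happens only at the formatting step, so props still holds the exact proportions if they are needed later.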

or, if you want the proportions within each column instead:

prop.table(table(dat), 2)
##       ID
## Year     x   y   z
##   2018 0.5 1.0 0.5
##   2019 0.5 0.0 0.5
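
A quick way to check which margin a given call normalises over is to sum the result along rows and columns. The sketch below rebuilds the example data inline; by_row and by_col are illustrative names.

```r
# Example data, same values as in the question
dat <- data.frame(
  Year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  ID   = c("x", "x", "y", "z", "x", "x", "z")
)
tab <- table(dat)

by_row <- prop.table(tab, 1)  # margin 1: proportions within each Year
by_col <- prop.table(tab, 2)  # margin 2: proportions within each ID

rowSums(by_row)  # every Year sums to 1
colSums(by_col)  # every ID sums to 1
```

For the share of "x" per year, margin 1 is the one that matches the question.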

aggregate

Regarding the aggregate tag on the question, the first case could be done like this:

aggregate(ID ~ Year, dat, 
  function(id) sapply(unique(dat$ID), function(x) setNames(mean(id == x), x)))
##   Year      ID.x      ID.y      ID.z
## 1 2018 0.5000000 0.2500000 0.2500000
## 2 2019 0.6666667 0.0000000 0.3333333

or using both aggregate and table (this relies on ID being a factor, so that every year reports all three levels):

aggregate(ID ~ Year, dat, function(x) table(x) / length(x))
##   Year      ID.x ID.y      ID.z
## 1 2018 0.5000000 0.25 0.2500000
## 2 2019 0.6666667 0.00 0.3333333

dplyr / tidyr

library(dplyr)
library(tidyr)

dat %>%
  count(Year, ID) %>%
  group_by(Year) %>%
  mutate(prop = n / sum(n)) %>%
  pivot_wider(id_cols = Year, names_from = ID, values_from = prop, values_fill = list(prop = 0))

## # A tibble: 2 x 4
## # Groups:   Year [2]
##    Year     x     y     z
##   <int> <dbl> <dbl> <dbl>
## 1  2018 0.5    0.25 0.25 
## 2  2019 0.667  0    0.333

Note

Lines <- "    Year   ID
1   2018   x
2   2018   x
3   2018   y
4   2018   z
5   2019   x
6   2019   x
7   2019   z     "
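# stringsAsFactors = TRUE keeps ID as a factor (the default before R 4.0.0),
# which the aggregate examples above rely on
dat <- read.table(text = Lines, stringsAsFactors = TRUE)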

Maybe you want something like this:

dfout<- setNames(aggregate(ID~Year,df,function(v) sum(v=="x")/length(v)*100),
                 c("Year","Share of x"))

such that

> dfout
  Year Share of x
1 2018   50.00000
2 2019   66.66667

Data

df <-structure(list(Year = c(2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 
2019L), ID = c("x", "x", "y", "z", "x", "x", "z")), class = "data.frame", row.names = c(NA, 
-7L))
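
Because the mean of a logical vector is the proportion of TRUE values, the same per-year share can also be computed with tapply. This is a sketch of a base-R alternative, not part of the answer above; share is an illustrative name.

```r
# Same data as above, rebuilt so the snippet is self-contained
df <- data.frame(
  Year = c(2018L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L),
  ID   = c("x", "x", "y", "z", "x", "x", "z")
)

# df$ID == "x" is TRUE/FALSE; its mean within each Year is the share of "x"
share <- tapply(df$ID == "x", df$Year, mean) * 100
share
```

The result is a named vector indexed by year, which is convenient for lookups such as share["2019"].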

A tidyverse approach:

library(tidyverse)

data<- tribble(~year,~id,
               2018,"x",
               2018,"x",
               2018,"y",
               2018,"z",
               2019,"x",
               2019,"x",
               2019,"z"

)


agg <- data %>%
  group_by(year, id) %>%
  summarise(cnt_id = n()) %>%        # count of each id per year
  group_by(year) %>%
  mutate(cnt_obs = sum(cnt_id),      # total observations per year
         share = cnt_id / cnt_obs) %>%
  filter(id == "x") %>%
  select(year, id, share)
head(agg)
   year id    share
  <dbl> <chr> <dbl>
1  2018 x     0.5  
2  2019 x     0.667
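
If only the share of "x" is needed, the count-then-divide steps can be collapsed into a single summarise, since the mean of id == "x" within each year is exactly that share. A minimal sketch, assuming dplyr is installed; agg_x is an illustrative name and the data is rebuilt as a plain data frame.

```r
library(dplyr)

# Same values as the tribble above
data <- data.frame(
  year = c(2018, 2018, 2018, 2018, 2019, 2019, 2019),
  id   = c("x", "x", "y", "z", "x", "x", "z")
)

# One row per year; share is the fraction of rows whose id is "x"
agg_x <- data %>%
  group_by(year) %>%
  summarise(share = mean(id == "x"))
agg_x
```

Unlike the filter-based version, this also returns a row with share 0 for a year in which "x" never occurs.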

I would argue that the value for y in 2019 is missing rather than zero, but still:

library(tidyverse)

df<- tribble(~year,~id,
               2018,"x",
               2018,"x",
               2018,"y",
               2018,"z",
               2019,"x",
               2019,"x",
               2019,"z"

)

df %>% 
  group_by(year,id) %>% 
  tally() %>% 
  group_by(year) %>% 
  mutate(prop = n/sum(n)) %>% 
  ungroup() %>% 
  select(-n) %>% 
  pivot_wider(names_from = id,values_from = prop) %>% 
  mutate_all(~ replace_na(.,replace = 0))
