
How to count the frequency of each unique factor level across each row of an R data frame

I have a dataset like the following:

Age      Monday Tuesday Wednesday 
6-9        a     b        a
6-9        b     b        c
6-9              c        a
9-10       c     c        b
9-10       c     a        b

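Using R, I would like to obtain the following dataset/result (where each column holds the total frequency of one unique factor level within that row):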

Age        a     b        c
6-9        2     1        0
6-9        0     2        1
6-9        1     0        1
9-10       0     1        2
9-10       1     1        1

Note: my data also contains missing values.
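For reproducibility, the table above could be entered roughly as follows. This is only a sketch: the name `df` is an assumption, the columns are plain character vectors rather than factors, and the blank Monday cell in the third row is entered as an empty string.

```r
# Assumed name `df`; "" stands in for the missing Monday value in row 3.
df <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)
```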

A couple of quick-and-dirty tidyverse solutions; there is probably a way to do this in fewer steps.

library(tidyverse) # install.packages("tidyverse")

input <- tribble(
~Age, ~Monday, ~Tuesday, ~Wednesday, 
"6-9", "a", "b", "a", 
"6-9", "b", "b", "c", 
"6-9", "", "c", "a",
"9-10", "c", "c", "b", 
"9-10", "c", "a", "b"
)

# pivot solution
input %>% 
  rowid_to_column() %>% 
  # recent dplyr versions require na_if() types to match, so skip the integer rowid
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  pivot_longer(cols = -c(rowid, Age), values_drop_na = TRUE) %>% 
  count(rowid, Age, value) %>% 
  pivot_wider(id_cols = c(rowid, Age), names_from = value, values_from = n, values_fill = list(n = 0)) %>% 
  select(-rowid)

# manual solution (if only a, b, c are expected as options)
input %>% 
  unite(col = "combine", Monday, Tuesday, Wednesday, sep = "") %>% 
  transmute(
    Age,
    a = str_count(combine, "a"),
    b = str_count(combine, "b"),
    c = str_count(combine, "c")
  )
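If the set of values is not known in advance, the manual approach can be generalised by collecting the distinct values from the data first. The following is a minimal sketch, not part of the original answer: it uses only base R so that it runs without extra packages, rebuilds `input` as a plain data frame, and the names `days`, `vals`, `counts` and `result` are made up for illustration.

```r
# Same data as the tribble above, rebuilt as a plain data frame.
input <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Character matrix of the day columns only.
days <- as.matrix(input[c("Monday", "Tuesday", "Wednesday")])

# Distinct values actually present, ignoring empty strings and NA.
vals <- sort(setdiff(unique(as.vector(days)), c("", NA)))

# For each value, count how often it appears in each row.
counts <- sapply(vals, function(v) rowSums(days == v, na.rm = TRUE))

result <- cbind(input["Age"], counts)
print(result)
```

Because `vals` is derived from the data, a new value such as "d" would get its own column without any change to the code.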

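In base R, we can replace the empty values with NA, collect the unique values in the data frame, and then use a row-wise apply with table to count the occurrences of each value.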

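# `df` is the data frame from the question
df[df == ''] <- NA                        # treat empty strings as missing
vals <- unique(na.omit(unlist(df[-1])))   # distinct non-missing values
# count each value per row, over the day columns only (Age is excluded)
cbind(df[1], t(apply(df[-1], 1, function(x) table(factor(x, levels = vals)))))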


#   Age a b c
#1  6-9 2 1 0
#2  6-9 0 2 1
#3  6-9 1 0 1
#4 9-10 0 1 2
#5 9-10 1 1 1
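As a possible variation on the same idea, the row-wise apply can be avoided by cross-tabulating row indices against values in a single `table()` call. This is a sketch of a different technique, not the answer's own method; `df` is rebuilt here under the assumption that missing cells are empty strings, and `m`, `counts` and `result` are illustrative names.

```r
# Assumed input: the question's data, with "" for the missing cell.
df <- data.frame(
  Age       = c("6-9", "6-9", "6-9", "9-10", "9-10"),
  Monday    = c("a", "b", "", "c", "c"),
  Tuesday   = c("b", "b", "c", "c", "a"),
  Wednesday = c("a", "c", "a", "b", "b"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df[-1])   # day columns as a character matrix
m[m == ""] <- NA         # empty strings become missing values

# row(m) gives the row index of every cell; table() drops NA values by default
# but keeps every row index, so rows with missing cells still appear.
counts <- table(row(m), m)

result <- cbind(df[1], as.data.frame.matrix(counts))
print(result)
```

Here the value columns come out in sorted order, since `table()` builds its levels with `factor()`.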

