Summarizing a list of character vectors using forcats and purrr

I have a tibble where col1 is a list of character vectors of variable length and col2 is a numeric vector indicating a group assignment, either 1 or 0. I want to first convert all of the character vectors in the list (col1) to factors, and then unify the levels across these factors so that I can ultimately get a tally of counts for each factor level. For the example data below, the tally would be as follows:

Overall:

    level, count  
    "a", 2
    "b", 2
    "c", 2
    "d", 3
    "e", 1

For group = 1:

    level, count  
    "a", 1
    "b", 2
    "c", 1
    "d", 1
    "e", 0

For group = 0:

    level, count  
    "a", 1
    "b", 0
    "c", 1
    "d", 2
    "e", 1

The ultimate goal is to get a total count of each factor level c("a","b","c","d","e") and plot the counts by the grouping variable.

Here is some code that might give better context to my problem:

library(forcats)
library(purrr)
library(dplyr)
library(ggplot2)

tib <- tibble(col1=list(c("a","b"),
                 c("b","c","d"), 
                 c("a","d","e"),
                 c("c","d")),
       col2=c(1,1,0,0))


tib %>% 
  mutate(col3=map(.$col1,.f = as_factor)) %>% 
  mutate(col4=map(.$col3,.f = fct_unify))

Unfortunately, this code fails. I get the following error, but don't know why:

Error: fs must be a list

I thought my input was a list?

I appreciate any help anyone might offer. Thanks.

You can first unnest and then count:

library(dplyr)
library(tidyr)

tib %>%
  unnest(cols = col1) %>%
  # If col1 is needed as a factor:
  # mutate(col1 = factor(col1)) %>%
  count(col1)

#  col1      n
#  <chr> <int>
#1 a         2
#2 b         2
#3 c         2
#4 d         3
#5 e         1
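
As a side note on the error in the question: map() hands fct_unify() one factor at a time, whereas fct_unify() expects the entire list of factors, which appears to be why it reports that fs must be a list. Below is a minimal sketch of the forcats route, assuming forcats, purrr and dplyr are installed; the object names unified and overall are illustrative only and do not come from the original post.

```r
library(dplyr)
library(purrr)
library(forcats)

# Example data from the question
tib <- tibble(
  col1 = list(c("a", "b"), c("b", "c", "d"), c("a", "d", "e"), c("c", "d")),
  col2 = c(1, 1, 0, 0)
)

# Call fct_unify() once on the whole list of factors,
# not once per element through map()
unified <- tib %>%
  mutate(col3 = fct_unify(map(col1, as_factor)))

# Every factor in col3 now carries the same set of levels
levels(unified$col3[[1]])

# Overall tally: splice the factors into fct_c(), then count each level
overall <- fct_count(fct_c(!!!unified$col3))
overall
```

For simple tallies the unnest() approach is shorter; the sketch is mainly useful if the unified factors themselves are needed later.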

To count by group, i.e. col2, we can do:

tib %>% 
  unnest(cols = col1) %>%
  mutate_at(vars(col1, col2), factor) %>%
  count(col1, col2, .drop = FALSE)

#   col1  col2      n
#   <fct> <fct> <int>
# 1 a     0         1
# 2 a     1         1
# 3 b     0         0
# 4 b     1         2
# 5 c     0         1
# 6 c     1         1
# 7 d     0         2
# 8 d     1         1
# 9 e     0         1
#10 e     1         0
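
Since the question's end goal is a plot by group, the grouped counts can be passed to ggplot2. This is only a sketch under assumptions: the dodged bar chart, the axis labels and the object names counts and p are illustrative choices, not something specified in the original post.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
tib <- tibble(
  col1 = list(c("a", "b"), c("b", "c", "d"), c("a", "d", "e"), c("c", "d")),
  col2 = c(1, 1, 0, 0)
)

# Grouped counts; .drop = FALSE keeps level/group combinations with zero rows
counts <- tib %>%
  unnest(cols = col1) %>%
  mutate(col1 = factor(col1), col2 = factor(col2)) %>%
  count(col1, col2, .drop = FALSE)

# One bar per level and group, placed side by side
p <- ggplot(counts, aes(x = col1, y = n, fill = col2)) +
  geom_col(position = "dodge") +
  labs(x = "level", y = "count", fill = "group")

p
```

Keeping the zero counts means every level gets a slot for both groups, so the bars stay aligned across levels.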
