Summarizing a list of character vectors using forcats and purrr
I have a tibble where col1 is a list of character vectors of variable length and col2 is a numeric vector indicating a group assignment, either 1 or 0. I want to first convert all of the character vectors in the list (col1) to factors, and then unify the levels across all of these factors so that I can ultimately get a tally of counts for each factor level. For the example data below, the tally would be as follows:
overall:
level, count
"a", 2
"b", 2
"c", 2
"d", 3
"e", 1
for group=1:
level, count
"a", 1
"b", 2
"c", 1
"d", 1
"e", 0
for group=0:
level, count
"a", 1
"b", 0
"c", 1
"d", 2
"e", 1
The ultimate goal is to get a total count of each factor level c("a","b","c","d","e") and plot the counts by the grouping variable.
Here is some code that might give better context to my problem:
library(forcats)
library(purrr)
library(dplyr)
library(ggplot2)
tib <- tibble(col1 = list(c("a","b"),
                          c("b","c","d"),
                          c("a","d","e"),
                          c("c","d")),
              col2 = c(1,1,0,0))
tib %>%
  mutate(col3 = map(.$col1, .f = as_factor)) %>%
  mutate(col4 = map(.$col3, .f = fct_unify))
Unfortunately, this code fails. I get the following error, but don't know why:
Error:
fs must be a list
I thought my input was a list?
I appreciate any help anyone might offer. Thanks.
You can first unnest col1 and then count:
library(dplyr)
library(tidyr)
tib %>%
  unnest(cols = col1) %>%
  # Optional: convert col1 to a factor (the output below shows col1 as a factor)
  # mutate(col1 = factor(col1)) %>%
  count(col1)
# col1 n
# <fct> <int>
#1 a 2
#2 b 2
#3 c 2
#4 d 3
#5 e 1
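As for the error in the question: map() hands fct_unify() one factor at a time, whereas fct_unify() expects the whole list of factors in a single call. Below is a minimal sketch of the forcats route to the same overall tally, assuming the tib from the question; the names unified and overall are introduced here only for illustration.

```r
library(dplyr)
library(purrr)
library(forcats)

# Example data from the question
tib <- tibble(
  col1 = list(c("a", "b"), c("b", "c", "d"), c("a", "d", "e"), c("c", "d")),
  col2 = c(1, 1, 0, 0)
)

# Convert each character vector to a factor, then pass the WHOLE list
# to fct_unify() once, so every factor gets the union of all levels
unified <- fct_unify(map(tib$col1, as_factor))

# Concatenate the unified factors and count each level
overall <- fct_count(fct_c(!!!unified))
overall
```

Since fct_unify() returns a list with one factor per row, the result could also be stored as a list column, for example with mutate(col3 = fct_unify(map(col1, as_factor))).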
To count by group, i.e. col2, we can do:
tib %>%
  unnest(cols = col1) %>%
  # Convert both columns to factors so count() can keep empty combinations
  mutate_at(vars(col1, col2), factor) %>%
  count(col1, col2, .drop = FALSE)
# col1 col2 n
# <fct> <fct> <int>
# 1 a 0 1
# 2 a 1 1
# 3 b 0 0
# 4 b 1 2
# 5 c 0 1
# 6 c 1 1
# 7 d 0 2
# 8 d 1 1
# 9 e 0 1
#10 e 1 0
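The question's final goal was to plot the counts by group. One way this might look, as a sketch rather than the only option: it assumes the tib from the question, and the names counts and p as well as the choice of dodged bars are illustrative.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
tib <- tibble(
  col1 = list(c("a", "b"), c("b", "c", "d"), c("a", "d", "e"), c("c", "d")),
  col2 = c(1, 1, 0, 0)
)

# Per-group tally, keeping level/group combinations with zero count
counts <- tib %>%
  unnest(cols = col1) %>%
  mutate(col1 = factor(col1), col2 = factor(col2)) %>%
  count(col1, col2, .drop = FALSE)

# One bar per level and group, placed side by side
p <- ggplot(counts, aes(x = col1, y = n, fill = col2)) +
  geom_col(position = "dodge") +
  labs(x = "level", y = "count", fill = "group")
p
```

Keeping .drop = FALSE means the zero counts (for example "b" in group 0) remain in the data, so every level appears on the x axis for both groups.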