Transforming dataframe to contain counts of values
I have created a dataframe that contains ids and stringvalues:
mycols <- c('id','2')
ids <- c(1,1,2,3)
stringvalues <- c('a','a','b','c')
mydf <- data.frame(ids , stringvalues)
mydf contains:
ids stringvalues
1 1 a
2 1 a
3 2 b
4 3 c
I'm attempting to produce a new dataframe that contains each id and the corresponding count for each string:
id, a , b , c
1 , 2 , 0 , 0
2 , 0 , 1 , 0
3 , 0 , 0 , 1
I'm trying to do this with multiple summarise calls:
g1 <- group_by(mydf , ids)
s1 <- summarise(g1 , a = count('a'))
s2 <- summarise(g1 , b = count('b'))
s3 <- summarise(g1 , c = count('c'))
But this returns an error:
Evaluation error: no applicable method for 'groups' applied to an object of class "character".
How can I create new columns that count the number of string entries in the column?
Does doing a dplyr::count followed by tidyr::spread work for you? (I'm only posting this because you mentioned you wanted to create a dataframe of this sort; otherwise it's much simpler to use table(mydf), as the other comments/answers suggest.)
library(dplyr)
library(tidyr)
mydf %>% count(ids, stringvalues) %>% spread(stringvalues, n, fill = 0)
#> # A tibble: 3 x 4
#> ids a b c
#> * <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 0 0
#> 2 2 0 1 0
#> 3 3 0 0 1
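In tidyr 1.0.0 and later, spread is superseded by pivot_wider. A minimal sketch of the same reshape, assuming a reasonably recent tidyr is installed (the name wide is only for illustration and does not appear in the original answer):

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
mydf <- data.frame(ids = c(1, 1, 2, 3),
                   stringvalues = c("a", "a", "b", "c"))

# Count each (ids, stringvalues) pair, then spread the strings into columns.
# values_fill replaces the missing combinations with 0 instead of NA.
wide <- mydf %>%
  count(ids, stringvalues) %>%
  pivot_wider(names_from = stringvalues,
              values_from = n,
              values_fill = list(n = 0L))

print(wide)
```

The named-list form of values_fill is used here because it should be accepted by both older and newer tidyr releases.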
Here's a base-R solution:
data.frame(cbind(table(mydf)))
Output option 1 (row # = ID):
a b c
1 2 0 0
2 0 1 0
3 0 0 1
Output option 2 (with ID as column):
data.frame(cbind(id=unique(mydf$ids),table(mydf)))
id a b c
1 1 2 0 0
2 2 0 1 0
3 3 0 0 1
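One caveat with option 2: unique(mydf$ids) returns the ids in order of first appearance, while table sorts its rows, so the two can disagree when the ids are not already sorted. A hedged variant that takes the id column from the table's own row names instead (the names tab and wide are illustrative, not from the original answer):

```r
# Same example data as in the question
mydf <- data.frame(ids = c(1, 1, 2, 3),
                   stringvalues = c("a", "a", "b", "c"))

# Cross-tabulate ids against stringvalues
tab <- table(mydf)

# Build the id column from the table's row names so that it always
# lines up with the counts, whatever order the input rows were in.
wide <- data.frame(id = as.numeric(rownames(tab)),
                   as.data.frame.matrix(tab),
                   row.names = NULL)

print(wide)
```

This assumes the ids are numeric; for character ids, drop the as.numeric call.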
You can use count directly. First,
count(mydf, ids,stringvalues)
gives
# A tibble: 3 x 3
ids stringvalues n
<dbl> <fctr> <int>
1 1 a 2
2 2 b 1
3 3 c 1
then reshape,
count(mydf, ids,stringvalues) %>% tidyr::spread(stringvalues, n)
gives
# A tibble: 3 x 4
ids a b c
* <dbl> <int> <int> <int>
1 1 2 NA NA
2 2 NA 1 NA
3 3 NA NA 1
then replace the NAs with something like res[is.na(res)] <- 0, where res is the object constructed above.
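For completeness, the summarise approach from the question can also be made to work. dplyr's count expects a data frame as its first argument, which is probably why count('a') fails inside summarise. A minimal sketch that counts matches with sum over a logical comparison instead (the name counts is illustrative, and the three string values are hard-coded here as an assumption):

```r
library(dplyr)

# Same example data as in the question
mydf <- data.frame(ids = c(1, 1, 2, 3),
                   stringvalues = c("a", "a", "b", "c"))

# Within each id group, stringvalues == "a" is a logical vector;
# summing it counts how many entries equal "a".
counts <- mydf %>%
  group_by(ids) %>%
  summarise(a = sum(stringvalues == "a"),
            b = sum(stringvalues == "b"),
            c = sum(stringvalues == "c"))

print(counts)
```

This only scales to a handful of known values; with many or unknown strings, the count-then-reshape answers above are the more practical route.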