Transforming dataframe to contain counts of values

I have created a data frame that contains ids and string values:

mycols <- c('id','2')
ids <- c(1,1,2,3)
stringvalues <- c('a','a','b','c')
mydf <- data.frame(ids , stringvalues)

mydf contains:

  ids stringvalues
1   1            a
2   1            a
3   2            b
4   3            c

I'm attempting to produce a new data frame that contains each id and the corresponding count for each string:

id, a , b , c
1 , 2 , 0 , 0
2 , 0 , 1 , 0
3 , 0 , 0 , 1

I'm trying to do this with multiple summarise calls:

g1 <- group_by(mydf , ids)  
s1 <- summarise(g1 , a = count('a')) 
s2 <- summarise(g1 , b = count('b')) 
s3 <- summarise(g1 , c = count('c')) 

But this returns an error: Evaluation error: no applicable method for 'groups' applied to an object of class "character".

How can I create new columns that count the number of occurrences of each string in the column?

Does doing a dplyr::count followed by tidyr::spread work for you? (I'm only posting this because you mentioned you wanted to create a data frame of this sort; otherwise it's much simpler to use table(mydf), as the other comments/answers suggest.)

library(dplyr)
library(tidyr)

mydf %>% count(ids, stringvalues) %>% spread(stringvalues, n, fill = 0)

#> # A tibble: 3 x 4
#>     ids     a     b     c
#> * <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     0     0
#> 2     2     0     1     0
#> 3     3     0     0     1
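In more recent tidyr releases, spread is marked as superseded in favour of pivot_wider. A minimal sketch of the same reshape with pivot_wider follows. It assumes a tidyr version that accepts a scalar values_fill (1.1.0 or later, as far as I know), and the name wide is only illustrative:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question.
mydf <- data.frame(ids = c(1, 1, 2, 3),
                   stringvalues = c('a', 'a', 'b', 'c'))

# Count each (id, string) pair, then turn the strings into columns.
# values_fill = 0 puts 0 where an id never had that string.
wide <- mydf %>%
  count(ids, stringvalues) %>%
  pivot_wider(names_from = stringvalues, values_from = n, values_fill = 0)

wide
```

The result should have one row per id and one count column per distinct string, like the spread output above.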

Here's a base-R solution:

data.frame(cbind(table(mydf)))

Output option 1 (row # = ID):

  a b c
1 2 0 0
2 0 1 0
3 0 0 1

Output option 2 (with ID as column):

data.frame(cbind(id=unique(mydf$ids),table(mydf)))

  id a b c
1  1 2 0 0
2  2 0 1 0
3  3 0 0 1

You can use count directly. First,

count(mydf, ids,stringvalues)

gives

 # A tibble: 3 x 3
 ids stringvalues     n
 <dbl>       <fctr> <int>
1     1            a     2
2     2            b     1
3     3            c     1

then reshape,

count(mydf, ids,stringvalues) %>% tidyr::spread(stringvalues, n)

gives

# A tibble: 3 x 4
    ids     a     b     c
* <dbl> <int> <int> <int>
1     1     2    NA    NA
2     2    NA     1    NA
3     3    NA    NA     1

then replace the NAs with something like res[is.na(res)] <- 0, where res is the object constructed above.
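As for the summarise attempt in the question: dplyr's count() expects a data frame as its first argument, so calling count('a') inside summarise passes it a bare string, which is where the 'groups' error comes from. If you do want to stay with summarise, a minimal sketch is to count matches with sum() on a logical comparison. The drawback is that every string value has to be written out by hand, so this only suits a small, known set of values:

```r
library(dplyr)

# Same sample data as in the question.
mydf <- data.frame(ids = c(1, 1, 2, 3),
                   stringvalues = c('a', 'a', 'b', 'c'))

# sum() over a logical vector counts the TRUEs, i.e. the rows in each
# group whose stringvalues equals the given string.
res <- mydf %>%
  group_by(ids) %>%
  summarise(a = sum(stringvalues == 'a'),
            b = sum(stringvalues == 'b'),
            c = sum(stringvalues == 'c'))

res
```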
