
I get a weird result in summarise function in R

I am trying to count how many `TRUE` values in "number" each unique "code" has, but I get a strange result and I cannot understand what is wrong.

What I get instead:

"code" total

My dataset (df1):

number   code
TRUE    abc
TRUE    abc
FALSE   abc
TRUE    bbb
TRUE    bbb
TRUE    bbb
FALSE   cscs
FALSE   cscs
TRUE    cscs
30312    kkk
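To make the answers below reproducible, here is one way to rebuild `df1` in base R. This is an assumption about how the data was loaded: because `number` mixes `TRUE`/`FALSE` with `30312`, the column is assumed to be stored as character.

```r
# Hypothetical reconstruction of df1: 'number' is assumed to be character,
# because one of its values (30312) is not TRUE/FALSE
df1 <- data.frame(
  number = c("TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE",
             "FALSE", "FALSE", "TRUE", "30312"),
  code   = c("abc", "abc", "abc", "bbb", "bbb", "bbb",
             "cscs", "cscs", "cscs", "kkk"),
  stringsAsFactors = FALSE
)
str(df1)
```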

The result that I need:

code   total
abc    2
bbb    3
cscs   1

My code:

sum <- df1%>%
dplyr::group_by("code")%>%
dplyr::summarise(Total=sum(number, na.rm = TRUE))

Does this work?

sum <- df1%>%
    dplyr::group_by(code)%>%
    dplyr::summarise(Total=sum(as.numeric(as.logical(number)), na.rm = TRUE))

The explanation:

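The `number` column contains logical values but also other values (such as `30312`), so it is not a logical column and `sum()` cannot add it up directly. Convert it first: `as.logical(number)` turns "TRUE"/"FALSE" into `TRUE`/`FALSE` and anything else into `NA`, `as.numeric()` turns those into 1/0, and `na.rm = TRUE` then drops the `NA`s. Note also that `group_by()` takes the bare column name `code`; `group_by("code")` groups by the constant string `"code"`, which is why the original result had a column named `"code"`.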
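A small base-R check of what that conversion does on values like those in `number`. The vector `x` is a hypothetical sample, not the full column:

```r
x <- c("TRUE", "FALSE", "30312")   # hypothetical sample of 'number'

# Summing the raw character values fails with an error
err <- tryCatch(sum(x), error = function(e) conditionMessage(e))

# as.logical() only recognises spellings such as "TRUE"/"FALSE" ("T", "true", ...);
# any other string becomes NA
lgl <- as.logical(x)               # TRUE FALSE NA
num <- as.numeric(lgl)             # 1 0 NA
total <- sum(num, na.rm = TRUE)    # the NA is dropped, so this is 1
```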

You will obtain:

# A tibble: 4 x 2
  code  Total
  <chr> <dbl>
1 abc       2
2 bbb       3
3 cscs      1
4 kkk       0
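If the grouping column's name is only available as a string (which may be why the question used `group_by("code")`), dplyr's `.data` pronoun lets you refer to the column by that string. A sketch, assuming dplyr >= 1.0 and a character `number` column; `col` is a hypothetical variable:

```r
library(dplyr)

df1 <- data.frame(
  number = c("TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE",
             "FALSE", "FALSE", "TRUE", "30312"),
  code   = c("abc", "abc", "abc", "bbb", "bbb", "bbb",
             "cscs", "cscs", "cscs", "kkk"),
  stringsAsFactors = FALSE
)

col <- "code"  # hypothetical: column name held in a string

res <- df1 %>%
  # .data[[col]] looks up the column named by the string,
  # unlike group_by("code"), which groups by a constant value
  group_by(.data[[col]]) %>%
  summarise(Total = sum(as.numeric(as.logical(number)), na.rm = TRUE),
            .groups = "drop")
res
```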

Or you can filter out the rows whose `number` is neither 'TRUE' nor 'FALSE', let `type.convert()` turn the remaining values into logicals, and sum them:

df1 %>%
  filter(number %in% c('TRUE', 'FALSE')) %>%
  type.convert(as.is = TRUE) %>%
  group_by(code) %>%
  summarise(total = sum(number))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  code  total
  <chr> <int>
1 abc       2
2 bbb       3
3 cscs      1
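The same filter-then-sum idea can be sketched in base R without dplyr. This swaps in `aggregate()` for `group_by()`/`summarise()` and assumes `number` is a character column:

```r
df1 <- data.frame(
  number = c("TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE",
             "FALSE", "FALSE", "TRUE", "30312"),
  code   = c("abc", "abc", "abc", "bbb", "bbb", "bbb",
             "cscs", "cscs", "cscs", "kkk"),
  stringsAsFactors = FALSE
)

# Keep only rows whose 'number' is literally "TRUE" or "FALSE"
keep <- df1[df1$number %in% c("TRUE", "FALSE"), ]
keep$number <- as.logical(keep$number)

# Sum the logicals per code (each TRUE counts as 1)
res <- aggregate(number ~ code, data = keep, FUN = sum)
res
```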

Using `rowsum()` from base R:

# unary + turns TRUE/FALSE into 1/0 (NA stays NA); rowsum() then sums per code
with(df1, rowsum(+as.logical(number), code, na.rm = TRUE))
#     [,1]
#abc     2
#bbb     3
#cscs    1
#kkk     0
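`tapply()` is another base-R option that gives the same per-code totals as a named vector. This is a swapped-in function, not part of the answer above, and it assumes `number` is character:

```r
df1 <- data.frame(
  number = c("TRUE", "TRUE", "FALSE", "TRUE", "TRUE", "TRUE",
             "FALSE", "FALSE", "TRUE", "30312"),
  code   = c("abc", "abc", "abc", "bbb", "bbb", "bbb",
             "cscs", "cscs", "cscs", "kkk"),
  stringsAsFactors = FALSE
)

# tapply() splits the converted values by code and sums each group;
# na.rm = TRUE is passed on to sum(), so "30312" (NA) contributes nothing
totals <- tapply(as.logical(df1$number), df1$code, sum, na.rm = TRUE)
totals
```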

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM