
How to do a countifs() over multiple columns in R (counting text values)

I want to group by user and count the rows where order_hour_type is "daytime" and where it is "evening", and report those counts in two columns named "daytime" and "evening". The same applies to "weekend" and "weekday" in order_day_type.

user_id  order_hour_type order_day_type
1         daytime            weekend
1         daytime            weekday
1         daytime            weekday
1         daytime            weekend
2         evening            weekday
2         evening            weekday
2         evening            weekend
2         daytime            weekday
3         daytime            weekday
3         evening            weekday
3         daytime            weekday

The result should look like this:

user_id daytime evening weekend weekday
1         4       0        2       2
2         1       3        1       3
3         2       1        0       3

I tried the dplyr package with the following code (taking the "daytime" column as an example):

agg1 <- df %>%
  group_by(user_id,order_hour_type) %>%
  summarise(
    daytime = sum(order_hour_type == "daytime"),
  )

but the result is odd: a single row instead of one row per user:

> head(agg1)
  daytime
1  834149

How can I generate the expected result? Thanks a lot!

One option is to gather the data into 'long' format, count the values per user, and spread the result back to 'wide':

library(dplyr)
library(tidyr)
gather(df1, key, val, -user_id) %>% 
    count(user_id, val) %>%
    spread(val, n, fill = 0)
# A tibble: 3 x 5
#  user_id daytime evening weekday weekend
#    <int>   <dbl>   <dbl>   <dbl>   <dbl>
#1       1       4       0       2       2
#2       2       1       3       3       1
#3       3       2       1       3       0
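
In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The same reshape-count-reshape idea could be sketched as below; this assumes tidyr 1.0.0 or later, rebuilds the sample data so the snippet runs on its own, and uses `res` as an illustrative name only. Unlike spread(), pivot_wider() orders the new columns by first appearance, so the final select() fixes the order explicitly.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  # stack every column except user_id into key/val pairs
  pivot_longer(cols = -user_id, names_to = "key", values_to = "val") %>%
  # one row per (user_id, val) with its count in column n
  count(user_id, val) %>%
  # one column per distinct value; combinations that never occur become 0
  pivot_wider(names_from = val, values_from = n, values_fill = list(n = 0L)) %>%
  # put the columns in the order requested in the question
  select(user_id, daytime, evening, weekend, weekday)

res
```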

Or use melt/dcast from data.table:

library(data.table)
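# as.data.table() copies df1, so the original data.frame is left unchanged
dcast(melt(as.data.table(df1), id.vars = 'user_id'), user_id ~ value, fun.aggregate = length)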

A base R option is to repeat the first column once for each of the other columns, unlist those other columns, and pass both vectors to table:

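# repeat user_id once per remaining column so it lines up with the unlisted values
table(rep(df1$user_id, times = ncol(df1) - 1), unlist(df1[-1]))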
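
For a more literal COUNTIFS translation, closer to the attempt in the question, it may be enough to group by user_id only and sum one logical condition per output column. The sketch below rebuilds the sample data and uses `res2` as an illustrative name. It calls dplyr::summarise() explicitly because a one-row result like the one in the question can occur when another package loaded after dplyr (plyr is a common one) masks summarise(); that is a guess, not something the question confirms.

```r
library(dplyr)

# Same sample data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday"),
  stringsAsFactors = FALSE
)

res2 <- df1 %>%
  # group by user only; grouping by order_hour_type as well would split the counts
  group_by(user_id) %>%
  # sum() over a logical vector counts the TRUE values, like COUNTIFS
  dplyr::summarise(
    daytime = sum(order_hour_type == "daytime"),
    evening = sum(order_hour_type == "evening"),
    weekend = sum(order_day_type == "weekend"),
    weekday = sum(order_day_type == "weekday")
  )

res2
```

If the columns can contain NA, passing na.rm = TRUE to each sum() keeps missing values from turning a count into NA.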

Data

df1 <- structure(list(user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L), order_hour_type = c("daytime", "daytime", "daytime", 
"daytime", "evening", "evening", "evening", "daytime", "daytime", 
"evening", "daytime"), order_day_type = c("weekend", "weekday", 
"weekday", "weekend", "weekday", "weekday", "weekend", "weekday", 
"weekday", "weekday", "weekday")), class = "data.frame", 
row.names = c(NA, 
-11L))
