How to do a countifs()-style count across multiple columns in R (counting text values)
I want to group by user and count the rows where order_hour_type is "daytime" and the rows where it is "evening", with the counts shown in two columns named "daytime" and "evening", one row per user.
user_id order_hour_type order_day_type
1 daytime weekend
1 daytime weekday
1 daytime weekday
1 daytime weekend
2 evening weekday
2 evening weekday
2 evening weekend
2 daytime weekday
3 daytime weekday
3 evening weekday
3 daytime weekday
The result should look like this:
user_id daytime evening weekend weekday
1 4 0 2 2
2 1 3 1 3
3 2 1 0 3
I have tried the dplyr package with the following code (taking the "daytime" column as an example):
agg1 <- df %>%
  group_by(user_id, order_hour_type) %>%
  summarise(
    daytime = sum(order_hour_type == "daytime"),
  )
The result is odd: a single row instead of one row per user:
> head(agg1)
daytime
1 834149
How can I generate my expected result? Thanks a lot!
An option would be to gather into 'long' format, then do a count on the columns and spread it back to 'wide':
library(dplyr)
library(tidyr)
gather(df1, key, val, -user_id) %>%
  count(user_id, val) %>%
  spread(val, n, fill = 0)
# A tibble: 3 x 5
# user_id daytime evening weekday weekend
# <int> <dbl> <dbl> <dbl> <dbl>
#1 1 4 0 2 2
#2 2 1 3 3 1
#3 3 2 1 3 0
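In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The same reshape-count-reshape idea can be sketched with those functions, assuming tidyr 1.1.0 or later (where values_fill accepts a scalar). The name res_pivot is only illustrative, and df1 is rebuilt here so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday")
)

res_pivot <- df1 %>%
  # stack the two type columns into one 'val' column (long format)
  pivot_longer(-user_id, names_to = "key", values_to = "val") %>%
  # count each value per user
  count(user_id, val) %>%
  # one column per distinct value; missing combinations become 0
  pivot_wider(names_from = val, values_from = n, values_fill = 0)
```

The counts match the spread version, but the column order may differ, because pivot_wider orders new columns by first appearance rather than alphabetically.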
Or using melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df1), id.var = 'user_id'), user_id ~ value, length)
A base R option would be to replicate the first column by the number of other columns while unlisting the other columns, and then use table:
table(rep(df1[,1], 2), unlist(df1[-1]))
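The 2 in the call above is the number of non-id columns. A small sketch that computes it instead of hard-coding it, and converts the table to a data frame, might look like this (tab and res_base are illustrative names; df1 is rebuilt so the snippet runs on its own). Using df1[[1]] rather than df1[, 1] should also keep it working if df1 was already converted by setDT in the data.table example:

```r
# Same sample data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday")
)

# repeat the id once per non-id column so it lines up with the unlisted values
n_other <- ncol(df1) - 1
tab <- table(rep(df1[[1]], n_other), unlist(df1[-1]))

# optional: turn the contingency table into a plain data frame with a user_id column
res_base <- cbind(user_id = as.integer(rownames(tab)),
                  as.data.frame.matrix(tab))
```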
Data:
df1 <- structure(list(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
    "evening", "evening", "daytime", "daytime", "evening", "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
    "weekday", "weekend", "weekday", "weekday", "weekday", "weekday")
), class = "data.frame", row.names = c(NA, -11L))
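For comparison, the approach attempted in the question can also be made to work directly. This is only a sketch: it groups by user_id alone (grouping by order_hour_type as well would give one row per user and type) and names one sum per value. A single-row result like the one in the question is often a sign that plyr was loaded after dplyr and masked summarise, though that is a guess about the asker's session; calling dplyr::summarise explicitly sidesteps it. The name res_direct is illustrative:

```r
library(dplyr)

# Same sample data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  user_id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  order_hour_type = c("daytime", "daytime", "daytime", "daytime", "evening",
                      "evening", "evening", "daytime", "daytime", "evening",
                      "daytime"),
  order_day_type = c("weekend", "weekday", "weekday", "weekend", "weekday",
                     "weekday", "weekend", "weekday", "weekday", "weekday",
                     "weekday")
)

res_direct <- df1 %>%
  group_by(user_id) %>%
  # namespaced call, in case another package masks summarise
  dplyr::summarise(
    daytime = sum(order_hour_type == "daytime"),
    evening = sum(order_hour_type == "evening"),
    weekend = sum(order_day_type == "weekend"),
    weekday = sum(order_day_type == "weekday")
  )
```

This needs every value listed by hand, so the reshaping options above scale better when there are many distinct values.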