dplyr: summarise a categorical variable
I want to summarise my data small for each different Video.ID using dplyr:
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = mean(Category))  # mean() of a character column returns NA with a warning
mean(Category) is clearly the wrong approach. How do I get it to just use the value that is repeated (one Video.ID always has the same category, no matter how often it appears in the data frame)?
My data frame looks like this:
small
# A tibble: 6 x 7
X1 X1_1 Video.ID Video.Duration..sec. Category Owned.Views Partner.Revenue
<int> <int> <chr> <int> <chr> <int> <dbl>
1 1 1 ---0zh9uzSE 1184 gadgets 6 0
2 2 2 ---0zh9uzSE 1184 gadgets 6 0
3 3 3 ---0zh9uzSE 1184 gadgets 2 0
4 4 4 ---0zh9uzSE 1184 gadgets 1 0
5 5 5 ---0zh9uzSE 1184 gadgets 1 0
6 6 6 ---0zh9uzSE 1184 gadgets 3 0
small <-
structure(list(X1 = 1:6,
X1_1 = 1:6,
Video.ID = c("---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE"),
Video.Duration..sec. = c(1184L, 1184L, 1184L, 1184L, 1184L, 1184L),
Category = c("gadgets", "gadgets", "gadgets", "gadgets", "gadgets", "gadgets"),
Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L),
Partner.Revenue = c(0, 0, 0, 0, 0, 0)),
row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame"))
You have at least two options to solve this:
Add the Category column to your group_by:
small %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.))
# A tibble: 1 x 4
# Groups: Video.ID [?]
# Video.ID cat sumr len
# <chr> <chr> <dbl> <dbl>
# 1 ---0zh9uzSE gadgets 0 1184
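The result above is still grouped by Video.ID (see the "# Groups:" line), because summarise() only drops the last grouping variable (here cat). If you want a plain, ungrouped tibble afterwards, a minimal sketch, assuming the same columns as small, is to add ungroup(); in dplyr 1.0.0 and later, summarise(..., .groups = "drop") does the same thing:

```r
library(dplyr)

# Same data as `small` above, rebuilt with only the columns needed so the block runs on its own
small <- data.frame(
  Video.ID = rep("---0zh9uzSE", 6),
  Video.Duration..sec. = rep(1184L, 6),
  Category = rep("gadgets", 6),
  Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L),
  Partner.Revenue = rep(0, 6),
  stringsAsFactors = FALSE
)

res <- small %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.)) %>%
  # summarise() peels off only the last grouping level (cat), so remove Video.ID as well
  ungroup()

print(res)
```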
Or use unique(Category):
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = unique(Category))
# A tibble: 1 x 4
# Video.ID sumr len cat
# <chr> <dbl> <dbl> <chr>
# 1 ---0zh9uzSE 0 1184 gadgets
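The first option might be preferred, because it still works if a video has multiple categories: each Video.ID/category combination simply becomes its own row. With unique(), such a video would yield more than one value per group, and depending on the dplyr version summarise() either errors or returns several rows for that video.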
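To illustrate that point, here is a small sketch on hypothetical toy data (a data frame called toy with made-up IDs "a" and "b") in which video "b" carries two different categories. Grouping by both columns just produces one row per Video.ID/category pair instead of failing:

```r
library(dplyr)

# Hypothetical toy data: video "b" appears with two different categories
toy <- data.frame(
  Video.ID = c("a", "a", "b", "b"),
  Category = c("gadgets", "gadgets", "gadgets", "music"),
  Partner.Revenue = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Option 1: group by ID and category, so each pair is summarised separately
by_both <- toy %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue)) %>%
  ungroup()

# Video "b" shows up twice: once for "gadgets" and once for "music"
print(by_both)
```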
Since there is only one category per Video.ID, you can use cat = Category[1], as in:
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = Category[1])
# A tibble: 1 x 4
# Video.ID sumr len cat
# <chr> <dbl> <dbl> <chr>
# 1 ---0zh9uzSE 0 1184 gadgets
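Category[1] silently picks the first value even if a video unexpectedly has several categories. A slightly more defensive sketch uses dplyr's first() (equivalent to Category[1]) and counts distinct categories per video with n_distinct(), stopping if the one-category-per-video assumption is violated. The helper column n_cat is an illustrative name, not part of the original data:

```r
library(dplyr)

# Same data as `small` above, rebuilt with only the columns needed so the block runs on its own
small <- data.frame(
  Video.ID = rep("---0zh9uzSE", 6),
  Video.Duration..sec. = rep(1184L, 6),
  Category = rep("gadgets", 6),
  Partner.Revenue = rep(0, 6),
  stringsAsFactors = FALSE
)

res <- small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            n_cat = n_distinct(Category),  # number of distinct categories for this video
            cat = first(Category))         # same as Category[1]

# Fail loudly if any video has more than one category, since first() would silently pick one
stopifnot(all(res$n_cat == 1))

# Drop the helper column once the check has passed
res <- select(res, -n_cat)
print(res)
```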