Calculate the percentage of each time series observation per group in R

Question

New to R, so I'm just getting my head around the data wrangling aspect. I tried looking for a similar question but couldn't find one.

I would like to add an additional column that gives the percentage split of views between the article groups for each day. Example dataset below:

  views       date      article
  1578   2015-01-01       A
  616    2015-01-01       B
  575    2015-01-01       C
  1744   2015-01-02       A
  541    2015-01-02       B
  660    2015-01-02       C
  2906   2015-01-03       A
  629    2015-01-03       B
  643    2015-01-03       C

And the expected result I am looking for:

 views     percentage   date           article
  1578     56.99        2015-01-01       A
  616      22.25        2015-01-01       B
  575      20.77        2015-01-01       C
  1744     59.22        2015-01-02       A
  541      18.37        2015-01-02       B
  660      22.41        2015-01-02       C
  2906     69.55        2015-01-03       A
  629      15.06        2015-01-03       B
  643      15.39        2015-01-03       C

I know this is possible by splitting the data frame into subsets, but I would hope there is a neater approach using a library?

Thanks!

Answer 1

library(dplyr)
df %>% group_by(date) %>% mutate( percentage = views/sum(views))
Source: local data frame [9 x 4]
Groups: date

  views       date article percentage
1  1578 2015-01-01       A  0.5698808
2   616 2015-01-01       B  0.2224630
3   575 2015-01-01       C  0.2076562
4  1744 2015-01-02       A  0.5921902
5   541 2015-01-02       B  0.1837012
6   660 2015-01-02       C  0.2241087
7  2906 2015-01-03       A  0.6955481
8   629 2015-01-03       B  0.1505505
9   643 2015-01-03       C  0.1539014
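
The output above is a fraction between 0 and 1, while the question asks for values such as 56.99. A minimal sketch of one way to get that scale, assuming dplyr is installed and that rounding to two decimals is acceptable (the `result` name and the rebuilt `df` are only for illustration):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

# Share of each day's views on a 0-100 scale, rounded to two decimals
result <- df %>%
  group_by(date) %>%
  mutate(percentage = round(100 * views / sum(views), 2)) %>%
  ungroup()

print(result)
```

The trailing `ungroup()` is optional; it just avoids carrying the grouping into later steps.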

Or, if multiple identical articles are possible per day:

df %>% group_by(date) %>% mutate(sum = sum(views)) %>% 
group_by(date, article) %>% mutate(percentage = views/sum) %>% 
select(-sum)
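
If the goal in the duplicate case is one combined share per article per day, one possible reading is to aggregate first and then divide by the daily total. This is a sketch under that assumption, not necessarily what the answer intended; `df_dup` and `shares` are hypothetical names, the data is made up so that article A appears twice, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: article A appears twice on the same day
df_dup <- data.frame(
  views   = c(1000, 578, 616, 575),
  date    = rep("2015-01-01", 4),
  article = c("A", "A", "B", "C")
)

# Collapse duplicates to one row per date and article,
# then compute each article's share of the daily total
shares <- df_dup %>%
  group_by(date, article) %>%
  summarise(views = sum(views), .groups = "drop") %>%
  group_by(date) %>%
  mutate(percentage = views / sum(views)) %>%
  ungroup()

print(shares)
```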

Answer 2

If df is your data.frame, you can do:

library(data.table)
setDT(df)[,percentage:=signif(100*views/sum(views),4),by=date][]
#   views       date article percentage
#1:  1578 2015-01-01       A      56.99
#2:   616 2015-01-01       B      22.25
#3:   575 2015-01-01       C      20.77
#4:  1744 2015-01-02       A      59.22
#5:   541 2015-01-02       B      18.37
#6:   660 2015-01-02       C      22.41
#7:  2906 2015-01-03       A      69.55
#8:   629 2015-01-03       B      15.06
#9:   643 2015-01-03       C      15.39

Or base R:

df$percentage = signif(100*with(df, ave(views, date, FUN=function(x) x/sum(x))),4)
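
Note that signif() keeps four significant digits rather than two decimal places, so the two only coincide when the percentage is at least 10. A small base R sketch of the difference, assuming two decimals is what is wanted (the `pct` name and the rebuilt `df` are only for illustration):

```r
# Sample data from the question
df <- data.frame(
  views = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date  = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3)
)

# Unrounded daily share on a 0-100 scale; ave() returns a vector aligned with the rows
pct <- 100 * ave(df$views, df$date, FUN = function(x) x / sum(x))

# signif() counts significant digits, round() counts decimal places
signif(5.12345, 4)  # 5.123
round(5.12345, 2)   # 5.12

df$percentage <- round(pct, 2)

# Sanity check: the unrounded shares add up to 100 within each date
tapply(pct, df$date, sum)
```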

Data:

df = structure(list(views = c(1578L, 616L, 575L, 1744L, 541L, 660L, 
2906L, 629L, 643L), date = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L), .Label = c("2015-01-01", "2015-01-02", "2015-01-03"
), class = "factor"), article = structure(c(1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
percentage = c(56.99, 22.25, 20.77, 59.22, 18.37, 22.41, 
69.55, 15.06, 15.39)), .Names = c("views", "date", "article", 
"percentage"), class = "data.frame", row.names = c(NA, -9L))

Answer 1
5, accepted, 2015-04-17 22:26:51

Answer 2
3, 2015-04-17 22:23:14
