Calculate the percentage for each time series observation per group in R
I'm new to R and just getting my head around the data wrangling aspect. I tried looking for a similar question but couldn't find one.
I would like to add a column that gives each article's percentage share of the views for each day. Example dataset below:
views date article
1578 2015-01-01 A
616 2015-01-01 B
575 2015-01-01 C
1744 2015-01-02 A
541 2015-01-02 B
660 2015-01-02 C
2906 2015-01-03 A
629 2015-01-03 B
643 2015-01-03 C
And the expected result I am looking for:
views percentage date article
1578 56.99 2015-01-01 A
616 22.25 2015-01-01 B
575 20.77 2015-01-01 C
1744 59.22 2015-01-02 A
541 18.37 2015-01-02 B
660 22.41 2015-01-02 C
2906 69.55 2015-01-03 A
629 15.06 2015-01-03 B
643 15.39 2015-01-03 C
I know this is possible by splitting the data frame using subsets, but I hope there is a neater approach using a library?
Thanks!
library(dplyr)
df %>% group_by(date) %>% mutate( percentage = views/sum(views))
Source: local data frame [9 x 4]
Groups: date
views date article percentage
1 1578 2015-01-01 A 0.5698808
2 616 2015-01-01 B 0.2224630
3 575 2015-01-01 C 0.2076562
4 1744 2015-01-02 A 0.5921902
5 541 2015-01-02 B 0.1837012
6 660 2015-01-02 C 0.2241087
7 2906 2015-01-03 A 0.6955481
8 629 2015-01-03 B 0.1505505
9 643 2015-01-03 C 0.1539014
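The output above is a fraction between 0 and 1, whereas the question asks for a percentage with two decimals. A minimal sketch of one way to get there, assuming the same column names as in the question (the `result` name and the rebuilt `df` are only for illustration): multiply by 100 and round inside `mutate()`, then `ungroup()` so later steps are not silently grouped by date.

```r
library(dplyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

result <- df %>%
  group_by(date) %>%                                          # one group per day
  mutate(percentage = round(100 * views / sum(views), 2)) %>% # share of that day's views
  ungroup()                                                   # drop the grouping again
```

With the sample data this should reproduce the percentage column from the expected result in the question.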
Or, if multiple identical articles are possible per day:
df %>% group_by(date) %>% mutate(sum = sum(views)) %>%
group_by(date, article) %>% mutate(percentage = views/sum) %>%
select(-sum)
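The pipeline above keeps one row per original observation. If you would rather have a single row per article per day, one possible variant (an assumption about what is wanted, not something the question states) is to add up the duplicate rows first and then compute the share. The toy data frame `dup` and the name `shares` below are made up for illustration.

```r
library(dplyr)

# Hypothetical data: article A appears twice on the same day
dup <- data.frame(
  views   = c(100, 100, 200),
  date    = c("2015-01-01", "2015-01-01", "2015-01-01"),
  article = c("A", "A", "B")
)

shares <- dup %>%
  group_by(date, article) %>%
  summarise(views = sum(views)) %>%                  # collapse duplicates per day and article
  group_by(date) %>%                                 # regroup explicitly by day
  mutate(percentage = 100 * views / sum(views)) %>%  # share of the day's total
  ungroup()
```

Here A and B each end up with 200 views for the day, so both get 50 percent.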
If df is your data.frame, you can do:
library(data.table)
setDT(df)[,percentage:=signif(100*views/sum(views),4),by=date][]
# views date article percentage
#1: 1578 2015-01-01 A 56.99
#2: 616 2015-01-01 B 22.25
#3: 575 2015-01-01 C 20.77
#4: 1744 2015-01-02 A 59.22
#5: 541 2015-01-02 B 18.37
#6: 660 2015-01-02 C 22.41
#7: 2906 2015-01-03 A 69.55
#8: 629 2015-01-03 B 15.06
#9: 643 2015-01-03 C 15.39
Or base R:
df$percentage = signif(100*with(df, ave(views, date, FUN=function(x) x/sum(x))),4)
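One thing to be aware of: `signif(x, 4)` keeps four significant digits, which gives two decimals only for values between 10 and 100; a share below 10 would show three decimals. If a fixed two decimals is what you want, `round(x, 2)` is the more direct choice. A sketch of the same base R idea with that change, using `prop.table()` as shorthand for `x / sum(x)` and a quick check that each day adds up to about 100 (the `day_totals` name is only for illustration):

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

# ave() applies FUN within each date and returns a vector aligned with the rows;
# prop.table() on a plain vector returns x / sum(x)
df$percentage <- round(100 * ave(df$views, df$date, FUN = prop.table), 2)

# Sanity check: the rounded shares should sum to roughly 100 per day
day_totals <- tapply(df$percentage, df$date, sum)
```

Because of rounding, a daily total can be off by a hundredth or so rather than being exactly 100.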
Data:
df = structure(list(views = c(1578L, 616L, 575L, 1744L, 541L, 660L,
2906L, 629L, 643L), date = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("2015-01-01", "2015-01-02", "2015-01-03"
), class = "factor"), article = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
percentage = c(56.99, 22.25, 20.77, 59.22, 18.37, 22.41,
69.55, 15.06, 15.39)), .Names = c("views", "date", "article",
"percentage"), class = "data.frame", row.names = c(NA, -9L))