Calculate Percentage for each time series observations per Group in R
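I'm new to R and just starting to work through data problems. I looked for a similar question but couldn't find one.
I want to add an extra column holding each article's percentage share of the views for each day. Sample dataset below: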
views date article
1578 2015-01-01 A
616 2015-01-01 B
575 2015-01-01 C
1744 2015-01-02 A
541 2015-01-02 B
660 2015-01-02 C
2906 2015-01-03 A
629 2015-01-03 B
643 2015-01-03 C
And this is the result I expect:
views percentage date article
1578 56.99 2015-01-01 A
616 22.25 2015-01-01 B
575 20.77 2015-01-01 C
1744 59.22 2015-01-02 A
541 18.37 2015-01-02 B
660 22.41 2015-01-02 C
2906 69.55 2015-01-03 A
629 15.06 2015-01-03 B
643 15.39 2015-01-03 C
I know this can be done by splitting the data frame with subset, but I'm hoping there is a more concise way using a library?
Thanks!
library(dplyr)
df %>% group_by(date) %>% mutate( percentage = views/sum(views))
Source: local data frame [9 x 4]
Groups: date
views date article percentage
1 1578 2015-01-01 A 0.5698808
2 616 2015-01-01 B 0.2224630
3 575 2015-01-01 C 0.2076562
4 1744 2015-01-02 A 0.5921902
5 541 2015-01-02 B 0.1837012
6 660 2015-01-02 C 0.2241087
7 2906 2015-01-03 A 0.6955481
8 629 2015-01-03 B 0.1505505
9 643 2015-01-03 C 0.1539014
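The output above is a fraction between 0 and 1, while the question asks for a percentage with two decimals. A minimal sketch of that adjustment, assuming dplyr is installed and rebuilding the sample data inline (dates are kept as plain strings here, and `result` is just an illustrative name):

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

result <- df %>%
  group_by(date) %>%                                           # one group per day
  mutate(percentage = round(100 * views / sum(views), 2)) %>%  # share of that day's views
  ungroup()                                                    # drop grouping for later steps

print(result)
```

Because of rounding, the shares for one day may add up to slightly more or less than 100.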
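Or, if the same article may appear more than once per day:
# day_total is the total views for the day; sum(views) is then taken per date/article
df %>% group_by(date) %>% mutate(day_total = sum(views)) %>%
group_by(date, article) %>% mutate(percentage = sum(views)/day_total) %>%
select(-day_total)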
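If the goal with repeated articles is one row per date and article, another option is to collapse the duplicates first and then compute the share. This is only a sketch under that assumption: the `dup` data below is made up for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: article A appears twice on the same day
dup <- data.frame(
  views   = c(100, 50, 50),
  date    = c("2015-01-01", "2015-01-01", "2015-01-01"),
  article = c("A", "A", "B")
)

shares <- dup %>%
  group_by(date, article) %>%
  summarise(views = sum(views), .groups = "drop") %>%  # one row per date/article
  group_by(date) %>%
  mutate(percentage = 100 * views / sum(views)) %>%    # share of the day's total
  ungroup()

print(shares)
```

Here article A ends up with 150 of the day's 200 views, so its share is 75 and B's is 25.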
If df is your data.frame, you can do the following:
library(data.table)
setDT(df)[,percentage:=signif(100*views/sum(views),4),by=date][]
# views date article percentage
#1: 1578 2015-01-01 A 56.99
#2: 616 2015-01-01 B 22.25
#3: 575 2015-01-01 C 20.77
#4: 1744 2015-01-02 A 59.22
#5: 541 2015-01-02 B 18.37
#6: 660 2015-01-02 C 22.41
#7: 2906 2015-01-03 A 69.55
#8: 629 2015-01-03 B 15.06
#9: 643 2015-01-03 C 15.39
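One thing to be aware of: setDT converts df to a data.table by reference, so the original object itself is changed. If you would rather leave the data.frame untouched, a sketch along these lines may help. It assumes data.table is installed; `dt` is an illustrative name, and round(..., 2) is used here instead of signif so that values below 10 also keep two decimals.

```r
library(data.table)

# Sample data from the question
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

dt <- as.data.table(df)  # makes a copy; df stays a plain data.frame
dt[, percentage := round(100 * views / sum(views), 2), by = date]

print(dt)
print(names(df))  # df has no percentage column
```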
Or in base R:
df$percentage = signif(100*with(df, ave(views, date, FUN=function(x) x/sum(x))),4)
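A slightly shorter base R variant, offered as a sketch rather than a replacement: prop.table(x) on a vector returns x/sum(x), so it can be passed to ave directly. The tapply call afterwards is just a sanity check that each day's shares add up to 100; `day_totals` is an illustrative name.

```r
# Sample data from the question
df <- data.frame(
  views   = c(1578, 616, 575, 1744, 541, 660, 2906, 629, 643),
  date    = rep(c("2015-01-01", "2015-01-02", "2015-01-03"), each = 3),
  article = rep(c("A", "B", "C"), times = 3)
)

# Unrounded share of each day's views, in percent
df$percentage <- 100 * ave(df$views, df$date, FUN = prop.table)

# Sanity check: the shares for each day should sum to 100
day_totals <- tapply(df$percentage, df$date, sum)
print(day_totals)
```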
Data:
df = structure(list(views = c(1578L, 616L, 575L, 1744L, 541L, 660L,
2906L, 629L, 643L), date = structure(c(1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("2015-01-01", "2015-01-02", "2015-01-03"
), class = "factor"), article = structure(c(1L, 2L, 3L, 1L, 2L,
3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor"),
percentage = c(56.99, 22.25, 20.77, 59.22, 18.37, 22.41,
69.55, 15.06, 15.39)), .Names = c("views", "date", "article",
"percentage"), class = "data.frame", row.names = c(NA, -9L))