
Summing Multiple Variables using dplyr

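I am trying to sum multiple variables across multiple subjects in a data set. I know how to do this with the plyr package; however, given the length of the data set, the number of variables, and the number of different rolling sums I want to compute (2-day, 3-day, 4-day, etc.), I was wondering whether anyone has a more time-efficient way to do this in dplyr.

My data looks similar to this: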

Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
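# n is the number of draws; pass the length explicitly rather than the Day vector itself
variable.A <- rnorm(n = length(Day), mean = 20, sd = 5)
variable.B <- rnorm(n = length(Day), mean = 50, sd = 15)
variable.C <- rnorm(n = length(Day), mean = 100, sd = 33)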
dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
dat



   Subjects Day variable.A variable.B variable.C
1         1   1   20.17676   72.44022   56.69915
2         1   2   14.11462   46.28473  117.00864
3         1   3   15.30440   72.43752   93.17489
4         1   4   13.72422   66.76744  101.26422
5         1   5   21.97695   69.50480  102.61979
6         2   1   14.45742   32.69106   82.37268
7         2   2   33.37783   65.06782   97.17744
8         2   3   13.57833   26.37183   89.38218
9         2   4   23.01717   55.83446  147.85362
10        2   5   14.06008   32.00396   48.73060
11        3   1   14.57199   60.29746   87.07977
12        3   2   15.77413   77.04517  132.17910
13        3   3   30.05661   30.62220  171.35998
14        3   4   24.65348   53.96450   74.99875
15        3   5   26.93699   57.06393   36.81901

Here is an example of the code I have tried:

library(plyr)
library(RcppRoll)
summarize <- ddply(dat, "Subjects", mutate,
    Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA),
    Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA),
    Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA))

   Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
1         1   1  15.324798   24.83074  137.48853             NA             NA             NA
2         1   2  12.112943   58.86094   86.87454       27.43774       83.69168       224.3631
3         1   3  16.179328   57.95450   68.71333       28.29227      116.81544       155.5879
4         1   4  15.319750   38.13721   79.43194       31.49908       96.09171       148.1453
5         1   5  21.791452   61.99368  134.30205       37.11120      100.13089       213.7340
6         2   1  10.937461   63.83164   95.04865             NA             NA             NA
7         2   2  14.642376   79.12452  107.13699       25.57984      142.95616       202.1856
8         2   3  17.519905   52.75490  100.62811       32.16228      131.87942       207.7651
9         2   4  23.190371   37.56950  179.72763       40.71028       90.32440       280.3557
10        2   5  13.729350   46.95616   72.14179       36.91972       84.52566       251.8694
11        3   1   9.609171   74.51140  130.90005             NA             NA             NA
12        3   2  27.542897   14.36222  133.87630       37.15207       88.87363       264.7763
13        3   3  18.750015   60.46183  130.44314       46.29291       74.82405       264.3194
14        3   4  17.461882   52.65797  176.30620       36.21190      113.11979       306.7493
15        3   5  31.244564   62.41614   78.82916       48.70645      115.07411       255.1354

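This works well enough, but, as I said, the original data has many more columns, and I also want to compute 3-day sums, 4-day sums, and so on for all of these variables. My original data also contains some NAs, so perhaps there is a way to handle those as well?

I have been trying to use the mutate_each() function from the dplyr package, but I can't seem to get the syntax right.

Thanks.

Here is the dplyr version: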

library(dplyr)
library(RcppRoll)
dat %>% group_by(Subjects) %>% 
        mutate_each(funs(roll_sum(., 2, align = "right", fill=NA)), -Subjects, -Day)
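
Note that mutate_each() and funs() have since been deprecated in dplyr. The following is a minimal sketch of how the same idea might look with across() (assuming dplyr 1.0 or later), extended to cover the 2-, 3- and 4-day windows mentioned in the question. The names windows, roll_fns and rolled, the seed, and the ".rollN" column suffix are illustrative choices, not part of the original answer. Passing na.rm = TRUE to roll_sum() is one possible way to deal with the NAs; whether ignoring missing days is appropriate depends on the data.

```r
library(dplyr)
library(RcppRoll)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20, sd = 5),
  variable.B = rnorm(15, mean = 50, sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)

# Build one right-aligned rolling-sum function per window size.
windows <- c(2, 3, 4)
roll_fns <- lapply(windows, function(w) {
  force(w)  # capture the window size in the closure
  # na.rm = TRUE asks roll_sum() to drop missing values inside each window
  function(x) roll_sum(x, n = w, align = "right", fill = NA, na.rm = TRUE)
})
names(roll_fns) <- paste0("roll", windows)

# Apply every rolling function to every "variable." column, within each subject.
rolled <- dat %>%
  group_by(Subjects) %>%
  mutate(across(starts_with("variable."), roll_fns, .names = "{.col}.{.fn}")) %>%
  ungroup()

print(head(rolled))
```

This adds nine columns (three variables times three window sizes), named like variable.A.roll2. Adding another window only requires extending the windows vector.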
