简体   繁体   English

groupby + row与dplyr的平均值

[英]groupby + rowMean with dplyr

I have a large data frame, which includes impoT and nlc as key (ignore t all together), and other columns which each hold a number. 我有一个大的数据帧,其包括impoTnlc作为密钥(忽略t一起),和其他列,其中每个保持一个数字。 I want to find, for each impoT and nlc pair, the average of all the other columns, or basically a rowMean. 我想找到,每个impoTnlc对所有其他列的平均值,或者基本上rowMean。 A subset of my data, which only includes one nlc, is given at the end. 最后给出了我的数据子集,其中仅包含一个nlc。 The last thing I tried was : 我尝试的最后一件事是:

avg <- data.frame(a %>% group_by(impoT, nlc) %>% select(-c(1:3)) %>% mutate(r= rowMeans(.) ))
stds = (a %>% group_by(impoT, nlc) %>% select(-c(1:3)) %>% apply( 1, sd)) #wrong

dput(a)
    structure(list(impoT = 1:18, nlc = c(669L, 669L, 669L, 669L, 
669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 669L, 
669L, 669L, 669L), t = c(102L, 118L, 134L, 150L, 166L, 182L, 
198L, 214L, 230L, 246L, 262L, 278L, 294L, 310L, 326L, 342L, 358L, 
374L), X11950 = c(6, 14, 40, 53, 59, 70, 118, 119, 111, 114, 
103, 220, 278, 94, 28, 13, 5, 8), X11951 = c(4, 18, 41, 64, 78, 
87, 140, 112, 113, 129, 112, 245, 322, 102, 52, 20, 15, 7), X11952 = c(8, 
13, 30, 42, 52, 86, 126, 118, 52, 87, 116, 251, 262, 101, 35, 
21, 15, 21), X11955 = c(9, 11, 47, 38, 39, 70, 95, 82, 80, 77, 
77, 142, 192, 78, 13, 13, 5, 0), X11956 = c(14, 13, 44, 65, 65, 
72, 125, 138, 117, 111, 104, 175, 282, 93, 28, 14, 8, 4), X11957 = c(10, 
7, 45, 42, 50, 83, 123, 102, 104, 82, 102, 234, 265, 101, 23, 
13, 7, 6), X11958 = c(10, 13, 42, 60, 68, 69, 106, 125, 104, 
103, 112, 233, 310, 128, 50, 22, 10, 5), X11959 = c(7, 11, 32, 
45, 63, 74, 119, 87, 121, 108, 104, 229, 266, 111, 46, 26, 22, 
11), X11962 = c(8, 12, 38, 35, 49, 58, 96, 66, 73, 109, 82, 161, 
192, 75, 22, 4, 2, 3), X11963 = c(8, 9, 39, 40, 56, 50, 142, 
98, 102, 78, 79, 220, 229, 87, 25, 5, 7, 2), X11964 = c(10, 9, 
42, 60, 53, 52, 105, 114, 96, 94, 95, 180, 268, 114, 23, 10, 
7, 10), X11965 = c(9, 9, 41, 40, 61, 81, 150, 102, 102, 121, 
125, 222, 347, 116, 37, 18, 3, 4), X11966 = c(10, 9, 34, 43, 
49, 73, 112, 123, 102, 92, 107, 207, 239, 115, 60, 18, 15, 5), 
    X11969 = c(8, 9, 31, 34, 41, 51, 93, 92, 68, 103, 76, 166, 
    182, 63, 24, 14, 6, 4), X11970 = c(7, 12, 33, 48, 56, 59, 
    102, 88, 99, 86, 103, 194, 233, 90, 25, 13, 7, 3), X11971 = c(9, 
    16, 37, 60, 78, 62, 114, 106, 129, 107, 91, 212, 272, 88, 
    31, 10, 3, 3), X12088 = c(6, 11, 41, 44, 56, 70, 106, 97, 
    64, 73, 75, 161, 186, 76, 17, 8, 2, 2), X12089 = c(0, 11, 
    53, 59, 62, 64, 114, 109, 109, 100, 66, 222, 241, 88, 19, 
    8, 8, 3), X12090 = c(4, 12, 57, 52, 65, 73, 132, 109, 120, 
    101, 104, 227, 238, 99, 17, 8, 10, 8), X12091 = c(4, 16, 
    54, 167, 74, 62, 111, 95, 120, 102, 92, 227, 317, 106, 44, 
    16, 10, 4), X12092 = c(9, 10, 50, 55, 63, 64, 130, 103, 98, 
    116, 83, 249, 279, 88, 35, 36, 22, 15), X12095 = c(5, 15, 
    39, 44, 53, 58, 95, 92, 67, 63, 69, 163, 182, 69, 20, 8, 
    4, 2), X12096 = c(3, 14, 49, 53, 71, 70, 107, 130, 90, 89, 
    101, 214, 253, 100, 30, 10, 3, 3), X12097 = c(2, 16, 53, 
    61, 82, 83, 123, 124, 125, 98, 89, 220, 274, 107, 20, 17, 
    7, 5), X12098 = c(6, 17, 56, 59, 51, 77, 102, 115, 93, 98, 
    83, 221, 288, 97, 36, 16, 9, 10), X12099 = c(2, 16, 39, 49, 
    60, 84, 112, 91, 102, 103, 108, 246, 261, 131, 49, 24, 18, 
    14), X12102 = c(4, 12, 29, 47, 64, 69, 104, 111, 92, 72, 
    105, 174, 179, 64, 16, 10, 2, 1)), .Names = c("impoT", "nlc", 
"t", "X11950", "X11951", "X11952", "X11955", "X11956", "X11957", 
"X11958", "X11959", "X11962", "X11963", "X11964", "X11965", "X11966", 
"X11969", "X11970", "X11971", "X12088", "X12089", "X12090", "X12091", 
"X12092", "X12095", "X12096", "X12097", "X12098", "X12099", "X12102"
), row.names = c(NA, -18L), class = "data.frame")

It's easiest if you split your means into two steps, as you're actually taking the mean of irregular groups: first each row, and second each group. 如果将均值分为两步,这是最简单的,因为您实际上是在对不规则组进行均值:第一行,第二行。 This means you're taking the means of means, but given each of the row means is of the same amount of numbers, they should be fine that way, although you should consider that the grouping means may be means of different amounts of rows. 这意味着你采取的方式手段,但鉴于各行的意思是相同量的数字,他们应该被罚款这种方式,虽然你应该考虑的是,分组装置可为不同量的行的方式。

You also need to nest the select so you don't lose your grouping variables, and use summarise to collapse the groups. 您还需要嵌套select以免丢失分组变量,并使用summarise折叠组。 All told, 总而言之,

a %>% mutate(r = rowMeans(select(a, -c(1:3)))) %>% 
    group_by(impoT, nlc) %>% summarise(r = mean(r))

produces 产生

Source: local data frame [18 x 3]
Groups: impoT [?]

   impoT   nlc          r
   (int) (int)      (dbl)
1      1   669   6.740741
2      2   669  12.407407
3      3   669  42.074074
4      4   669  54.037037
5      5   669  59.925926
6      6   669  69.296296
7      7   669 114.888889
8      8   669 105.481481
9      9   669  98.259259
10    10   669  96.888889
11    11   669  94.925926
12    12   669 207.962963
13    13   669 253.222222
14    14   669  95.592593
15    15   669  30.555556
16    16   669  14.629630
17    17   669   8.592593
18    18   669   6.037037

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM