Adding a column to an aggregate computed from a data frame in R

Question

I have some data in a data frame df. Its head looks like this:

  site year       date  value
1  MLO 1969 1969-08-20 323.95
2  MLO 1969 1969-08-27 324.58
3  MLO 1969 1969-09-02 321.61
4  MLO 1969 1969-09-12 321.15
5  MLO 1969 1969-09-24 321.15
6  MLO 1969 1969-10-03 320.54

I am using aggregate() to find the maximum value by year:

ag <- aggregate(df$value ~ df$year, data=df, max)

This works fine, and ag now contains the following (head):

       df$year      df$value
1         1969        324.58
2         1970        331.16
3         1971        325.89
4         1974        336.75
5         1976        333.87
6         1977        338.63

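However, I want to plot the raw data and then layer the aggregated data on top of it. For that, the aggregate needs a column holding the full date (the date of the row that matches the maximum value). In other words, each row of the aggregate should look like: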

          df$date df$year  df$value
1      1969-08-27    1969    324.58

and so on, so that I can add a geom_point layer like this:

sp <- ggplot(df, aes(x=date, y=value)) +
  labs(x="Year", y="Value") 
sp + geom_point(colour="grey60", size=1) +
     geom_point(data=ag, aes(x=`df$date`, 
                             y=`df$value`))

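Is this possible with aggregate()? That is, can I use the year to compute the maximum value, and then add the date field from the matching row of the data frame?

Thanks!!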

Answer 1

A dplyr solution, using made-up data:

library(dplyr)
df <- data.frame(year = c(1969, 1969, 1969, 1970, 1970),
                 date = c("1969-08-20", "1969-08-21", "1969-08-22", "1970-08-20", "1970-08-21"),
                 value = c(1, 3, 2, 10, 8))

df %>% group_by(year) %>% summarise(max_val = max(value),
                                    max_date = date[which.max(value)])
# A tibble: 2 x 3
   year max_val max_date  
  <dbl>   <dbl> <chr>     
1 1969.      3. 1969-08-21
2 1970.     10. 1970-08-20
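
If every column of the matching row is wanted rather than only the date, a minimal sketch using dplyr::slice_max() might look like the following. This is not part of the original answer: slice_max() needs dplyr 1.0.0 or later, the toy data is made up in the same shape as above, and with_ties = FALSE is an assumption that a single row per year is enough when the maximum occurs more than once.

```r
library(dplyr)

# Made-up data in the same shape as the example above
df <- data.frame(
  year  = c(1969, 1969, 1969, 1970, 1970),
  date  = c("1969-08-20", "1969-08-21", "1969-08-22", "1970-08-20", "1970-08-21"),
  value = c(1, 3, 2, 10, 8)
)

# Keep the whole row that holds the largest value in each year
max_rows <- df %>%
  group_by(year) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()

max_rows
```

The result keeps the original column names (year, date, value), so it can be passed as the data argument of a second geom_point() layer without backticked names.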

Answer 2

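Overview

You can use base::merge() to attach df$date to ag with an inner join on the value column that df and ag share. To avoid pulling in every variable from df, I restrict it to just the date and value columns.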

# load data
df <-
  read.table(
    text = "site year       date  value
      MLO 1969 1969-08-20 323.95
      MLO 1969 1969-08-27 324.58
      MLO 1969 1969-09-02 321.61
      MLO 1969 1969-09-12 321.15
      MLO 1969 1969-09-24 321.15
      MLO 1969 1969-10-03 320.54"
    , header = TRUE
    , stringsAsFactors = FALSE )

# calculate max value by year
ag <- aggregate( formula = value ~ year, data = df, FUN = max )

# grab the date from df that matches
# the value from ag
ag <-
  merge( x = ag
         , y = df[c("date", "value")]
         , by = "value"
         , all = FALSE ) # to indicate that an inner-join be performed

# view results
ag
# value year       date
# 1 324.58 1969 1969-08-27

# end of script #
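
One caveat with joining on value alone: if the same value also appears in a different year, the join picks up that row too. A hedged variation, assuming it is acceptable to carry year through the join, is to merge on both year and value. The two-year data below is made up purely to show the collision.

```r
# Made-up data: 324.58 is the 1969 maximum but also occurs in 1970
df <- data.frame(
  site  = "MLO",
  year  = c(1969, 1969, 1970, 1970),
  date  = c("1969-08-20", "1969-08-27", "1970-01-05", "1970-03-10"),
  value = c(323.95, 324.58, 324.58, 331.16),
  stringsAsFactors = FALSE
)

# maximum value per year
ag <- aggregate(value ~ year, data = df, FUN = max)

# inner join on both keys, so a value is only matched within its own year
ag <- merge(
  x  = ag,
  y  = df[c("year", "date", "value")],
  by = c("year", "value")
)

ag
```

If the maximum occurs on several dates within one year, this still returns one row per matching date, which may or may not be what the plot needs.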

Answer 3

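Instead of aggregate, you can use dplyr::mutate to create a new column holding the maximum value for each year. You can then map separate geoms to the original variable and to the new column. I'll use a colored line to represent the aggregate.

Using example data covering 2 years: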

df1 <- structure(list(site = c("MLO", "MLO", "MLO", "MLO", "MLO", "MLO"),
                      year = c(1970, 1970, 1970, 1969, 1969, 1969),
                      date = c("1970-08-20", "1970-08-27", "1970-09-02",
                               "1969-09-12", "1969-09-24", "1969-10-03"),
                      value = c(323.95, 324.58, 321.61, 321.15, 321.15, 320.54)),
                      class = "data.frame",
                      .Names = c("site", "year", "date", "value"), 
                      row.names = c(NA, -6L))

library(tidyverse)
df1 %>% 
  group_by(year) %>% 
  mutate(maxVal = max(value)) %>% 
  ungroup() %>% 
  ggplot() + 
    geom_point(aes(date, value)) + 
    geom_line(aes(date, maxVal, group = year), color = "red")

There may also be a clever way to do this with stat_summary.
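
As a rough illustration of the stat_summary idea (a sketch, not the answerer's code), the version below puts year on the x axis and lets stat_summary() compute the maximum of value at each distinct x. It assumes ggplot2 3.3.0 or later, where the argument is named fun (earlier releases used fun.y). Note that it marks the maximum at the year position rather than at the date where it occurred, so by itself it does not recover the matching date.

```r
library(ggplot2)

# Same two-year example values as above
df1 <- data.frame(
  site  = "MLO",
  year  = c(1970, 1970, 1970, 1969, 1969, 1969),
  date  = c("1970-08-20", "1970-08-27", "1970-09-02",
            "1969-09-12", "1969-09-24", "1969-10-03"),
  value = c(323.95, 324.58, 321.61, 321.15, 321.15, 320.54)
)

# stat_summary() summarises y at each distinct x, so year is used as x here
p <- ggplot(df1, aes(x = year, y = value)) +
  geom_point(colour = "grey60", size = 1) +
  stat_summary(fun = max, geom = "point", colour = "red", size = 2)

p
```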

3 answers

Answer 1 | 2 | accepted | 2018-04-11 00:11:40
Answer 2 | 1 | 2018-04-11 00:15:09
Answer 3 | 0 | 2018-04-11 00:27:13
