dplyr / tidyr: summarise two columns into a single named list column

Imagine this data frame:

library(dplyr)

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c('00', '01', '02'), 3),
  value = rep(c(8, 9, 10), 3)
)

I want output in which the per-group summary column is a named list: one element per hour, holding that hour's value. That is the same as doing this for each group:

as.list(setNames(df$value[df$key == 1], df$hour[df$key == 1]))
$`00`
[1] 8

$`01`
[1] 9

$`02`
[1] 10
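For comparison, the same per-group result can be built for every key at once without dplyr. This is a minimal base-R sketch (split() plus lapply(), not a dplyr/tidyr solution); the names `df_base` and `hourly_by_key` are hypothetical, and `date` is left out because it does not affect the list:

```r
# Hypothetical base-R sketch: one named list per key, names taken from hour.
# df_base is a separate object so the question's df is not overwritten.
df_base <- data.frame(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  hour = rep(c("00", "01", "02"), 3),
  value = rep(c(8, 9, 10), 3),
  stringsAsFactors = FALSE
)

# split() gives one data frame per key; lapply() turns each into a named list
hourly_by_key <- lapply(split(df_base, df_base$key), function(g) {
  as.list(setNames(g$value, g$hour))
})

str(hourly_by_key[["1"]])
```

The result is a plain list keyed by `"1"`, `"2"`, `"3"`, not a tibble column, so it only illustrates the per-group structure being asked for.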

Something along these lines, but something that actually works:

df %>%
  group_by(key, date) %>%
  summarise(
    daily_value = sum(value),
    hourly_values = as.list(setNames(value, hour))
    )

Open to a nest or similar tidyr solution as well.

EDIT: The output should be the same as what this produces:

outputDf <- df %>%
  group_by(key, date) %>%
  summarise(daily_value = sum(value))

outputDf$hourly_value <- list(
  as.list(setNames(df$value[df$key == 1], df$hour[df$key == 1])),
  as.list(setNames(df$value[df$key == 2], df$hour[df$key == 2])),
  as.list(setNames(df$value[df$key == 3], df$hour[df$key == 3]))
  )

outputDf
# A tibble: 3 x 4
# Groups:   key [?]
    key       date daily_value hourly_value
  <dbl>     <date>       <dbl>       <list>
1     1 2019-06-18          27   <list [3]>
2     2 2019-06-18          27   <list [3]>
3     3 2019-06-18          27   <list [3]>

outputDf$hourly_value
[[1]]
[[1]]$`00`
[1] 8

[[1]]$`01`
[1] 9

[[1]]$`02`
[1] 10


[[2]]
[[2]]$`00`
[1] 8

[[2]]$`01`
[1] 9

[[2]]$`02`
[1] 10


[[3]]
[[3]]$`00`
[1] 8

[[3]]$`01`
[1] 9

[[3]]$`02`
[1] 10

We need to wrap the result in list(), because summarise expects a single row per group. On its own, as.list returns a list whose length equals the number of rows in the group; wrapping it in list() makes the length 1, which is what summarise needs.

library(dplyr)  
df %>% 
   group_by(key, date) %>% 
   summarise(daily_value = sum(value), 
              hourly_values = list(as.list(setNames(value, hour))))
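The length argument can be checked directly. A minimal base-R sketch using the values of one group (key == 1); the names `grp_value`, `grp_hour`, `per_hour` and `wrapped` are illustrative only:

```r
# Values of a single group (key == 1), as in the question
grp_value <- c(8, 9, 10)
grp_hour <- c("00", "01", "02")

per_hour <- as.list(setNames(grp_value, grp_hour))  # one element per row of the group
wrapped <- list(per_hour)                           # a single element holding that list

length(per_hour)  # 3: too long for a one-row-per-group summary
length(wrapped)   # 1: fits in one cell of a list column
```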
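Alternatively, a list column can be built with mutate(). Note that mutate keeps one row per hour (nine rows here) instead of collapsing each group to one row:

library(dplyr)

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c('00', '01', '02'), 3),
  value = rep(c(8, 9, 10), 3)
)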

df2 <- df %>% 
  group_by(key, date) %>% 
  mutate(daily_value = sum(value),
         hourly_value = as.list(value)) # list column: one length-1 element per row

names(df2$hourly_value) <- df$hour # name each element by its hour (rows keep df's order)
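The question also says it is open to a nest-based tidyr solution. A minimal sketch, assuming tidyr >= 1.0 (for the `nest(data = ...)` syntax) and purrr are installed; `out` is a hypothetical name. Unlike the summarise() output shown in the question, the result is not grouped:

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- tibble(
  key = c(rep(1, 3), rep(2, 3), rep(3, 3)),
  date = rep(Sys.Date(), 9),
  hour = rep(c("00", "01", "02"), 3),
  value = rep(c(8, 9, 10), 3)
)

out <- df %>%
  # one row per key/date; hour and value move into a nested tibble column
  nest(data = c(hour, value)) %>%
  mutate(
    daily_value = map_dbl(data, ~ sum(.x$value)),
    # one named list per row, names taken from hour
    hourly_value = map(data, ~ as.list(setNames(.x$value, .x$hour)))
  ) %>%
  select(-data)

out$hourly_value[[1]]
```

Nesting first keeps each group's rows together in `data`, so the per-group list is built by an ordinary map() rather than by wrapping inside summarise().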
