Mutating multiple columns tidyverse

I would like to perform calculations on multiple columns using tidyverse. I know how to do it for a single user (represented in a single column), but I need to do it for 1000+ users (and thus an equal number of columns).

I'm not that well acquainted with tidyverse or with doing calculations on tibbles, but I've had some earlier help on this platform (the exact code differs from the one below; I have reduced it to the core issue).

The dataset contains all hours of a year (8760 values: 365 days of 24 hours each), accompanied by values for multiple users.

Per user, I need to aggregate the positive values within a specific timeframe (e.g. everything between 00:00 and 03:00) and subtract those from the aggregate values between 03:00 and 05:00 (regardless of whether these values are positive or negative). In total there are 1000+ users.
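To make the rule concrete, here is a minimal worked example for one user on one day. The numbers are made up for illustration; the hour boundaries (hours 0 to 2 and hours 3 to 5) and the order of the subtraction (block 1 minus block 2) are assumptions taken from the single-user code in this post rather than from the description above.

```r
# Hypothetical hourly values for one user on one day (hours 0 to 5 only).
hr    <- 0:5
value <- c(0.5, -0.2, 0.3, 0.1, -0.4, 0.2)

# Block 1 (hours 0-2): keep only the positive values before summing.
block_1 <- sum(value[hr <= 2 & value > 0])   # 0.5 + 0.3

# Block 2 (hours 3-5): sum everything, whatever the sign.
block_2 <- sum(value[hr >= 3])               # 0.1 - 0.4 + 0.2

# Assumed order of subtraction: block 1 minus block 2.
final_result <- block_1 - block_2
final_result
```

With these values the result is about 0.9, since the negative value in block 1 is ignored while the negative value in block 2 is kept.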

library(tidyverse)
library(lubridate)
set.seed(4)
time_index <- seq(
  from = as.POSIXct("2016-01-01 00:00"),
  to  = as.POSIXct("2016-12-31 23:00"),
  by = "hour"
)    
user1 <- runif(length(time_index), min = -1, max = 1)
user2 <- runif(length(time_index), min = -1, max = 1)
user3 <- runif(length(time_index), min = -1, max = 1)
example <- data.frame(time_index, user1, user2, user3)

The code for a single column (user) is:

df_intermediate <- example %>%

  mutate(
    date = as_date(time_index),
    hour = hour(time_index),
    hour_block = case_when(
      between(hour, 0, 2) ~ "block_1",
      between(hour, 3, 5) ~ "block_2",
      TRUE ~ NA_character_
    )
  ) %>% 

  filter(!is.na(hour_block)) %>% 
  group_by(date, hour_block) %>%
  nest() %>% 
  ungroup() %>%
  mutate(
    intermediate_result = if_else(                              
      hour_block == "block_1",                                  
      map_dbl(data, ~ sum(.$user1[.$user1 > 0])),
      map_dbl(data, ~ sum(.$user1))
    )
  ) %>% 

  group_by(date) %>%
  summarise(
    final_result = first(intermediate_result) - last(intermediate_result)
  )

This gives the following results for a single user:

df_intermediate
#> # A tibble: 366 x 2
#>    date       final_result
#>    <date>            <dbl>
#>  1 2016-01-01       0.469 
#>  2 2016-01-02       0.189 
#>  3 2016-01-03      -1.32  

I have not been able to scale it up to multiple users. I looked at using mutate_at, or writing my own function to pass to mutate_at, but I do not know how to include the condition (only positive values should be counted in "block_1") or how to handle the large number of columns. So how could this be done for multiple columns instead of just a single one?

This is one way of doing it; it matches your partial results. The steps can of course be chained together to avoid intermediate data frames.

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
set.seed(4)
time_index <- seq(
    from = as.POSIXct("2016-01-01 00:00"),
    to  = as.POSIXct("2016-12-31 23:00"),
    by = "hour"
)    
user1 <- runif(length(time_index), min = -1, max = 1)
user2 <- runif(length(time_index), min = -1, max = 1)
user3 <- runif(length(time_index), min = -1, max = 1)
example <- data.frame(time_index, user1, user2, user3)

step1 <- example %>%
    mutate(
        date = as_date(time_index),
        hour = hour(time_index),
        hour_block = case_when(
            between(hour, 0, 2) ~ "block_1",
            between(hour, 3, 5) ~ "block_2",
            TRUE ~ NA_character_
        )
    )


step2 <- step1 %>% 
    filter(!is.na(hour_block)) %>% 
    pivot_longer(cols = starts_with("user"), names_to = "user_id") %>% 
    group_by(date, user_id) %>% 
    summarise(bl1_calc = sum(value[value > 0 & hour_block == "block_1"]),
              bl2_calc = sum(value[hour_block == "block_2"]),
              final_result = bl1_calc - bl2_calc) %>% 
    select(-starts_with("bl"))

step3 <- step2 %>% 
    pivot_wider(names_from = user_id, values_from = final_result)


step3
#> # A tibble: 366 x 4
#> # Groups:   date [366]
#>    date         user1  user2  user3
#>    <date>       <dbl>  <dbl>  <dbl>
#>  1 2016-01-01  0.469   2.25   0.662
#>  2 2016-01-02  0.189   0.345  4.33 
#>  3 2016-01-03 -1.32    0.375  0.931
#>  4 2016-01-04  0.746   1.21   2.05 
#>  5 2016-01-05  0.362   1.42  -0.578
#>  6 2016-01-06  1.55   -1.12   1.79 
#>  7 2016-01-07 -1.22    1.07  -0.896
#>  8 2016-01-08  0.873   1.41  -0.640
#>  9 2016-01-09 -0.0262  1.85   0.930
#> 10 2016-01-10 -0.953   0.666  0.624
#> # … with 356 more rows

Created on 2020-05-20 by the reprex package (v0.3.0)
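As a possible alternative, the same calculation can be sketched without reshaping, as a single chained pipeline that applies one formula to every user column with `across()`. This is not part of the answer above: it assumes dplyr 1.0.0 or later (where `across()` is available) and that all user columns share the `user` prefix. The column name `hr` and the object name `result` are introduced here for illustration only.

```r
library(dplyr)
library(lubridate)

# Rebuild the example data from the question (same seed, same columns).
set.seed(4)
time_index <- seq(
    from = as.POSIXct("2016-01-01 00:00"),
    to   = as.POSIXct("2016-12-31 23:00"),
    by   = "hour"
)
user1 <- runif(length(time_index), min = -1, max = 1)
user2 <- runif(length(time_index), min = -1, max = 1)
user3 <- runif(length(time_index), min = -1, max = 1)
example <- data.frame(time_index, user1, user2, user3)

result <- example %>%
    mutate(
        date = as_date(time_index),
        hr   = hour(time_index)   # `hr` is a name chosen for this sketch
    ) %>%
    # Keep only the hours that belong to block 1 (0-2) or block 2 (3-5).
    filter(between(hr, 0, 5)) %>%
    group_by(date) %>%
    summarise(
        across(
            starts_with("user"),
            # Positive values in block 1, minus all values in block 2.
            ~ sum(.x[.x > 0 & hr <= 2]) - sum(.x[hr >= 3])
        )
    )

result
```

This should give one row per date and one column per user, with the same numbers as the pivot-based version. For 1000+ users either approach ought to work; the long format from `pivot_longer()` tends to be easier to extend if further per-user steps are needed.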
