How to create a new column using values from the previous row of an existing column in R

Question

I want to create a new column that holds the last value from the previous period for the same ID, placed in the same row as the first value of the next period. If there is no previous period, the value should be NA.

However, I can't find a function in any package that solves this for me, so I expect I'll have to write a loop?

Does anyone have an idea how to solve this in a tidy manner (with or without a loop) that can be applied to a big tibble (4+ million observations)?

My data is ordered like the following df, and the goal is df1:

library(tibble)

df <- tibble(
  ID = rep(c(77,88,99),each=6),
  PERIOD = rep(c(1,2,3,1,2,3,1,2,3),each=2),
  DATE = seq(as.Date("2020-06-01"), as.Date("2020-06-18"), by= "days"),
  RESULT = seq(from = 10, to = 44, by = 2)
)
df
# A tibble: 18 x 4
      ID PERIOD DATE       RESULT
   <dbl>  <dbl> <date>      <dbl>
 1    77      1 2020-06-01     10
 2    77      1 2020-06-02     12
 3    77      2 2020-06-03     14
 4    77      2 2020-06-04     16
 5    77      3 2020-06-05     18
 6    77      3 2020-06-06     20
 7    88      1 2020-06-07     22
 8    88      1 2020-06-08     24
 9    88      2 2020-06-09     26
10    88      2 2020-06-10     28
11    88      3 2020-06-11     30
12    88      3 2020-06-12     32
13    99      1 2020-06-13     34
14    99      1 2020-06-14     36
15    99      2 2020-06-15     38
16    99      2 2020-06-16     40
17    99      3 2020-06-17     42
18    99      3 2020-06-18     44

df1 <- tibble(
  ID = rep(c(77,88,99),each=6),
  PERIOD = rep(c(1,2,3,1,2,3,1,2,3),each=2),
  DATE = seq(as.Date("2020-06-01"), as.Date("2020-06-18"), by= "days"),
  RESULT = seq(from = 10, to = 44, by = 2),
  RESULT_pre = c("NA","NA",12,"NA",16,"NA","NA","NA",24,"NA",28,
                 "NA","NA", "NA",36, "NA",40, "NA" )
)
df1

# A tibble: 18 x 5
      ID PERIOD DATE       RESULT RESULT_pre
   <dbl>  <dbl> <date>      <dbl> <chr>     
 1    77      1 2020-06-01     10 NA        
 2    77      1 2020-06-02     12 NA        
 3    77      2 2020-06-03     14 12        
 4    77      2 2020-06-04     16 NA        
 5    77      3 2020-06-05     18 16        
 6    77      3 2020-06-06     20 NA        
 7    88      1 2020-06-07     22 NA        
 8    88      1 2020-06-08     24 NA        
 9    88      2 2020-06-09     26 24        
10    88      2 2020-06-10     28 NA        
11    88      3 2020-06-11     30 28        
12    88      3 2020-06-12     32 NA        
13    99      1 2020-06-13     34 NA        
14    99      1 2020-06-14     36 NA        
15    99      2 2020-06-15     38 36        
16    99      2 2020-06-16     40 NA        
17    99      3 2020-06-17     42 40        
18    99      3 2020-06-18     44 NA

All input is appreciated.

Thx / Sophia

Answer 1

Here's a way with dplyr:

library(dplyr)

df %>%
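  group_by(ID, PERIOD) %>%
  # last RESULT of each (ID, PERIOD); keep the ID grouping for lag()
  summarise(RESULT_pre = last(RESULT), .groups = "drop_last") %>%
  # shift within each ID so each period sees the previous period's value
  mutate(RESULT_pre = lag(RESULT_pre)) %>%
  left_join(df, by = c('ID', 'PERIOD')) %>%
  group_by(ID, PERIOD) %>%
  # keep the value only on the first row of each period
  mutate(RESULT_pre = replace(RESULT_pre, -1, NA)) %>%
  # move RESULT_pre to the last column
  select(-RESULT_pre, RESULT_pre)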

#      ID PERIOD DATE       RESULT RESULT_pre
#   <dbl>  <dbl> <date>      <dbl>      <dbl>
# 1    77      1 2020-06-01     10         NA
# 2    77      1 2020-06-02     12         NA
# 3    77      2 2020-06-03     14         12
# 4    77      2 2020-06-04     16         NA
# 5    77      3 2020-06-05     18         16
# 6    77      3 2020-06-06     20         NA
# 7    88      1 2020-06-07     22         NA
# 8    88      1 2020-06-08     24         NA
# 9    88      2 2020-06-09     26         24
#10    88      2 2020-06-10     28         NA
#11    88      3 2020-06-11     30         28
#12    88      3 2020-06-12     32         NA
#13    99      1 2020-06-13     34         NA
#14    99      1 2020-06-14     36         NA
#15    99      2 2020-06-15     38         36
#16    99      2 2020-06-16     40         NA
#17    99      3 2020-06-17     42         40
#18    99      3 2020-06-18     44         NA

The logic here is to take the last RESULT value for each ID and PERIOD, then use lag to shift those values by one period within each ID. We then join this result with the original dataset, keep the value only in the first row of each group, and replace all other values with NA.
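
The same idea can also be written without the summarise/join round trip, which may matter at the 4-million-row scale mentioned in the question. The following is a minimal sketch and not part of the original answer: it assumes the rows are already sorted by ID, then PERIOD, then DATE (as in the example df), and `result` and `new_period` are illustrative names only.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  ID = rep(c(77, 88, 99), each = 6),
  PERIOD = rep(c(1, 2, 3, 1, 2, 3, 1, 2, 3), each = 2),
  DATE = seq(as.Date("2020-06-01"), as.Date("2020-06-18"), by = "days"),
  RESULT = seq(from = 10, to = 44, by = 2)
)

# Assumption: rows are sorted by ID, then PERIOD, then DATE.
result <- df %>%
  group_by(ID) %>%
  mutate(
    # TRUE only on the first row of a period that has a predecessor;
    # NA on the first row of each ID, where lag() has nothing to return
    new_period = PERIOD != lag(PERIOD),
    # an NA condition yields NA, so the first row of each ID stays NA
    RESULT_pre = if_else(new_period, lag(RESULT), NA_real_)
  ) %>%
  ungroup() %>%
  select(-new_period)
```

Because `lag()` is evaluated per ID, a value never leaks from one ID into the next. If the data is not sorted, an `arrange(ID, PERIOD, DATE)` step would be needed first.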

Answer 2

You can copy all shifted values and overwrite those that don't fit with NA:

n <- nrow(df)
df$RESULT_pre <- c(NA, df$RESULT[-n])
df$RESULT_pre[c(FALSE, df$ID[-1] != df$ID[-n] |
   df$PERIOD[-1] == df$PERIOD[-n])] <- NA
df
#   ID PERIOD       DATE RESULT RESULT_pre
#1  77      1 2020-06-01     10         NA
#2  77      1 2020-06-02     12         NA
#3  77      2 2020-06-03     14         12
#4  77      2 2020-06-04     16         NA
#5  77      3 2020-06-05     18         16
#6  77      3 2020-06-06     20         NA
#7  88      1 2020-06-07     22         NA
#8  88      1 2020-06-08     24         NA
#9  88      2 2020-06-09     26         24
#10 88      2 2020-06-10     28         NA
#11 88      3 2020-06-11     30         28
#12 88      3 2020-06-12     32         NA
#13 99      1 2020-06-13     34         NA
#14 99      1 2020-06-14     36         NA
#15 99      2 2020-06-15     38         36
#16 99      2 2020-06-16     40         NA
#17 99      3 2020-06-17     42         40
#18 99      3 2020-06-18     44         NA
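
To make the overwrite condition easier to follow, here is the same shift-and-mask idea split into named steps. This is only a sketch, assuming the rows are sorted by ID and then PERIOD; `add_result_pre`, `shifted`, `same_id` and `new_period` are illustrative names that do not appear in the answer.

```r
# Hypothetical helper; assumes `df` is sorted by ID, then PERIOD.
add_result_pre <- function(df) {
  n <- nrow(df)
  # Previous row's RESULT for every row (NA for the first row)
  shifted <- c(NA, df$RESULT[-n])
  # Compare each row with the one before it; the first row has no predecessor
  same_id    <- c(FALSE, df$ID[-1] == df$ID[-n])
  new_period <- c(FALSE, df$PERIOD[-1] != df$PERIOD[-n])
  # Keep the shifted value only where a new period starts within the same ID
  df$RESULT_pre <- ifelse(same_id & new_period, shifted, NA)
  df
}

# Example data from the question, as a plain data.frame
df <- data.frame(
  ID = rep(c(77, 88, 99), each = 6),
  PERIOD = rep(c(1, 2, 3, 1, 2, 3, 1, 2, 3), each = 2),
  DATE = seq(as.Date("2020-06-01"), as.Date("2020-06-18"), by = "days"),
  RESULT = seq(from = 10, to = 44, by = 2)
)
out <- add_result_pre(df)
```

Keeping a value where `same_id & new_period` holds is the logical complement of the answer's condition, which sets NA where the ID differs or the PERIOD is unchanged.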
