Recoding a variable in R using lag

Question

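I have data in which participants can contribute multiple data points per day over a four-day period. I would like to recode each day with a value from 1 to 4. Here is an example subset of my data: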

my.df <- read.table(text="
ID Date  Variable
1  0401  9
1  0402  2
1  0403  5
1  0404  8
2  0402  1
2  0402  9
2  0403  0
2  0404  3
2  0405  2
2  0405  1", header=TRUE)

> dput(my.df)
structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Date = c(401L, 402L, 403L, 404L, 402L, 402L, 403L, 404L, 405L, 
    405L), Variable = c(9L, 2L, 5L, 8L, 1L, 9L, 0L, 3L, 2L, 1L
    )), .Names = c("ID", "Date", "Variable"), class = "data.frame", 
row.names = c(NA, -10L))

This is my desired output:

ID Date  Variable DateRecode
1  0401  9        1
1  0402  2        2
1  0403  5        3
1  0404  8        4
2  0402  1        1
2  0402  9        1
2  0403  0        2
2  0404  3        3
2  0405  2        4
2  0405  1        4

I think I need to use a lag function to create the DateRecode column, because the real dataset has dozens of participants.

I can generate a lag column with dplyr:

library(dplyr)
my.df <- 
  my.df %>%
  group_by(ID) %>%
  mutate(lag.value = dplyr::lag(Date, n = 1, default = NA))

But of course this doesn't tell R to recode anything.

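The logic I'm essentially following is: when grouped by ID, if the value of Date equals the first/lowest value of Date, create a new column with the value 1. For each subsequent row, if Date has the same value as the previous row, keep the previous row's value; otherwise add 1.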

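IF statements haven't worked for this either. I can't figure out a way to account for each participant's dates being different from the previous participant's, so I was hoping lag could solve it.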

Does anyone have suggestions for how I might do this? I've been scratching my head over it for days. Thanks in advance!

Answer 1

We can use match together with unique:

library(dplyr)
my.df %>% 
   group_by(ID) %>% 
   mutate(lag.value = match(Date, unique(Date)))
# A tibble: 10 x 4
# Groups:   ID [2]
#      ID  Date Variable lag.value
#   <int> <int>    <int>     <int>
# 1     1   401        9         1
# 2     1   402        2         2
# 3     1   403        5         3
# 4     1   404        8         4
# 5     2   402        1         1
# 6     2   402        9         1
# 7     2   403        0         2
# 8     2   404        3         3
# 9     2   405        2         4
#10     2   405        1         4

Or use factor and coerce it to integer:

my.df  %>%
  group_by(ID) %>%
  mutate(lag.value = as.integer(factor(Date)))

Another option is group_indices:

library(purrr)
my.df %>% 
  split(.$ID) %>%
  map_df(~ .x %>% mutate(lag.value = group_indices(., Date)))
#   ID Date Variable lag.value
#1   1  401        9         1
#2   1  402        2         2
#3   1  403        5         3
#4   1  404        8         4
#5   2  402        1         1
#6   2  402        9         1
#7   2  403        0         2
#8   2  404        3         3
#9   2  405        2         4
#10  2  405        1         4

Note: here the Date values are already in order within each ID. If they are not, arrange first and then group_by:

my.df %>%
   arrange(ID, Date) %>%
   group_by(ID) %>%
   mutate(lag.value = match(Date, unique(Date)))
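
To see why the ordering matters, here is a small base R sketch on a made-up, unsorted vector (the vector `d` is purely illustrative and not part of the question's data). match with unique numbers the values in order of first appearance, while factor numbers them by sorted value, so the two only agree when the dates are already sorted.

```r
# Hypothetical unsorted dates for a single participant
d <- c(403L, 401L, 402L, 401L)

# match/unique: codes follow the order of first appearance
by_appearance <- match(d, unique(d))

# factor: codes follow the sorted order of the distinct values
by_sorted_value <- as.integer(factor(d))

print(by_appearance)    # 1 2 3 2
print(by_sorted_value)  # 3 1 2 1
```

If the recode should follow calendar order regardless of row order, the factor version (or sorting first, as above) is probably the safer choice.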

Answer 2

In base R, you can do the following:

transform(my.df, lag.value = ave(Date, ID, FUN = factor))
   ID Date Variable lag.value
1   1  401        9         1
2   1  402        2         2
3   1  403        5         3
4   1  404        8         4
5   2  402        1         1
6   2  402        9         1
7   2  403        0         2
8   2  404        3         3
9   2  405        2         4
10  2  405        1         4
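
For comparison, the rule described in the question (start at 1 and add 1 whenever Date differs from the previous row) can also be written fairly directly in base R. This is only a sketch: the helper name `recode_runs` is made up for illustration, and it assumes the rows within each ID are already sorted by Date.

```r
# Example data from the question
my.df <- data.frame(
  ID       = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Date     = c(401, 402, 403, 404, 402, 402, 403, 404, 405, 405),
  Variable = c(9, 2, 5, 8, 1, 9, 0, 3, 2, 1)
)

# Hypothetical helper: the first row counts as a change (TRUE), later rows
# count as a change only when Date differs from the row before;
# cumsum turns those change flags into a running day index.
recode_runs <- function(d) cumsum(c(TRUE, diff(d) != 0))

# Apply the helper separately within each ID
my.df$DateRecode <- ave(my.df$Date, my.df$ID, FUN = recode_runs)

print(my.df)
```

On the example data this gives the same DateRecode column as the desired output in the question. Unlike the factor approach, it relies on row order, so unsorted data would need to be ordered by ID and Date first.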


Answer 1
0 accepted 2018-06-08 03:44:19

Answer 2
0 2018-06-08 05:00:02
