
Correcting the value of a column based on another value with dplyr, for every ID

I have a data frame with Person_ID, Job_ID, Municipality_code and other variables (see the sample data frame below). The Job_ID variable is measured monthly, while Municipality_code is measured yearly.

 as.data.frame(df)
   Person_ID Month Year Job_ID Municipality_code
1          1     1 2017   Job1                 1
2          1     2 2017   Job1                 1
3          1     3 2017   Job1                 1
4          1     4 2017   Job1                 1
5          1     5 2017   Job2                 1
6          1     6 2017   Job2                 1
7          1     7 2017   Job2                 1
8          1     8 2017   Job2                 1
9          1     9 2017   Job2                 1
10         1    10 2017   Job2                 1
11         1    11 2017   Job2                 1
12         1    12 2017   Job2                 1
13         1     1 2018   Job2                20
14         1     2 2018   Job2                20
15         1     3 2018   Job2                20
16         1     4 2018   Job2                20
17         1     5 2018   Job2                20
18         1     6 2018   Job2                20
19         1     7 2018   Job2                20
20         1     8 2018   Job2                20
21         1     9 2018   Job2                20
22         1    10 2018   Job2                20
23         1    11 2018   Job2                20
24         1    12 2018   Job2                20

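I want to modify Municipality_code for every Person_ID based on each Job_ID. For example, Person_ID 1 switches jobs in the fifth month of 2017 (Job1 -> Job2). Because Municipality_code is only recorded once per year, the code stays at 1 for all of 2017 (in 1/2017 the person still had Job1, with the corresponding Municipality_code 1). I need a piece of code that corrects Municipality_code, so that from 5/2017 onward it is 20 instead of 1. I tried the code below, but without success.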

df2 <- df %>% 
  group_by(Person_ID) %>%
  dplyr::mutate(lag = lag(Job_ID, default = NA, order_by = Job_ID), 
                Municipality_corrected = if_else(Job_ID == lag, Municipality_code[1], Municipality_code[2]))

And the desired output:

   Person_ID Month Year Job_ID Municipality_code  lag Municipality_corrected
1          1     1 2017   Job1                 1 <NA>                     NA
2          1     2 2017   Job1                 1 Job1                      1
3          1     3 2017   Job1                 1 Job1                      1
4          1     4 2017   Job1                 1 Job1                      1
5          1     5 2017   Job2                 1 Job1                      1
6          1     6 2017   Job2                 1 Job2                     20
7          1     7 2017   Job2                 1 Job2                     20
8          1     8 2017   Job2                 1 Job2                     20
9          1     9 2017   Job2                 1 Job2                     20
10         1    10 2017   Job2                 1 Job2                     20
11         1    11 2017   Job2                 1 Job2                     20
12         1    12 2017   Job2                 1 Job2                     20
13         1     1 2018   Job2                20 Job2                     20
14         1     2 2018   Job2                20 Job2                     20
15         1     3 2018   Job2                20 Job2                     20
16         1     4 2018   Job2                20 Job2                     20
17         1     5 2018   Job2                20 Job2                     20
18         1     6 2018   Job2                20 Job2                     20
19         1     7 2018   Job2                20 Job2                     20
20         1     8 2018   Job2                20 Job2                     20
21         1     9 2018   Job2                20 Job2                     20
22         1    10 2018   Job2                20 Job2                     20
23         1    11 2018   Job2                20 Job2                     20
24         1    12 2018   Job2                20 Job2                     20

The following gives you the corrected Municipality_code:

library(dplyr)

df %>% 
  group_by(Person_ID, Job_ID) %>%                            # one group per person and job
  mutate(Municipality_corrected = last(Municipality_code))   # code from the last row of each group

# A tibble: 24 x 6
# Groups:   Person_ID, Job_ID [2]
#    Person_ID Month  Year Job_ID Municipality_code Municipality_corrected
#        <int> <int> <int> <chr>              <int>                  <int>
#  1         1     1  2017 Job1                   1                      1
#  2         1     2  2017 Job1                   1                      1
#  3         1     3  2017 Job1                   1                      1
#  4         1     4  2017 Job1                   1                      1
#  5         1     5  2017 Job2                   1                     20
#  6         1     6  2017 Job2                   1                     20
#  7         1     7  2017 Job2                   1                     20
#  8         1     8  2017 Job2                   1                     20
#  9         1     9  2017 Job2                   1                     20
# 10         1    10  2017 Job2                   1                     20
# ... with 14 more rows

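The idea I have used is that the municipality code is the same for every job, hence I group by Job_ID as well. Then I take the last Municipality_code for each Job_ID as the corrected code.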
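To try this idea end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same grouping. The `arrange()` step and the name `corrected` are additions for illustration, not part of the original answer: `last()` depends on row order, so the sketch assumes the "last" code should be the chronologically latest one and sorts by Year and Month first.

```r
library(dplyr)

# Rebuild the sample data from the question: one person, 24 monthly rows.
df <- data.frame(
  Person_ID = 1L,
  Month = rep(1:12, times = 2),
  Year = rep(c(2017L, 2018L), each = 12),
  Job_ID = c(rep("Job1", 4), rep("Job2", 20)),
  Municipality_code = rep(c(1L, 20L), each = 12),
  stringsAsFactors = FALSE
)

corrected <- df %>%
  arrange(Person_ID, Year, Month) %>%    # assumption: "last" means latest in time
  group_by(Person_ID, Job_ID) %>%        # one group per person and job
  mutate(Municipality_corrected = last(Municipality_code)) %>%
  ungroup()                              # drop grouping so later steps are not affected

# Inspect the rows around the job switch (months 4 to 6 of 2017).
corrected %>% filter(Year == 2017, Month %in% 4:6)
```

Note that this relies on each job having a single municipality. If a person keeps the same job across a real change of municipality, taking the last code per job would overwrite the earlier, valid code.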

