Correcting the value of a column based on another value with dplyr, for every ID
I have a data frame containing Person_ID, Job_ID, Municipality_code and other variables (see the example data frame below). Job_ID is measured monthly, whereas Municipality_code is measured yearly.
as.data.frame(df)
Person_ID Month Year Job_ID Municipality_code
1 1 1 2017 Job1 1
2 1 2 2017 Job1 1
3 1 3 2017 Job1 1
4 1 4 2017 Job1 1
5 1 5 2017 Job2 1
6 1 6 2017 Job2 1
7 1 7 2017 Job2 1
8 1 8 2017 Job2 1
9 1 9 2017 Job2 1
10 1 10 2017 Job2 1
11 1 11 2017 Job2 1
12 1 12 2017 Job2 1
13 1 1 2018 Job2 20
14 1 2 2018 Job2 20
15 1 3 2018 Job2 20
16 1 4 2018 Job2 20
17 1 5 2018 Job2 20
18 1 6 2018 Job2 20
19 1 7 2018 Job2 20
20 1 8 2018 Job2 20
21 1 9 2018 Job2 20
22 1 10 2018 Job2 20
23 1 11 2018 Job2 20
24 1 12 2018 Job2 20
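I want to correct the Municipality_code of each Person_ID based on Job_ID. For example, Person_ID 1 switches jobs in the fifth month of 2017 (Job1 -> Job2). Because Municipality_code is only recorded once per year, it stays at 1 for all of 2017 (in 1/2017 we have Job1 and the corresponding Municipality_code 1). I need a piece of code that corrects Municipality_code, so that from 5/2017 onward it is 20 instead of 1. I tried the code below, but without success.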
df2 <- df %>%
  group_by(Person_ID) %>%
  dplyr::mutate(lag = lag(Job_ID, default = NA, order_by = Job_ID),
                Municipality_corrected = if_else(Job_ID == lag, Municipality_code[1], Municipality_code[2]))
And the desired output:
Person_ID Month Year Job_ID Municipality_code lag Municipality_corrected
1 1 1 2017 Job1 1 <NA> NA
2 1 2 2017 Job1 1 Job1 1
3 1 3 2017 Job1 1 Job1 1
4 1 4 2017 Job1 1 Job1 1
5 1 5 2017 Job2 1 Job1 1
6 1 6 2017 Job2 1 Job2 20
7 1 7 2017 Job2 1 Job2 20
8 1 8 2017 Job2 1 Job2 20
9 1 9 2017 Job2 1 Job2 20
10 1 10 2017 Job2 1 Job2 20
11 1 11 2017 Job2 1 Job2 20
12 1 12 2017 Job2 1 Job2 20
13 1 1 2018 Job2 20 Job2 20
14 1 2 2018 Job2 20 Job2 20
15 1 3 2018 Job2 20 Job2 20
16 1 4 2018 Job2 20 Job2 20
17 1 5 2018 Job2 20 Job2 20
18 1 6 2018 Job2 20 Job2 20
19 1 7 2018 Job2 20 Job2 20
20 1 8 2018 Job2 20 Job2 20
21 1 9 2018 Job2 20 Job2 20
22 1 10 2018 Job2 20 Job2 20
23 1 11 2018 Job2 20 Job2 20
24 1 12 2018 Job2 20 Job2 20
The following gives you the corrected Municipality_code:
library(dplyr)
df %>%
  group_by(Person_ID, Job_ID) %>%
  mutate(Municipality_corrected = last(Municipality_code))
# A tibble: 24 x 6
# Groups: Person_ID, Job_ID [2]
# Person_ID Month Year Job_ID Municipality_code Municipality_corrected
# <int> <int> <int> <chr> <int> <int>
# 1 1 1 2017 Job1 1 1
# 2 1 2 2017 Job1 1 1
# 3 1 3 2017 Job1 1 1
# 4 1 4 2017 Job1 1 1
# 5 1 5 2017 Job2 1 20
# 6 1 6 2017 Job2 1 20
# 7 1 7 2017 Job2 1 20
# 8 1 8 2017 Job2 1 20
# 9 1 9 2017 Job2 1 20
# 10 1 10 2017 Job2 1 20
# ... with 14 more rows
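The idea is that the municipality code is the same for every month of a given job, so I group by Job_ID (within each Person_ID). I then take the last Municipality_code of each Job_ID as the corrected code.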
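For anyone who wants to run this end to end, here is a minimal sketch. It rebuilds the example data from the question by hand (the `data.frame()` call is my reconstruction, not something the asker posted), and the explicit `arrange()` and `ungroup()` steps are additions I am assuming are wanted, so that "last" reliably means "most recent month" and the result is no longer grouped.

```r
library(dplyr)

# Reconstruction of the example data: one person, 24 monthly rows.
# Job_ID changes in month 5 of 2017; Municipality_code only changes per year.
df <- data.frame(
  Person_ID = 1L,
  Month = rep(1:12, times = 2),
  Year = rep(c(2017L, 2018L), each = 12),
  Job_ID = c(rep("Job1", 4), rep("Job2", 20)),
  Municipality_code = rep(c(1L, 20L), each = 12),
  stringsAsFactors = FALSE
)

corrected <- df %>%
  # Sort chronologically so last() picks the most recent record of each job.
  arrange(Person_ID, Year, Month) %>%
  group_by(Person_ID, Job_ID) %>%
  # Every row of a job gets that job's most recent municipality code.
  mutate(Municipality_corrected = last(Municipality_code)) %>%
  ungroup()

# Rows 5 to 12 (Job2 in 2017) now carry 20 instead of 1.
print(corrected[4:6, ])
```

Note that this relies on the most recent yearly code being the right one for the whole job. If a person can return to the same Job_ID later in a different municipality, grouping by Job_ID alone would merge those spells, and you would need an extra grouping variable that identifies each spell.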