Join multiple columns with multiple lookup tables
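My task is to reproduce a process from SAS in R. For the past 71 months, I have one table with 1.4 million rows and 156 columns. The columns contain only IDs, and these are to be replaced by text.
There are 60 lookup tables for this. Some of them are used several times, some only once.
I can't show the real data, but here is a small example of what the table looks like: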
library(dplyr)  # provides %>% and left_join(); also re-exports tibble()
df <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
feature_a = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
feature_b = c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3),
feature_c = c(2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
feature_d = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
feature_e = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
feature_f = c(2, 2, 1, 1, 2, 2, 1, 1, 2, 2))
contract_id feature_a feature_b feature_c feature_d feature_e feature_f
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1001 1 3 2 1 2 2
1002 2 2 3 2 1 2
1003 3 1 1 1 2 1
1004 1 3 2 2 1 1
1005 2 2 3 1 2 2
1006 3 1 1 2 1 2
1007 1 3 2 1 2 1
1008 2 2 3 2 1 1
1009 3 1 1 1 2 2
1010 1 3 2 2 1 2
These are 2 of the 60 lookup tables. They are used several times; for example, lookup_a is used 8 times and lookup_b 15 times:
lookup_a = tibble(id = c(1, 2, 3),
value = c("yes", "no", "yes, mandatory"))
lookup_b = tibble(id = c(1, 2),
value = c("yes", "no"))
This is what the desired result looks like (feature_a to feature_c use lookup_a, and feature_d to feature_f use lookup_b):
df_expected <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
feature_a = c("yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes"),
feature_b = c("yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory"),
feature_c = c("no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no"),
feature_d = c("yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no"),
feature_e = c("no", "yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes"),
feature_f = c("no", "no", "yes", "yes", "no", "no", "yes", "yes", "no", "no"))
contract_id feature_a feature_b feature_c feature_d feature_e feature_f
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1001 yes yes, mandatory no yes no no
1002 no no yes, mandatory no yes no
1003 yes, mandatory yes yes yes no yes
1004 yes yes, mandatory no no yes yes
1005 no no yes, mandatory yes no no
1006 yes, mandatory yes yes no yes no
1007 yes yes, mandatory no yes no yes
1008 no no yes, mandatory no yes yes
1009 yes, mandatory yes yes yes no no
1010 yes yes, mandatory no no yes no
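I could of course create a join for every column, but that is not satisfying. I would like to keep the number of joins as small as possible: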
df %>%
left_join(lookup_a, by = c("feature_a" = "id")) %>%
select(-feature_a) %>%
rename(feature_a = value)
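One way to keep the number of joins small is to reshape the data to long format, join once against a stacked lookup table, and reshape back. The following is only a sketch on a reduced version of the example data; `col_map` and `all_lookups` are hypothetical helper tables that are not part of the original question, and tidyr is an extra dependency. On 1.4 million rows and 156 columns the long format becomes very large, so this may not be practical at full scale.

```r
library(dplyr)
library(tidyr)

# Reduced example data (subset of the columns from the question)
df <- tibble(contract_id = c(1001, 1002, 1003),
             feature_a = c(1, 2, 3),
             feature_d = c(1, 2, 1))

lookup_a <- tibble(id = c(1, 2, 3), value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2), value = c("yes", "no"))

# Hypothetical mapping: which column uses which lookup table
col_map <- tibble(feature = c("feature_a", "feature_d"),
                  lookup  = c("lookup_a", "lookup_b"))

# Stack all lookup tables into one, labelled by lookup name
all_lookups <- bind_rows(list(lookup_a = lookup_a, lookup_b = lookup_b),
                         .id = "lookup")

df_joined <- df %>%
  pivot_longer(-contract_id, names_to = "feature", values_to = "id") %>%
  left_join(col_map, by = "feature") %>%            # small join: attach lookup name
  left_join(all_lookups, by = c("lookup", "id")) %>% # single join for all columns
  select(contract_id, feature, value) %>%
  pivot_wider(names_from = feature, values_from = value)
```

IDs without a match in the corresponding lookup table end up as `NA` in this sketch.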
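I have also tried other approaches with data.table or match, but I haven't found a way to join several columns at once. My problem is that all columns get changed instead of only the selected ones.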
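For reference, a `match()`-based replacement can be limited to selected columns by looping over an explicit column-to-lookup mapping. This is a minimal sketch on reduced example data; `col_to_lookup` is a hypothetical name introduced here for illustration, and columns not listed in it (such as `contract_id`) are left untouched.

```r
library(tibble)

# Reduced example data (subset of the columns from the question)
df <- tibble(contract_id = c(1001, 1002, 1003),
             feature_a = c(1, 2, 3),
             feature_b = c(3, 2, 1),
             feature_d = c(1, 2, 1))

lookup_a <- tibble(id = c(1, 2, 3), value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2), value = c("yes", "no"))

# Hypothetical mapping: column name -> lookup table to use for it
col_to_lookup <- list(feature_a = lookup_a,
                      feature_b = lookup_a,
                      feature_d = lookup_b)

df_matched <- df
for (col in names(col_to_lookup)) {
  lookup <- col_to_lookup[[col]]
  # match() returns the row position of each ID in the lookup table
  df_matched[[col]] <- lookup$value[match(df_matched[[col]], lookup$id)]
}
```

With 60 lookup tables, only the mapping list grows; the loop itself stays the same.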
Here are my questions:
Maybe I'm overthinking this and the solution is relatively simple.
Thanks in advance!
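Welcome! You can use across inside the mutate verb to replace the values of multiple columns, selecting them by the indexes of the columns you want to change (2 to 4 for columns a to c, and 5 to 7 for columns d to f):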
library(dplyr)
df %>%
mutate(across(2:4,
~case_when(. == 1 ~ "Yes",
. == 2 ~ "No",
. == 3 ~ "Yes, mandatory",
TRUE ~ "Error"))) %>%
mutate(across(5:7,
~case_when(. == 1 ~ "Yes",
. == 2 ~ "No",
TRUE ~ "Error")))
Output:
# A tibble: 10 x 7
contract_id feature_a feature_b feature_c feature_d feature_e feature_f
<dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 1001 Yes Yes, mandatory No Yes No No
2 1002 No No Yes, mandatory No Yes No
3 1003 Yes, mandatory Yes Yes Yes No Yes
4 1004 Yes Yes, mandatory No No Yes Yes
5 1005 No No Yes, mandatory Yes No No
6 1006 Yes, mandatory Yes Yes No Yes No
7 1007 Yes Yes, mandatory No Yes No Yes
8 1008 No No Yes, mandatory No Yes Yes
9 1009 Yes, mandatory Yes Yes Yes No No
10 1010 Yes Yes, mandatory No No Yes No
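As a variation, the text values can be taken from the lookup tables themselves instead of being hard-coded in `case_when`, which may be easier to maintain with 60 lookup tables. This is a sketch only; `recode_with` is a hypothetical helper, not a dplyr function. Because the values come from lookup_a and lookup_b, the result is lowercase as in df_expected, and unmatched IDs become `NA` rather than "Error".

```r
library(dplyr)

df <- tibble(contract_id = c(1001, 1002, 1003),
             feature_a = c(1, 2, 3),
             feature_b = c(3, 2, 1),
             feature_c = c(2, 3, 1),
             feature_d = c(1, 2, 1),
             feature_e = c(2, 1, 2),
             feature_f = c(2, 2, 1))

lookup_a <- tibble(id = c(1, 2, 3), value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2), value = c("yes", "no"))

# Hypothetical helper: translate a vector of IDs via a lookup table
recode_with <- function(x, lookup) {
  lookup$value[match(x, lookup$id)]
}

df_recoded <- df %>%
  mutate(across(feature_a:feature_c, ~ recode_with(.x, lookup_a)),
         across(feature_d:feature_f, ~ recode_with(.x, lookup_b)))
```

Selecting columns by name (`feature_a:feature_c`) instead of by position avoids surprises if the column order changes.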