Conditionally match columns across different dataframes
I am working with two datasets. The first contains pairs of items:
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
original
label1 label2
1 cat dog
2 cat dog
3 dog cat
4 dog cat
5 cat dog
6 tiger cow
7 tiger cow
8 cow tiger
The second dataset contains an index code for each item in the first:
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
index
item code
1 cat 1
2 dog 0
3 tiger 1
4 cow 0
I am looking for a way to create two new columns, tag0 and tag1, so that the result looks like this:
new <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"),
tag1 = c("cat", "cat", "cat", "cat", "cat", "tiger", "tiger", "tiger"),
tag0 = c("dog", "dog", "dog", "dog", "dog", "cow", "cow", "cow"))
new
label1 label2 tag1 tag0
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
Here tag0 is the label in each row whose code in the index dataframe is 0, and tag1 is the label whose code is 1.
Can anyone help me with a tidyverse-based solution?
Here are two solutions in the tidyverse. The first works for this particular case, but I prefer the second, which is more elegant and scalable.
JOINs on label*
First, load the tidyverse and create your original and index datasets.
library(tidyverse)
# ...
# Code to generate 'original' and 'index' datasets.
# ...
Then apply this workflow.
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Match 'label1' to the tags.
left_join(
index,
by = c("label1" = "item"),
keep = TRUE
) %>%
# Match 'label2' to the tags.
left_join(
index,
by = c("label2" = "item"),
keep = TRUE,
suffix = c(".1", ".2")
) %>%
# Pivot 'item.1 | ... | item.n | code.1 | ... | code.n' into a consolidated
# 'item | code' form.
pivot_longer(
cols = matches("^(item|code)\\.(\\d+)?$"),
names_pattern = "^(item|code)\\.(\\d+)?$",
names_to = c(".value", NA)
) %>%
# Pivot back into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
Given the original and index datasets, reproduced here
original <- data.frame(
label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0)
)
this solution should produce the following result:
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
If your original dataset has any additional label* columns, you will need to perform an extra JOIN for each of them.
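One way to avoid writing a join per label* column is to reshape the labels into long form first, so a single join covers all of them. The following is only a sketch of that idea, not part of the workflow above; the helper names row_id, label_col, and tags are illustrative, and it assumes each row contains at most one label per code (otherwise pivot_wider() would return list-columns with a warning).

```r
library(dplyr)
library(tidyr)

original <- data.frame(
  label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
  label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
  item = c("cat", "dog", "tiger", "cow"),
  code = c(1, 0, 1, 0)
)

# Build one 'tag*' column per code, keyed by row.
tags <- original %>%
  mutate(row_id = row_number()) %>%
  # Stack every 'label*' column into a single 'item' column.
  pivot_longer(
    cols = starts_with("label"),
    names_to = "label_col",
    values_to = "item"
  ) %>%
  # A single join now looks up the code for every label.
  left_join(index, by = "item") %>%
  # Spread the items back out, one column per code.
  pivot_wider(
    id_cols = row_id,
    names_from = code,
    values_from = item,
    names_glue = "tag{code}"
  )

# Attach the tags to the untouched original columns.
result <- original %>%
  mutate(row_id = row_number()) %>%
  left_join(tags, by = "row_id") %>%
  select(!row_id)

result
```

Because the labels are stacked before the join, adding a label3 column to original would not require any change to this code, as long as the one-label-per-code assumption still holds.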
CROSS JOIN
This is a more elegant workflow, and also a more flexible one: it works for any number of label* columns in original and for any set of codes in index.
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Perform a cross join to compare every 'item' to every 'label*'.
# (In dplyr >= 1.1.0, 'cross_join(index)' is the preferred spelling.)
full_join(
index,
by = character()
) %>%
# Keep only those rows where 'item' matches a 'label*'.
rowwise() %>%
filter(item %in% c_across(matches("^label\\d+"))) %>%
# Pivot into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
The result is unchanged.
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
The only drawback is that it must perform a CROSS JOIN, which may hurt performance on larger datasets.
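To make that cost concrete, the sketch below counts the rows the cross join creates before any filtering. It assumes dplyr 1.1.0 or later, where cross_join() is available; the names crossed and kept are illustrative, and the filter is written out for the two label columns of the example rather than with c_across().

```r
library(dplyr)

original <- data.frame(
  label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
  label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
  item = c("cat", "dog", "tiger", "cow"),
  code = c(1, 0, 1, 0)
)

# Every row of 'original' is paired with every row of 'index'.
crossed <- cross_join(original, index)

# The intermediate table has nrow(original) * nrow(index) rows.
nrow(crossed)

# Only the pairs where 'item' equals one of the labels survive the filter.
kept <- crossed %>%
  filter(item == label1 | item == label2)

nrow(kept)
```

The intermediate table grows with the product of the two row counts, so an original with a million rows and an index with a thousand items would materialize a billion rows before the filter discards most of them.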
Another possible solution:
library(tidyverse)
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
original %>%
full_join(index, by=c("label1" = "item")) %>%
full_join(index, by=c("label2" = "item")) %>%
mutate(tag1 = if_else(code.x == 1, label1, label2)) %>%
# 'code.y' is the code of 'label2'; when it is 1, 'label1' is the code-0 item.
mutate(tag0 = if_else(code.y == 1, label1, label2)) %>%
select(!starts_with("code"))
#> label1 label2 tag1 tag0
#> 1 cat dog cat dog
#> 2 cat dog cat dog
#> 3 dog cat cat dog
#> 4 dog cat cat dog
#> 5 cat dog cat dog
#> 6 tiger cow tiger cow
#> 7 tiger cow tiger cow
#> 8 cow tiger tiger cow
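This last approach relies on each row holding exactly one code-1 label and one code-0 label. A quick check of that assumption might look like the following sketch; the names code_counts, n_labels, and problems are illustrative and do not come from any of the answers.

```r
library(dplyr)
library(tidyr)

original <- data.frame(
  label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
  label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
  item = c("cat", "dog", "tiger", "cow"),
  code = c(1, 0, 1, 0)
)

# Count how many labels in each row carry each code.
code_counts <- original %>%
  mutate(row_id = row_number()) %>%
  pivot_longer(
    cols = starts_with("label"),
    names_to = "label_col",
    values_to = "item"
  ) %>%
  left_join(index, by = "item") %>%
  count(row_id, code, name = "n_labels")

# Flag rows where a label is missing from 'index' (code is NA)
# or where two labels share the same code.
problems <- code_counts %>%
  filter(is.na(code) | n_labels > 1)

nrow(problems)
```

If problems has no rows, every row pairs one code-1 item with one code-0 item, which is what the if_else() logic above expects.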