R function that detects if a dataframe column contains string values from another dataframe column and adds a column that contains the detected str
I have two dataframes:
df1:
name |
---|
Apple page |
Mango page |
Lychee juice |
Cranberry club |
df2:
fruit |
---|
Apple |
Grapes |
Strawberry |
Mango |
lychee |
cranberry |
If df1$name contains a value in df2$fruit (case-insensitive), I want to add a column to df1 holding the value from df2$fruit that df1$name contains. df1 would then look like this:
name | category |
---|---|
Apple page | Apple |
Mango page | Mango |
Lychee juice | lychee |
Cranberry club | cranberry |
This should work:
library(stringr)
df1$category = str_extract(
df1$name,
pattern = regex(paste(df2$fruit, collapse = "|"), ignore_case = TRUE)
)
df1
#             name  category
# 1     Apple page     Apple
# 2     Mango page     Mango
# 3   Lychee juice    Lychee
# 4 Cranberry club Cranberry
Using this data:
df1 = read.table(text = 'name
Apple page
Mango page
Lychee juice
Cranberry club', header = T, sep = ";")
df2 = read.table(text = 'fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry', header = T, sep = ";")
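Note that `str_extract()` returns the text as it appears in df1$name, so the result above has "Lychee" and "Cranberry" rather than the "lychee" and "cranberry" spelling stored in df2$fruit. If the df2 spelling is what you need, one possible sketch is to look the extracted text back up in df2$fruit while ignoring case. The helper name `matched` is only illustrative, and this assumes the fruit names contain no regex metacharacters:

```r
library(stringr)

df1 <- data.frame(
  name = c("Apple page", "Mango page", "Lychee juice", "Cranberry club"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("Apple", "Grapes", "Strawberry", "Mango", "lychee", "cranberry"),
  stringsAsFactors = FALSE
)

# Extract the first fruit found in each name; the case follows df1$name
matched <- str_extract(
  df1$name,
  regex(paste(df2$fruit, collapse = "|"), ignore_case = TRUE)
)

# Map each match back to the spelling used in df2$fruit (case-insensitive lookup);
# names without a match stay NA
df1$category <- df2$fruit[match(tolower(matched), tolower(df2$fruit))]
df1
```

With the sample data, the category column should then follow df2$fruit exactly.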
First you could add a column for each of the possible categories to the dataframe with the names, as placeholders (just filled with NA). Then for each of those columns, check whether the column name (that is, the category) appears in the name. Turn it into a long dataframe, and then remove the FALSE rows, i.e. those where the category was not detected in the name.
library(tidyverse)
df1 <- tribble(
~name,
"Apple page",
"Mango page",
"Lychee juice",
"Cranberry club"
)
df2 <- tribble(
~fruit,
"Apple",
"Grapes",
"Strawberry",
"Mango",
"lychee",
"cranberry"
)
fruits <- df2$fruit %>%
str_to_lower() %>%
set_names(rep(NA_character_, length(.)), .)
df1 %>%
add_column(!!!fruits) %>%
mutate(across(-name, ~str_detect(str_to_lower(name), cur_column()))) %>%
pivot_longer(-name, names_to = "category") %>%
filter(value) %>%
select(-value)
#> # A tibble: 4 × 2
#>   name           category
#>   <chr>          <chr>
#> 1 Apple page     apple
#> 2 Mango page     mango
#> 3 Lychee juice   lychee
#> 4 Cranberry club cranberry
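The same detect-then-filter idea can also be sketched in base R, without the tidyverse, by building a logical matrix with one row per name and one column per fruit. This is only a rough equivalent, not the answer's own code; the names `hits`, `idx` and `result` are illustrative, and it assumes the fruit names contain no regex metacharacters and that df1 has more than one row (so `sapply()` returns a matrix):

```r
df1 <- data.frame(
  name = c("Apple page", "Mango page", "Lychee juice", "Cranberry club"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("Apple", "Grapes", "Strawberry", "Mango", "lychee", "cranberry"),
  stringsAsFactors = FALSE
)

# One column per fruit, one row per name: TRUE where the name contains the fruit
hits <- sapply(df2$fruit, function(f) grepl(f, df1$name, ignore.case = TRUE))

# Positions of the TRUE cells, as (row, col) pairs
idx <- which(hits, arr.ind = TRUE)

# Keep only the detected pairs, using the spelling from df2$fruit
result <- data.frame(
  name = df1$name[idx[, "row"]],
  category = df2$fruit[idx[, "col"]],
  stringsAsFactors = FALSE
)
result
```

Unlike the tidyverse version, this keeps the original case of df2$fruit, and a name that contains several fruits would produce one row per detected fruit.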