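R function to detect whether a dataframe column contains string values from another dataframe's column, and add a column containing the detected string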

Question

I have two dataframes:

df1:

name
Apple page
Mango page
Lychee juice
Cranberry club

df2:

fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry

If df1$name contains a value from df2$fruit (case-insensitive), I want to add a column to df1 holding the df2$fruit value that df1$name contains. df1 would then look like this:

name            category
Apple page      Apple
Mango page      Mango
Lychee juice    lychee
Cranberry club  cranberry

Answer 1

This should work:

library(stringr)
df1$category = str_extract(
  df1$name, 
  pattern = regex(paste(df2$fruit, collapse = "|"), ignore_case = TRUE)
)

df1
#             name  category
# 1     Apple page     Apple
# 2     Mango page     Mango
# 3   Lychee juice    Lychee
# 4 Cranberry club Cranberry

Using this data:

# sep = ";" never occurs in the text, so each line is read as a single field
df1 = read.table(text = 'name
Apple page
Mango page
Lychee juice
Cranberry club', header = TRUE, sep = ";")

df2 = read.table(text = 'fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry', header = TRUE, sep = ";")
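Note that the extracted value is the text as it appears in df1$name ("Lychee"), while the question asks for the value as spelled in df2$fruit ("lychee"). One way to recover the df2 spelling is to look the match up again while ignoring case. The following is a minimal sketch in base R, not part of the original answer; it assumes the fruit names contain no regex metacharacters, and the variable names `pattern`, `m`, and `found` are illustrative.

```r
df1 <- data.frame(
  name = c("Apple page", "Mango page", "Lychee juice", "Cranberry club"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("Apple", "Grapes", "Strawberry", "Mango", "lychee", "cranberry"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern from the fruit names
# (assumption: the names contain no regex metacharacters).
pattern <- paste(df2$fruit, collapse = "|")

# Position of the first case-insensitive match in each name; -1 means no match.
m <- regexpr(pattern, df1$name, ignore.case = TRUE)

# Extract the matched text, leaving NA where nothing matched.
found <- rep(NA_character_, nrow(df1))
found[m > 0] <- regmatches(df1$name, m)

# Look the match up in df2, ignoring case, to recover df2's own spelling.
df1$category <- df2$fruit[match(tolower(found), tolower(df2$fruit))]
df1
```

Names that match no fruit get NA in `category` rather than being dropped.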

Answer 2

First, add a column for each possible category to the dataframe of names, as placeholders filled with NA. Then, for each of those columns, check whether the column name (that is, the category) appears in the name. Pivot the result into a long dataframe and remove the FALSE rows, the ones where the category was not detected in the name.

library(tidyverse)

df1 <- tribble(
  ~name,
  "Apple page",
  "Mango page",
  "Lychee juice",
  "Cranberry club"
)
df2 <- tribble(
  ~fruit,
  "Apple",
  "Grapes",
  "Strawberry",
  "Mango",
  "lychee",
  "cranberry"
)

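# Named vector of NA placeholders, one element per lower-cased fruit name
fruits <- df2$fruit %>%
  str_to_lower() %>% 
  set_names(rep(NA_character_, length(.)), .)

df1 %>% 
  # Add one placeholder column per fruit
  add_column(!!!fruits) %>% 
  # Fill each column with TRUE/FALSE: does the name contain that fruit?
  mutate(across(-name, ~str_detect(str_to_lower(name), cur_column()))) %>% 
  # Reshape to one row per (name, category) pair
  pivot_longer(-name, names_to = "category") %>% 
  # Keep only the pairs where the fruit was detected
  filter(value) %>% 
  select(-value)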

#> # A tibble: 4 × 2
#>   name           category 
#>   <chr>          <chr>    
#> 1 Apple page     apple    
#> 2 Mango page     mango    
#> 3 Lychee juice   lychee   
#> 4 Cranberry club cranberry
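
The same detect-every-category idea can also be sketched without the tidyverse, using a logical matrix in place of the placeholder columns. This is an illustrative base R variant, not the answer's own code; the names `hits`, `idx`, and `result` are made up for the example, and matching is literal (`fixed = TRUE`) on lower-cased text.

```r
df1 <- data.frame(
  name = c("Apple page", "Mango page", "Lychee juice", "Cranberry club"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  fruit = c("Apple", "Grapes", "Strawberry", "Mango", "lychee", "cranberry"),
  stringsAsFactors = FALSE
)

fruits <- tolower(df2$fruit)

# hits[i, j] is TRUE when name i contains fruit j (literal, case-insensitive).
hits <- sapply(fruits, function(f) grepl(f, tolower(df1$name), fixed = TRUE))

# Row/column indices of every TRUE cell, i.e. every (name, category) pair,
# ordered by the row of df1 they came from.
idx <- which(hits, arr.ind = TRUE)
idx <- idx[order(idx[, "row"]), , drop = FALSE]

result <- data.frame(
  name = df1$name[idx[, "row"]],
  category = fruits[idx[, "col"]],
  stringsAsFactors = FALSE
)
result
```

As with the pipeline above, a name that matches no fruit is dropped from the result, and a name that matches several fruits appears once per match.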

Answer 1
2 2022-05-20 02:03:24

Answer 2
0 accepted 2022-05-20 02:21:42
