Two datasets: How to check if the values of a column of a dataset are contained in another column of another dataset in R?

I have two datasets, data1 and data2. My real data1 contains 300 rows and data2 contains 5000 rows. Both datasets have a column named x2 (see the example below). The x2 column of data2 contains 5000 car names, and the x2 column of data1 contains just 300 car names.
How can I check whether the values in x2 of data1 are contained in x2 of data2?

data1 <- data.frame(x1 = c(1, 3, 7, 7, 4, 7),
                    x2 = c("a 1-metha (akD)", "methal methal", "methy", "3-[3-(methy)prox",
                           "3-carbon (C:H)", "z"),
                    x3 = 10:15)

data2 <- data.frame(x1 = c(1, 3, 7, 7, 4, 7),
                    x2 = c("a 1-metha (akD)|a 1-metha akaikedenioyl|a 1-m(akD)", "methal methal|X.methal methal|methal (22)", "methy", "3-[3-(methy)prox",
                           "3-carbon (C:H)", "y"),
                    x3 = 20:25)

I have just started using R. I tried the grep function, but I would like to automate the check rather than run it one value at a time.

library(stringr)

# Attempt: look up the first value of data1$x2 in data2$x2
matchedValue <- grep(str_extract(data1$x2[1], "([[:alnum:][:punct:][:blank:]]+)"),
                     str_extract(data2$x2, "([[:alnum:][:punct:][:blank:]]+)"),
                     ignore.case = TRUE)

For example, I want to know whether a 1-metha (akD) (see column x2 of data1) is also present in x2 of data2, and I want to do this automatically for all 300 rows of data1.
How can I do this?

library(tidyverse)

data1 %>% 
  mutate(in_data2 = x2 %in% str_extract(data2$x2, "^[^\\|]*"))

# A tibble: 6 × 4
     x1 x2                  x3 in_data2
  <dbl> <chr>            <int> <lgl>   
1     1 a 1-metha (akD)     10 TRUE    
2     3 methal methal       11 TRUE    
3     7 methy               12 TRUE    
4     7 3-[3-(methy)prox    13 TRUE    
5     4 3-carbon (C:H)      14 TRUE    
6     7 z                   15 FALSE 
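The pattern "^[^\\|]*" keeps only the text before the first "|" in each data2 entry, so a name that appears only as a later alternative would not be found. A minimal base R sketch that checks every alternative instead, assuming "|" is purely a separator between names (the vector names data1_x2, data2_x2 and all_names are only illustrative):

```r
# x2 columns copied from the question's example data
data1_x2 <- c("a 1-metha (akD)", "methal methal", "methy",
              "3-[3-(methy)prox", "3-carbon (C:H)", "z")
data2_x2 <- c("a 1-metha (akD)|a 1-metha akaikedenioyl|a 1-m(akD)",
              "methal methal|X.methal methal|methal (22)",
              "methy", "3-[3-(methy)prox", "3-carbon (C:H)", "y")

# Split each data2 entry on the literal "|" and flatten to one vector of names
all_names <- unlist(strsplit(data2_x2, "|", fixed = TRUE))

# Exact (whole-name) membership test for every data1 value
in_data2 <- data1_x2 %in% all_names
in_data2

# A name listed only as a second alternative is also covered
"X.methal methal" %in% all_names
```

Because %in% compares whole strings, this variant does not count a short name as present just because it occurs inside a longer one.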

We could use str_detect with fixed(); see https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html#fixed-matches. Note that str_detect() is vectorised over both string and pattern, so the call below compares the two columns row by row (data1$x2[i] against data2$x2[i]), which is why rows 1 and 2 come out FALSE.

library(dplyr)
library(stringr)

data1 %>% 
  mutate(check = str_detect(x2, fixed(data2$x2)))

  x1               x2 x3 check
1  1  a 1-metha (akD) 10 FALSE
2  3    methal methal 11 FALSE
3  7            methy 12  TRUE
4  7 3-[3-(methy)prox 13  TRUE
5  4   3-carbon (C:H) 14  TRUE
6  7                z 15 FALSE
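Since str_detect() pairs its string and pattern arguments element by element, the result above reflects a same-row comparison. A hedged sketch of how each data1 value could instead be compared with every data2 row, still using fixed() so that characters such as "(" and "[" are treated literally (the names pairwise and any_row are only illustrative):

```r
library(stringr)

# x2 columns copied from the question's example data
data1_x2 <- c("a 1-metha (akD)", "methal methal", "methy",
              "3-[3-(methy)prox", "3-carbon (C:H)", "z")
data2_x2 <- c("a 1-metha (akD)|a 1-metha akaikedenioyl|a 1-m(akD)",
              "methal methal|X.methal methal|methal (22)",
              "methy", "3-[3-(methy)prox", "3-carbon (C:H)", "y")

# Element-wise: does data1_x2[i] contain data2_x2[i]?
pairwise <- str_detect(data1_x2, fixed(data2_x2))

# All-against-all: does any data2 entry contain data1_x2[i]?
any_row <- vapply(
  data1_x2,
  function(v) any(str_detect(data2_x2, fixed(v))),
  logical(1),
  USE.NAMES = FALSE
)

pairwise
any_row
```

Here the data2 column is the string being searched and each data1 value is the literal pattern, which matches the direction asked for in the question.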

You can use colSums on the matrix returned by sapply to check each row of data1 against the entire x2 column of data2.

data1$isin <- (colSums(sapply(data1$x2, \(x) grepl(x, data2$x2, fixed = TRUE))) > 0)
data1

  x1               x2 x3  isin
1  1  a 1-metha (akD) 10  TRUE
2  3    methal methal 11  TRUE
3  7            methy 12  TRUE
4  7 3-[3-(methy)prox 13  TRUE
5  4   3-carbon (C:H) 14  TRUE
6  7                z 15 FALSE
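grepl(..., fixed = TRUE) performs a literal substring search, so a short name can also be reported as present when it merely occurs inside a longer entry. A small sketch, under the same example data, that lists which data2 rows matched each data1 value so such cases can be inspected (the name hits is only illustrative):

```r
# x2 columns copied from the question's example data
data1_x2 <- c("a 1-metha (akD)", "methal methal", "methy",
              "3-[3-(methy)prox", "3-carbon (C:H)", "z")
data2_x2 <- c("a 1-metha (akD)|a 1-metha akaikedenioyl|a 1-m(akD)",
              "methal methal|X.methal methal|methal (22)",
              "methy", "3-[3-(methy)prox", "3-carbon (C:H)", "y")

# For each data1 value, the indices of the data2 rows that contain it literally
hits <- lapply(data1_x2, function(v) which(grepl(v, data2_x2, fixed = TRUE)))
names(hits) <- data1_x2

# "methy" matches row 3 exactly and row 4 as part of "3-[3-(methy)prox"
hits[["methy"]]

# Same TRUE/FALSE flag as the colSums approach
isin <- unname(lengths(hits) > 0)
isin
```

If only whole names should count, an exact comparison such as %in% on the split names may be a better fit than substring matching.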
