R: subset cell value from a column based on a match from another column

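I have a data frame with two columns, speciality and keywords. If a match is found between the search term and any value in the speciality column, I extract the value from the keywords column using the following code: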

speciality <- c("Emergency medicine","Allergology","Anesthesiology","Hematology","Cardiology")
keywords <- c("emergency room OR emergency medicine OR emergency department", 
          "Allergy OR rhinitis OR asthma OR atopic eczema", 
          "Pain OR local anaesthesia OR general anaesthesia OR induced sleep", 
          "Anemia OR bleeding disorders OR hemophilia OR blood cancers", 
          "Heart OR cardiac diseases OR Cardiomyopathy OR Congenital Heart Disease OR Cardiac Arrhythmia")
sample <- data.frame(speciality, keywords)
keyspecial <- "Allergology"
subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)
View(subkeywords)

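So I am looking for Allergology in the speciality column. Once I run the code, I get Allergy OR rhinitis OR asthma OR atopic eczema.

The problem I am facing now is that if I search for allergology instead of Allergology, I get no result. The same happens if I want to search with just emergency instead of Emergency medicine.

Any suggestions?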

Change this line:

subkeywords <- subset(sample$keywords, sample$speciality==keyspecial)

to this:

subkeywords <- subset(sample$keywords, grepl(keyspecial, sample$speciality, ignore.case=TRUE))

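This works because grepl has an ignore.case argument that can be set to TRUE to ignore case. Note that grepl treats keyspecial as a regular expression and also finds partial matches: when you search for Allergology, it will also match values such as The Allergology. The same behavior is what lets emergency find Emergency medicine.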
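A minimal sketch of how the partial matching behaves, using the speciality values from the question. The candidates vector and the anchored pattern are illustrative assumptions, and the anchoring only works as shown when the search term contains no regular expression metacharacters.

```r
speciality <- c("Emergency medicine", "Allergology", "Anesthesiology",
                "Hematology", "Cardiology")
keyspecial <- "allergology"

# Case-insensitive substring match: "emergency" finds "Emergency medicine"
partial <- grepl("emergency", speciality, ignore.case = TRUE)

# A short term can match several rows at once
broad <- grepl("ology", speciality, ignore.case = TRUE)

# Anchoring with ^ and $ forces the whole value to match
# (candidates is a made-up vector for illustration)
candidates <- c("Allergology", "The Allergology")
whole <- grepl(paste0("^", keyspecial, "$"), candidates, ignore.case = TRUE)
```

Here partial is TRUE only for the first row, broad is TRUE for every row except the first, and whole is TRUE only for the value that is exactly Allergology.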

To match only the complete value, you can use this:

subkeywords <- subset(sample$keywords, tolower(sample$speciality)==tolower(keyspecial))

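This way, both strings are converted to lowercase before they are compared. Because this is an exact comparison, a partial term such as emergency will not match Emergency medicine.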
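The difference between the exact comparison and a partial search can be checked with a short sketch. It uses two rows of the question's data; the names exact_hit and exact_miss are assumptions introduced only for this illustration.

```r
sample <- data.frame(
  speciality = c("Emergency medicine", "Allergology"),
  keywords = c("emergency room OR emergency medicine OR emergency department",
               "Allergy OR rhinitis OR asthma OR atopic eczema"),
  stringsAsFactors = FALSE
)

# Case differences are ignored because both sides are lowercased
exact_hit <- sample$keywords[tolower(sample$speciality) == tolower("ALLERGOLOGY")]

# A partial term does not equal the full value, so nothing is returned
exact_miss <- sample$keywords[tolower(sample$speciality) == tolower("emergency")]
```

exact_hit holds the Allergology keywords, while exact_miss is an empty character vector.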

You can use str_detect and ignore case:

library(tidyverse)
keyspecial <- "allergology"

sample %>% 
  filter(str_detect(speciality, fixed(keyspecial, ignore_case = TRUE)))
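As a rough sketch of what the pattern modifiers do, the following uses stringr on its own, without the rest of the tidyverse. The extra value The Allergology and the anchored regex() variant are assumptions added for illustration; fixed() matches the term literally, while regex() interprets it as a regular expression.

```r
library(stringr)

speciality <- c("Emergency medicine", "Allergology", "The Allergology")
keyspecial <- "allergology"

# fixed(): literal, case-insensitive substring match
literal <- str_detect(speciality, fixed(keyspecial, ignore_case = TRUE))

# regex() with anchors: the whole value must match, ignoring case
whole <- str_detect(speciality,
                    regex(paste0("^", keyspecial, "$"), ignore_case = TRUE))
```

literal is TRUE for both values containing Allergology, whereas whole is TRUE only for the value that is exactly Allergology.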

You can try splitting the strings into words, like this:

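# Split each speciality into lowercase words; lapply always returns a list,
# whereas sapply may simplify to a matrix when every entry has the same word count
matchList <- lapply(speciality,function(x) strsplit(tolower(x),split=" ")[[1]])
keyspecial <- "Allergology"
# Keep the rows where the lowercased search term equals one of the words
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)
# The same lookup with a lowercase search term gives the same result
keyspecial <- "allergology"
subkeywords <- subset(sample$keywords,sapply(matchList,function(y){any(tolower(keyspecial) %in% y)}))
View(subkeywords)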
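A short sketch of what word-level matching accepts and rejects, assuming specialities are separated by single spaces. The names words, whole_word and fragment are introduced only for this illustration.

```r
speciality <- c("Emergency medicine", "Allergology")

# strsplit is vectorized and returns one character vector of words per entry
words <- strsplit(tolower(speciality), " ", fixed = TRUE)

# A complete word matches, regardless of case in the original value
whole_word <- vapply(words, function(w) "emergency" %in% w, logical(1))

# A fragment of a word does not match
fragment <- vapply(words, function(w) "emerg" %in% w, logical(1))
```

With this approach emergency finds Emergency medicine, but emerg finds nothing, so it sits between the exact comparison and the grepl substring search.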

