Check values or patterns in one data frame column against another column to see if a match appears
I have a list of brand-name medications, and I need to check whether they appear in a patient's medication list. The patient med list is mostly entered in generic form, but I need to check whether a brand name was entered and, if so, change it to the generic. Each entry in the med list column contains the drug plus its directions. My goal is to create a column that flags whether a brand shows up, either as "yes"/"no" or as TRUE/FALSE. My brand list contains about 5000 entries and the patient list about 60000 entries. I am not sure where to begin, because the entries in the brand list and the patient list follow different patterns. Capitalization in the patient med list is also inconsistent. Any help is appreciated.
Example dataset (MRN is the patient ID):
Brand <- c("Evista", "Rozerem", "Altace")
MRN <- c("121212", "121212", "231212", "432123", "432123", "542345",
"323412", "242341", "412111", "642321")
MedList <- c("raloxifene 60mg daily", "Rozerem 8mg daily", "evista 60mg daily",
"metoprolol tartrate 25mg twice daily", "ramelteon 8mg daily",
"ramipril 5mg daily", "omeprazole 20mg daily", "ALTACE 5mg nightly",
"ramelteon 8mg daily", "imatinib 400mg daily")
Patients <- data.frame(MRN,MedList)
My goal is to end up with something like this:
inlist <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
FALSE)
Patients <- cbind(Patients, inlist)
Thank you.
Try this:
grepl(paste(toupper(Brand), collapse = '|'), toupper(MedList))
[1] FALSE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
If only capitalization, not spelling, is an issue, grepl should do what you want:
grepl(paste0(Brand, collapse = "|"), MedList, ignore.case = TRUE)
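A plain alternation can also hit a brand name buried inside a longer word. The following minimal base R sketch wraps the alternation in \b word boundaries and stores the flag as a column of Patients, which is what the question asks for. The perl = TRUE setting and the extra "yes"/"no" column are choices made for this sketch, not part of the answer above. It also assumes the brand names contain no regex metacharacters. Brands such as names containing "+" or "." would need escaping first.

```r
# Sample data from the question
Brand <- c("Evista", "Rozerem", "Altace")
MRN <- c("121212", "121212", "231212", "432123", "432123", "542345",
         "323412", "242341", "412111", "642321")
MedList <- c("raloxifene 60mg daily", "Rozerem 8mg daily", "evista 60mg daily",
             "metoprolol tartrate 25mg twice daily", "ramelteon 8mg daily",
             "ramipril 5mg daily", "omeprazole 20mg daily", "ALTACE 5mg nightly",
             "ramelteon 8mg daily", "imatinib 400mg daily")
Patients <- data.frame(MRN, MedList, stringsAsFactors = FALSE)

# One pattern of the form \b(?:Evista|Rozerem|Altace)\b;
# \b stops a brand from matching in the middle of a longer word.
# Assumes the brand names contain no regex metacharacters.
pattern <- paste0("\\b(?:", paste(Brand, collapse = "|"), ")\\b")

# Case-insensitive match, stored as a logical column
Patients$inlist <- grepl(pattern, Patients$MedList, ignore.case = TRUE, perl = TRUE)

# Optional "yes"/"no" version of the same flag
Patients$inlist_yn <- ifelse(Patients$inlist, "yes", "no")
```

On the example data, this gives the same TRUE/FALSE vector as the expected inlist column.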
If your pattern (i.e., "Brand") is a really long vector, you can use str_detect() from stringr. It is much faster and supports longer patterns. It has no ignore.case argument, so both sides are lowercased below. Wrapping the pattern in stringr::regex(..., ignore_case = TRUE) is another way to get case-insensitive matching.
stringr::str_detect(tolower(MedList), paste0(tolower(Brand), collapse = "|"))
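The answers above only flag the rows. The question also asks to replace a brand with its generic, which needs a brand-to-generic mapping that the example data does not include. The sketch below assumes a hypothetical lookup table named brand_generic. The three pairings in it are the standard generics for these brands: Evista is raloxifene, Rozerem is ramelteon, and Altace is ramipril. It loops over the table with base R gsub() and only touches rows that were already flagged. This keeps the work down when the real table has about 5000 brands, although a loop of that length over 60000 rows may still be slow. As before, brand names are assumed to contain no regex metacharacters.

```r
# Hypothetical brand-to-generic lookup table (not in the original data)
brand_generic <- data.frame(
  Brand   = c("Evista", "Rozerem", "Altace"),
  Generic = c("raloxifene", "ramelteon", "ramipril"),
  stringsAsFactors = FALSE
)

MedList <- c("raloxifene 60mg daily", "Rozerem 8mg daily", "evista 60mg daily",
             "metoprolol tartrate 25mg twice daily", "ramelteon 8mg daily",
             "ramipril 5mg daily", "omeprazole 20mg daily", "ALTACE 5mg nightly",
             "ramelteon 8mg daily", "imatinib 400mg daily")

# Flag rows that mention any brand (case-insensitive, whole words only)
pattern <- paste0("\\b(?:", paste(brand_generic$Brand, collapse = "|"), ")\\b")
inlist <- grepl(pattern, MedList, ignore.case = TRUE, perl = TRUE)

# Replace each brand with its generic, only in the flagged rows
MedGeneric <- MedList
for (i in seq_len(nrow(brand_generic))) {
  MedGeneric[inlist] <- gsub(
    paste0("\\b", brand_generic$Brand[i], "\\b"),
    brand_generic$Generic[i],
    MedGeneric[inlist],
    ignore.case = TRUE, perl = TRUE
  )
}
```

On the example data, "Rozerem 8mg daily" becomes "ramelteon 8mg daily" and "ALTACE 5mg nightly" becomes "ramipril 5mg nightly". Unflagged rows are left unchanged.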