
R: extract data from one column based on string detected in another

I have a large dataset I wish to summarize. The data are health records in which each individual has had many organs/tissues examined, and diagnoses are entered in narrative form. I have a few key diagnosis terms I want to find, and then I want to know what organs were associated with the diagnosis.

Example (all entries converted to character strings):

dataframe1

Organ          Diagnosis
lungs          interstitial pneumonia
liver          hepatic congestion ; diffuse
cerebrum       traumatic disruption and hemorrhage       
adrenal gland  focal hemorrhage

dataframe2

Keywords
congestion
hemorrhage
trauma
pneumonia

I want to search dataframe1$Diagnosis for strings that match dataframe2$Keywords, and for each match, return the organ entered in the corresponding row of dataframe1$Organ.

Data structures:

dataframe1 <- structure(list(Organ = c("lungs", "liver", "cerebrum", "adrenal gland"
), Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse", 
"traumatic disruption and hemorrhage", "focal hemorrhage")), .Names = c("Organ", 
"Diagnosis"), class = "data.frame", row.names = c(NA, -4L))

dataframe2 <- data.frame(Keywords=c("congestion","hemorrhage","trauma","pneumonia"),stringsAsFactors=FALSE)

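We can use grep to find, for each keyword, the rows of Diagnosis that contain it, and then collapse the matching organs into one comma-separated string per keyword: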

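# For each keyword, find the Diagnosis rows containing it and
# join the matching organs into one comma-separated string
sapply(dataframe2$Keywords, function(x) 
       toString(trimws(dataframe1$Organ[grep(x, dataframe1$Diagnosis)])))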
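
As a check on the sample data, the grep approach above can be run end to end. The following is a minimal self-contained sketch; the names organs_by_keyword and organs_ignore_case are only illustrative, and the ignore.case = TRUE variant rests on an assumption (that real narrative entries may vary in capitalization) that the question does not state.

```r
# Sample data from the question
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# One comma-separated string of organs per keyword; sapply names the
# result after the keywords because the input is a character vector
organs_by_keyword <- sapply(dataframe2$Keywords, function(x)
  toString(dataframe1$Organ[grep(x, dataframe1$Diagnosis)]))

# Assumption: capitalization may vary in real records, so this variant
# matches case-insensitively (e.g. "Hemorrhage" would also be found)
organs_ignore_case <- sapply(dataframe2$Keywords, function(x)
  toString(dataframe1$Organ[grep(x, dataframe1$Diagnosis,
                                 ignore.case = TRUE)]))

print(organs_by_keyword)
```

On the sample data both vectors are the same: "liver" for congestion, "cerebrum, adrenal gland" for hemorrhage, "cerebrum" for trauma, and "lungs" for pneumonia. Note that grep treats each keyword as a regular expression, so "trauma" also matches "traumatic".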

I think it's probably more useful to return a stacked data frame showing which organ matches which keyword, one row per match, as in:

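stack(
  # simplify = FALSE keeps a named list even when every keyword
  # matches the same number of rows
  sapply(dataframe2$Keywords, 
         function(x) dataframe1$Organ[grepl(x, dataframe1$Diagnosis)],
         simplify = FALSE)
)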

#         values        ind
#1         liver congestion
#2      cerebrum hemorrhage
#3 adrenal gland hemorrhage
#4      cerebrum     trauma
#5         lungs  pneumonia
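
If descriptive column names are preferred over values and ind, the same one-row-per-match table can be built directly. This is only a sketch under stated assumptions: match_organs is a hypothetical helper, not a function from base R or any package, and its ignore.case argument is an illustrative addition.

```r
# Sample data from the question
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one row per (organ, keyword) match.
# Keywords are treated as regular expressions, as in grepl.
match_organs <- function(records, keywords, ignore.case = FALSE) {
  hits <- lapply(keywords, function(k) {
    organs <- records$Organ[grepl(k, records$Diagnosis,
                                  ignore.case = ignore.case)]
    # A keyword with no matches contributes a zero-row data frame
    data.frame(Organ = organs, Keyword = rep(k, length(organs)),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

result <- match_organs(dataframe1, dataframe2$Keywords)
print(result)

# Number of organs associated with each keyword
print(table(result$Keyword))
```

Keeping the result in long format makes follow-up summaries straightforward, for example counting organs per keyword with table as shown, or merging the matches back onto other columns of the records.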
