I have a large dataset I wish to summarize. The data are health records in which each individual has had many organs/tissues examined, and diagnoses are entered in narrative form. I have a few key diagnosis terms I want to find, and then I want to know what organs were associated with the diagnosis.
Example (all entries are stored as character strings):
dataframe1
Organ          Diagnosis
lungs          interstitial pneumonia
liver          hepatic congestion ; diffuse
cerebrum       traumatic disruption and hemorrhage
adrenal gland  focal hemorrhage
dataframe2
Keywords
congestion
hemorrhage
trauma
pneumonia
I want to search dataframe1$Diagnosis for strings that match dataframe2$Keywords and, for each match, return the organ entered in the corresponding row of dataframe1$Organ.
dataframe1 <- structure(list(Organ = c("lungs", "liver", "cerebrum", "adrenal gland"
), Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
"traumatic disruption and hemorrhage", "focal hemorrhage")), .Names = c("Organ",
"Diagnosis"), class = "data.frame", row.names = c(NA, -4L))
dataframe2 <- data.frame(Keywords=c("congestion","hemorrhage","trauma","pneumonia"),stringsAsFactors=FALSE)
We can use grep to find the rows whose diagnosis matches each keyword:
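# For each keyword, collect the organs whose diagnosis matches it
# and join them into one comma-separated string.
sapply(dataframe2$Keywords, function(x)
  toString(trimws(dataframe1$Organ[grep(x, dataframe1$Diagnosis)])))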
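Narrative diagnoses often vary in capitalization, and grep treats each keyword as a regular expression by default. The following is only a sketch of one way to match keywords as literal, case-insensitive strings. The helper name organs_for_keyword is hypothetical and not part of the original answer; the data are copied from the question.

```r
# Example data copied from the question.
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption for illustration): return the organs
# whose diagnosis contains `keyword`. Both sides are lower-cased, and
# fixed = TRUE makes the keyword a literal string rather than a regex.
organs_for_keyword <- function(keyword, organs, diagnoses) {
  hit <- grepl(tolower(keyword), tolower(diagnoses), fixed = TRUE)
  organs[hit]
}

# One comma-separated string of organs per keyword.
result <- sapply(dataframe2$Keywords, function(k)
  toString(organs_for_keyword(k, dataframe1$Organ, dataframe1$Diagnosis)))
print(result)
```

Lower-casing both sides is used here because grepl ignores ignore.case when fixed = TRUE is set.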
It is probably more useful to return a stacked (long-format) table showing which organ matches which keyword, as in:
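# simplify = FALSE keeps the result a named list even when every keyword
# matches the same number of rows, so stack() always gets a list.
stack(
  sapply(dataframe2$Keywords,
         function(x) dataframe1$Organ[grepl(x, dataframe1$Diagnosis)],
         simplify = FALSE)
)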
#          values        ind
# 1         liver congestion
# 2      cerebrum hemorrhage
# 3 adrenal gland hemorrhage
# 4      cerebrum     trauma
# 5         lungs  pneumonia
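Since the goal is to summarize a large dataset, the stacked result can also be cross-tabulated to count how often each keyword occurs per organ. This is a minimal sketch using the example data from the question; the column names Organ and Keyword and the variables matches and counts are chosen here for illustration.

```r
# Example data copied from the question.
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# Long-format table: one row per (organ, keyword) match.
matches <- stack(
  sapply(dataframe2$Keywords, function(k) {
    dataframe1$Organ[grepl(k, dataframe1$Diagnosis, fixed = TRUE)]
  }, simplify = FALSE)
)
names(matches) <- c("Organ", "Keyword")  # illustrative column names

# Keyword-by-organ contingency table of match counts.
counts <- table(Keyword = matches$Keyword, Organ = matches$Organ)
print(counts)
```

With real records, the same table would show which organs are most often associated with each diagnosis term.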