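Match columns and keep all duplicated elements in a data frame column [R]
I have two data frames: DF1 has 3 columns and DF2 has 1 column. DF1 contains every element found in DF2, but most of those elements are duplicated in DF1, as shown below.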
DF1=
freetext                  specific         ICDcode
Jaundice,hepatitisA,B,C   Hepatitis A      B15
Jaundice,hepatitisA,B,C   Hepatitis B      B16
Jaundice,hepatitisA,B,C   Hepatitis C      B17.1
Jaundice,hepatitisA,B,C   Jaundice         R17
lobar Pneumonia           Lobar pneumonia  J18.1
Lobar Pneumonia ,scabies  Lobar pneumonia  J18.1
scabiess                  scabies          G10
DF2=
Jaundice,hepatitisA,B,C
scabiess
Lobar Pneumonia ,scabies
lobar Pneumonia
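I want to match the two data frames so that, whenever a match occurs, the resulting data frame has the same form as DF1. For example, Jaundice,hepatitisA,B,C should appear 4 times in the column rather than once. In other words, the duplicates should be preserved, as shown below.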
Resultant data frame should appear like this.
column1                   column2          column3
Jaundice,hepatitisA,B,C   Hepatitis A      B15
Jaundice,hepatitisA,B,C   Hepatitis B      B16
Jaundice,hepatitisA,B,C   Hepatitis C      B17.1
Jaundice,hepatitisA,B,C   Jaundice         R17
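So how should I go through DF2 to find matches in the first column of DF1, and then produce a data frame containing all the corresponding rows, as shown above?
Here is my script, but it does not seem to produce the result I want: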
newMatches <- data.frame()
for (i in 1:nrow(DF1)) {
  for (j in 1:nrow(DF2)) {
    newMatches <- grep(DF2[j, 1], DF1[i, 1], ignore.case = FALSE, value = TRUE)
  }
}
# it doesn't produce the other columns of DF1
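Any help and suggestions would be greatly appreciated.
As I understand it, you want to filter the rows of DF1, keeping only those whose first-column value is present in the first column of DF2. Is that correct? The simplest way to achieve this is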
DF1[DF1[, 1] %in% DF2[, 1], ]
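To see why this filter keeps every duplicate, it may help to look at the intermediate logical vector. `%in%` tests each element of its left-hand side independently, so a value repeated in DF1 gets its own TRUE each time it occurs. The toy vectors `left` and `right` below are made up for illustration and are not the data from the question.

```r
# Hypothetical toy vectors, not the data from the question
left  <- c("a", "a", "b", "c", "a")
right <- c("a", "c")

# One TRUE/FALSE per element of `left`; duplicates are tested independently
mask <- left %in% right
print(mask)  # TRUE TRUE FALSE TRUE TRUE

# Indexing with the mask keeps every matching element, repeats included
kept <- left[mask]
print(kept)  # "a" "a" "c" "a"
```

Used as a row index on a data frame, the same mask selects whole rows, which is why the other columns of DF1 come along.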
Edit
Here is the complete code to reproduce the example:
DF1 <- structure(list(
freetext = structure(c(1L, 1L, 1L, 1L, 2L, 3L, 4L),
.Label = c("Jaundice,hepatitisA,B,C", "lobar Pneumonia",
"Lobar Pneumonia ,scabies", "scabiess"), class = "factor"),
specific = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 6L),
.Label = c("Hepatitis A", "Hepatitis B", "Hepatitis C", "Jaundice",
"Lobar pneumonia", "scabies"), class = "factor"),
ICDcode = structure(c(1L, 2L, 3L, 6L, 5L, 5L, 4L),
.Label = c("B15", "B16", "B17.1", "G10", "J18.1", "R17"),
class = "factor")),
.Names = c("freetext", "specific", "ICDcode"),
row.names = c(NA, -7L), class = "data.frame")
DF2 <- structure(list(
freetext = structure(c(1L, 4L, 3L, 2L),
.Label = c("Jaundice,hepatitisA,B,C",
"lobar Pneumonia", "Lobar Pneumonia ,scabies", "scabiess"),
class = "factor")),
.Names = "freetext", row.names = c(NA, -4L), class = "data.frame")
result <- DF1[DF1[, 1] %in% DF2[, 1], ]
Printing result
gives the following output:
                  freetext        specific ICDcode
1  Jaundice,hepatitisA,B,C     Hepatitis A     B15
2  Jaundice,hepatitisA,B,C     Hepatitis B     B16
3  Jaundice,hepatitisA,B,C     Hepatitis C   B17.1
4  Jaundice,hepatitisA,B,C        Jaundice     R17
5          lobar Pneumonia Lobar pneumonia   J18.1
6 Lobar Pneumonia ,scabies Lobar pneumonia   J18.1
7                 scabiess         scabies     G10
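All seven rows survive here because DF2 happens to contain every freetext value that occurs in DF1. As a sketch of the narrower case described in the question, the snippet below rebuilds DF1 with plain character columns and filters it against a hypothetical one-row lookup, `DF2_small`, which is not part of the original post. Under that assumption the four Jaundice rows should all be retained.

```r
# DF1 rebuilt with character columns (same values as in the question)
DF1 <- data.frame(
  freetext = c(rep("Jaundice,hepatitisA,B,C", 4), "lobar Pneumonia",
               "Lobar Pneumonia ,scabies", "scabiess"),
  specific = c("Hepatitis A", "Hepatitis B", "Hepatitis C", "Jaundice",
               "Lobar pneumonia", "Lobar pneumonia", "scabies"),
  ICDcode  = c("B15", "B16", "B17.1", "R17", "J18.1", "J18.1", "G10"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup holding a single free-text value (an assumption)
DF2_small <- data.frame(freetext = "Jaundice,hepatitisA,B,C",
                        stringsAsFactors = FALSE)

# Same filter as in the answer; every column of DF1 is carried along
result_small <- DF1[DF1$freetext %in% DF2_small$freetext, ]
print(result_small)
```

Note that `%in%` compares values exactly and case-sensitively, so "lobar Pneumonia" and "Lobar Pneumonia ,scabies" count as different entries; partial or case-insensitive matching would need a different approach.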