Join two data frames by searching & matching strings
I have two data frames:
df1
+----+-------+
| Id | Title |
+----+-------+
| 1  | AAA   |
| 2  | BBB   |
| 3  | CCC   |
+----+-------+
and
df2
+----+-----------+-----------------------------------+
| Id | Sub       | Body                              |
+----+-----------+-----------------------------------+
| 1  | some sub1 | some mail body AAA some text here |
| 2  | some sub2 | some text here BBB continues here |
| 3  | some sub3 | some text AAA present here        |
| 4  | some sub4 | AAA string is present here also   |
| 5  | some sub5 | CCC string is present here        |
+----+-----------+-----------------------------------+
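I want to match the Title column of df1 against the Body column of df2. If the title string occurs in the Body column, the two rows should be joined, and the output data frame should look like this: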
df3
+-------+-----------+-----------------------------------+
| Title | Sub       | Body                              |
+-------+-----------+-----------------------------------+
| AAA   | some sub1 | some mail body AAA some text here |
| BBB   | some sub2 | some text here BBB continues here |
| AAA   | some sub3 | some text AAA present here        |
| AAA   | some sub4 | AAA string is present here also   |
| CCC   | some sub5 | CCC string is present here        |
+-------+-----------+-----------------------------------+
One solution might look like this, although experienced R users may well come up with a better answer:
# set up test data
df1 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:3,
                  title = c('AAA', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continous here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))
# join data frames: for each title, take the df2 rows whose body contains it
# (columns 2:3 are sub and body) and prepend the title as a new column
df.list <- lapply(seq_len(nrow(df1)), function(idx)
  cbind(title = df1$title[idx], df2[grepl(df1$title[idx], df2$body), 2:3]))
do.call('rbind', df.list)
This produces the following output:
  title       sub                              body
1   AAA some sub1 some mail body AAA some text here
3   AAA some sub3        some text AAA present here
4   AAA some sub4   AAA string is present here also
2   BBB some sub2 some text here BBB continous here
5   CCC some sub5        CCC string is present here
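The output above is grouped by title, whereas the df3 in the question follows the row order of df2. A minimal sketch of one way to get that order, assuming base R only: build every (title, row) pair with a cross join, keep the pairs where the title occurs in the body, then sort by df2's id. Note two departures from the answer's code: the names `pairs`, `keep` and `matched` are introduced here for illustration, and `fixed = TRUE` is used so titles are matched as literal strings rather than regular expressions.

```r
# Test data copied from the answer above.
df1 <- data.frame(id = 1:3,
                  title = c('AAA', 'BBB', 'CCC'),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continous here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'),
                  stringsAsFactors = FALSE)

# Cartesian product of titles and df2 rows (by = NULL means "no key columns").
pairs <- merge(df1['title'], df2, by = NULL)

# Keep the pairs whose title occurs literally in the body.
keep <- mapply(grepl, pairs$title, pairs$body,
               MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
matched <- pairs[keep, ]

# Restore df2's row order and drop the id column.
df3 <- matched[order(matched$id), c('title', 'sub', 'body')]
rownames(df3) <- NULL
df3
```

Titles that match no row simply produce no pairs, so no special case is needed. The cross join holds nrow(df1) * nrow(df2) rows in memory, so this is probably only sensible for small inputs.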
If we cannot rely on every title matching at least one row in df2, you may want to do something like this instead:
# set up test data
df1 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:4,
                  title = c('AAA', 'AAA BB', 'BBB', 'CCC'))
df2 <- data.frame(stringsAsFactors = FALSE,
                  id = 1:5,
                  sub = c('some sub1', 'some sub2', 'some sub3', 'some sub4', 'some sub5'),
                  body = c('some mail body AAA some text here',
                           'some text here BBB continous here',
                           'some text AAA present here',
                           'AAA string is present here also',
                           'CCC string is present here'))
# return the matching df2 rows for one title, or NULL when nothing matches
MergeByTitle <- function(title.idx) {
  df2.hits <- df2[grepl(df1$title[title.idx], df2$body), 2:3]
  if (nrow(df2.hits) > 0) {
    cbind(title = df1$title[title.idx], df2.hits)
  } else {
    NULL  # rbind skips NULL entries
  }
}
# join data frames
df.list <- lapply(seq_len(nrow(df1)), MergeByTitle)
do.call('rbind', df.list)
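One caveat for both snippets: grepl treats its first argument as a regular expression by default, so a title containing characters such as `.` or `+` may match more (or less) than intended. The small illustration below uses made-up strings (`bodies` is not from the question) to show the difference that `fixed = TRUE` makes; whether literal matching is what you want depends on your titles.

```r
# Hypothetical example strings, not taken from the question's data.
bodies <- c('price is 3.5 today', 'price is 315 today')

# Default: '3.5' is a regex, and '.' matches any single character.
regex_hits <- grepl('3.5', bodies)

# fixed = TRUE: '3.5' is matched as a literal substring.
literal_hits <- grepl('3.5', bodies, fixed = TRUE)

regex_hits    # TRUE TRUE
literal_hits  # TRUE FALSE
```

If the titles are plain text, adding `fixed = TRUE` to the grepl calls above would be a reasonable precaution.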