Match two lists in R, one with partial strings and another with full strings, and return the whole string on a match. Return each unique match only once.
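Say I have a CSV file with one long string per row (a long list). I shorten the strings with substr and then remove any duplicates with unique. Next, I want to compare the list of long strings, df12, against the unique short list, df14. Wherever a short string in df14 partially matches a string in df12, I want to return the whole string from df12, once per unique short string.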
This is df12 (the list of long strings):
[1] I like stackoverflow very much today
[2] I like stackoverflow much today
[3] I dont like stackoverflow very much today
[4] I dont like you!
[5] What?
df13 <- substr(df12, start = 1, stop = 13)
This is df13 (the shortened strings, not yet unique):
[1] I like stacko
[2] I like stacko
[3] I dont like s
[4] I dont like y
[5] What?
df14<-unique(df13)
This is df14 (the shortened strings after applying unique):
[1] I like stacko
[2] I dont like s
[3] I dont like y
[4] What?
This is the result I ultimately want:
[1] I like stackoverflow very much today
[2] I dont like stackoverflow very much today
[3] I dont like you!
[4] What?
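Here is one way to match each short string in df14 against all possible matches in df12 and output them. Each short string is used as a name in the resulting list, so you can tell which entries of df12 it matched (the code below calls the two vectors df1 and df2):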
df1 <- c('I like stackoverflow very much today', 'I like stackoverflow much today',
'I dont like stackoverflow very much today', 'I dont like you!',
'What?')
df2 <- c('I like stacko', 'I dont like s', 'I dont like y', 'What?')
sapply(df2, function(x) df1[grepl(x, df1, fixed = TRUE)])
$`I like stacko`
[1] "I like stackoverflow very much today" "I like stackoverflow much today"
$`I dont like s`
[1] "I dont like stackoverflow very much today"
$`I dont like y`
[1] "I dont like you!"
$`What?`
[1] "What?"
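The output above lists every match for each short string, whereas the question asks for one full string per unique short string. A minimal sketch of one way to get that, assuming each short string is a prefix of the long strings and that the first match in df12 is the one to keep (the names first_match and first_match2 are illustrative, and the 13-character cut is inferred from the example output):

```r
# Long strings from the question
df12 <- c("I like stackoverflow very much today",
          "I like stackoverflow much today",
          "I dont like stackoverflow very much today",
          "I dont like you!",
          "What?")

# Shortened, de-duplicated prefixes (13 characters, as in the example output)
df14 <- unique(substr(df12, start = 1, stop = 13))

# For each prefix, keep only the first long string that starts with it.
# startsWith() compares literally, so "?" is not treated as a regex quantifier.
first_match <- vapply(
  df14,
  function(p) df12[startsWith(df12, p)][1],
  character(1),
  USE.NAMES = FALSE
)
print(first_match)

# Shortcut when the prefixes come from the same vector:
# keep the first long string seen for each distinct prefix.
first_match2 <- df12[!duplicated(substr(df12, 1, 13))]
```

Both approaches keep the earliest entry of df12 for each prefix. If the short strings can appear anywhere in the long strings rather than only at the start, grepl(p, df12, fixed = TRUE) could replace startsWith(df12, p).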