
Match two lists, one with partial strings and another with full strings, and return the whole string on a match

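Match two lists in R, one with partial strings and the other with full strings, and return the whole string when there is a match. Return each match only once.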

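Suppose I have a CSV file with one long string per row (a long list). I shorten the strings with substr and then remove any duplicates with unique. Next I want to compare the long string list df12 against the unique short list df14: for each partial string in df14, find its match in df12 and return the whole string from df12, once per partial string.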

This is df12 (the list of long strings)

    [1] I like stackoverflow very much today
    [2] I like stackoverflow much today
    [3] I dont like stackoverflow very much today
    [4] I dont like you!
    [5] What? 

df13 <- substr(df12, start = 1, stop = 13)  # 13 characters, matching the shortened strings shown below

This is df13 (shortened strings, not yet unique)

    [1] I like stacko
    [2] I like stacko
    [3] I dont like s
    [4] I dont like y
    [5] What?

df14 <- unique(df13)

This is df14 (shortened strings, unique after applying unique)

    [1] I like stacko
    [2] I dont like s
    [3] I dont like y
    [4] What? 

This is the result I ultimately want

    [1] I like stackoverflow very much today
    [2] I dont like stackoverflow very much today
    [3] I dont like you!
    [4] What?

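Here is one way to match each short string in df14 against every possible match in df12 and output the results. The short strings become the names of the returned list, so you can see which full strings in df12 each one matched: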

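# Full strings (df12 in the question)
df1 <- c('I like stackoverflow very much today', 'I like stackoverflow much today',
         'I dont like stackoverflow very much today', 'I dont like you!',
         'What?')
# Unique shortened strings (df14 in the question)
df2 <- c('I like stacko',  'I dont like s', 'I dont like y', 'What?')

# For each short string, return every full string that contains it.
# fixed = TRUE matches literally, so the "?" in 'What?' is not read as a regex quantifier.
sapply(df2, function(x) df1[grepl(x, df1, fixed = TRUE)])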
$`I like stacko`
[1] "I like stackoverflow very much today" "I like stackoverflow much today"     

$`I dont like s`
[1] "I dont like stackoverflow very much today"

$`I dont like y`
[1] "I dont like you!"

$`What?`
[1] "What?"
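
The output above lists every match, while the question asks for only one full string per short string. A minimal sketch of two ways to get that, assuming the short strings are always the leading characters of the full strings (true here, since they come from substr) and that the first match in df12 order is the one to keep. The names first_match and first_match2 are illustrative, and the prefix length of 13 is inferred from the shortened strings shown in the question.

```r
# Full strings (df12 in the question)
df12 <- c("I like stackoverflow very much today",
          "I like stackoverflow much today",
          "I dont like stackoverflow very much today",
          "I dont like you!",
          "What?")

# Unique 13-character prefixes (df14 in the question)
df14 <- unique(substr(df12, 1, 13))

# Option 1: for each prefix, keep the first full string that starts with it.
# startsWith() compares literally, so characters such as "?" need no escaping.
first_match <- vapply(df14,
                      function(p) df12[startsWith(df12, p)][1],
                      character(1),
                      USE.NAMES = FALSE)

# Option 2: skip the lookup and keep the first full string of each prefix group.
first_match2 <- df12[!duplicated(substr(df12, 1, 13))]

print(first_match)
```

Both options return the four strings listed as the desired result in the question. Option 2 is shorter when the short strings are derived from df12 itself; Option 1 also works when df14 comes from somewhere else.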
