
Find most common substring from an Array of Strings

I want to search for a word in an array of strings.

array = ["ram","gopal","varma","govind","ravan","alan"]

If my search text is goal, I want the result to be:

result = ["gopal","govind","alan"]

i.e. comparing gopal with goal, only the p is missing, so gopal should appear in the results with higher priority.

Is there any way to do such filtering?

You want to find the longest common subsequence. I suggest looking at this excellent article on Ray Wenderlich's Swift Algorithm Club, where you can find a solution with examples.
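For the example in the question, goal is a subsequence of gopal: deleting the p leaves goal, which is why an LCS-based ranking puts gopal first. A minimal sketch of a plain subsequence check (`isSubsequence(_:of:)` is a hypothetical helper for illustration, not taken from the article):

```swift
// Hypothetical helper: returns true if every character of `needle` appears
// in `haystack` in the same relative order (gaps are allowed).
func isSubsequence(_ needle: String, of haystack: String) -> Bool {
    var remaining = haystack[...]  // Substring view over the unsearched tail
    for ch in needle {
        // Find the next occurrence of `ch`; fail if there is none.
        guard let idx = remaining.firstIndex(of: ch) else { return false }
        // Continue searching only after the matched character.
        remaining = remaining[remaining.index(after: idx)...]
    }
    return true
}

print(isSubsequence("goal", of: "gopal"))  // true: drop the "p"
print(isSubsequence("goal", of: "govind")) // false: no "a" after "go"
```

This only gives a yes/no answer; ranking partial matches such as govind (which shares just "go") needs the length of the longest common subsequence instead.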

EDIT:

Then iterate over your array and record the length of the longest common subsequence for each word (for example, in a dictionary). Finally, sort the words by those lengths.

For example:

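let array = ["ram", "gopal", "varma", "govind", "ravan", "alan"]
let searchTerm = "goal"

// Map each word to the length of its longest common subsequence with searchTerm.
// longestCommonSubsequence(_:) is the String extension from the Swift Algorithm Club article.
var dictionary: [String: Int] = [:]
for element in array {
    dictionary[element] = searchTerm.longestCommonSubsequence(element).count
}

// Sort by subsequence length, longest first, and keep only the words.
let result = dictionary.sorted(by: { $0.value > $1.value }).map { $0.key }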
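The snippet above relies on the longestCommonSubsequence(_:) extension from the Swift Algorithm Club article. If you would rather not pull that in, here is a self-contained sketch, assuming only the LCS length is needed: `lcsLength(_:_:)` is a hypothetical helper that fills the standard dynamic-programming table one row at a time. It also sorts an array of (word, score, index) tuples rather than a dictionary, because a dictionary's iteration order is unspecified (so ties come out in arbitrary order) and duplicate words would be merged.

```swift
// Hypothetical helper: length of the longest common subsequence of a and b,
// computed bottom-up in O(a.count * b.count) time, keeping only one previous row.
func lcsLength(_ a: String, _ b: String) -> Int {
    let y = Array(b)
    // prev[j] = LCS length of the part of `a` processed so far and y[0..<j].
    var prev = [Int](repeating: 0, count: y.count + 1)
    for ch in a {
        var curr = [Int](repeating: 0, count: y.count + 1)
        for j in 0..<y.count {
            // Match: extend the diagonal; otherwise take the better neighbour.
            curr[j + 1] = ch == y[j] ? prev[j] + 1 : max(prev[j + 1], curr[j])
        }
        prev = curr
    }
    return prev[y.count]
}

let array = ["ram", "gopal", "varma", "govind", "ravan", "alan"]
let searchTerm = "goal"

// Pair each word with its score and original position.
let scored: [(word: String, score: Int, index: Int)] = array.enumerated().map {
    (word: $0.element, score: lcsLength(searchTerm, $0.element), index: $0.offset)
}

// Highest score first; equal scores keep their original array order.
let result = scored
    .sorted { $0.score != $1.score ? $0.score > $1.score : $0.index < $1.index }
    .map { $0.word }

print(result) // ["gopal", "govind", "alan", "ram", "varma", "ravan"]
```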

The length of the longest common subsequence of two strings can be defined recursively as follows:

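func lcs(_ str1: String, _ str2: String, _ i: String.Index, _ j: String.Index) -> Int {
    // i and j mark the end of the prefixes str1[..<i] and str2[..<j] being compared.
    if i == str1.startIndex || j == str2.startIndex {
        return 0
    }

    let beforeI = str1.index(before: i)
    let beforeJ = str2.index(before: j)

    if str1[beforeI] == str2[beforeJ] {
        // Last characters match: they extend the LCS of the shorter prefixes.
        return 1 + lcs(str1, str2, beforeI, beforeJ)
    } else {
        // Otherwise drop the last character of one string or the other.
        return max(lcs(str1, str2, i, beforeJ), lcs(str1, str2, beforeI, j))
    }
}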

You can find a complete explanation of how this dynamic programming algorithm works here.
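The recursive lcs above does not cache its results, so it recomputes the same (i, j) subproblems many times and its running time can grow exponentially with the string lengths. A minimal top-down sketch that stores each subproblem once brings it down to O(m·n) for strings of lengths m and n. The name `lcsMemo` and the switch from String.Index to integer prefix lengths are assumptions made for illustration:

```swift
// Hypothetical memoized version of lcs: same recurrence, but each (i, j)
// subproblem is computed once and cached. i and j are prefix lengths.
func lcsMemo(_ str1: String, _ str2: String) -> Int {
    let a = Array(str1), b = Array(str2)
    // memo[i][j] caches the LCS length of a[0..<i] and b[0..<j]; -1 = not computed yet.
    var memo = [[Int]](repeating: [Int](repeating: -1, count: b.count + 1),
                       count: a.count + 1)

    func solve(_ i: Int, _ j: Int) -> Int {
        if i == 0 || j == 0 { return 0 }          // an empty prefix has no common subsequence
        if memo[i][j] >= 0 { return memo[i][j] } // reuse a cached answer
        let value: Int
        if a[i - 1] == b[j - 1] {
            value = 1 + solve(i - 1, j - 1)
        } else {
            value = max(solve(i, j - 1), solve(i - 1, j))
        }
        memo[i][j] = value
        return value
    }

    return solve(a.count, b.count)
}

print(lcsMemo("goal", "gopal")) // 4
```

It can replace the lcs(searchText, $0, searchText.endIndex, $0.endIndex) call in the pipeline below as lcsMemo(searchText, $0); the recursion depth is at most m + n, which is fine for word-sized strings.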

So, given an array of strings and a search text:

let array = ["ram", "gopal", "varma", "govind", "ravan", "alan", "logan"]
let searchText = "goal"

We can associate a score with each element of the array, keep only those with a non-zero score, sort them by score, and then keep only the words from the tuples:

let result = array
    .map { ($0, lcs(searchText,
                    $0,
                    searchText.endIndex,
                    $0.endIndex)) }
    .filter { $0.1 > 0 }
    .sorted { $0.1 > $1.1 }
    .map { $0.0 }

print(result)

Which yields:

["gopal", "govind", "alan", "logan", "ram", "varma", "ravan"]
