
Fast algorithm for finding out if a string contains any string in a given array

I have a list of about 50 keywords and about 50000 strings. I check whether each string contains at least one of the keywords. I'm not interested in which keyword matched or in how many keywords matched. I only want a "true" or "false" back, as fast as possible.

So, I bet there's an algorithm out there that outperforms my current LINQ version by far:

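using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared in a static class.
static class MyEnumerableExtension
{
    public static bool ContainsAny(this string searchString, IEnumerable<string> keywords)
    {
        // Any() stops at the first keyword found in searchString.
        return keywords.Any(keyword => searchString.Contains(keyword));
    }
}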

bool foundAny = "abcdef".ContainsAny(new string[] { "ac", "bd", "cd" } );

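Isn't this essentially the same as your other question from today, "Efficient algorithm for finding all keywords in a text", except modified to return as soon as a match is found?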

Several algorithms can search a text for a set of substrings.
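One well-known algorithm of this kind is Aho-Corasick, which builds a trie of the keywords with failure links and then scans each string once. The answer above does not name a specific algorithm, so treat the following as an illustrative sketch only: the class name `KeywordMatcher` and its members are hypothetical, and it has not been benchmarked against the LINQ version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): an Aho-Corasick automaton
// that only answers "does the text contain at least one keyword?".
public sealed class KeywordMatcher
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Next = new Dictionary<char, Node>();
        public Node Fail;     // node for the longest proper suffix that is also a trie prefix
        public bool IsMatch;  // a keyword ends here or somewhere along the failure chain
    }

    private readonly Node root = new Node();

    public KeywordMatcher(IEnumerable<string> keywords)
    {
        // Step 1: insert every keyword into the trie.
        foreach (string keyword in keywords)
        {
            Node node = root;
            foreach (char c in keyword)
            {
                Node existing;
                if (!node.Next.TryGetValue(c, out existing))
                {
                    existing = new Node();
                    node.Next[c] = existing;
                }
                node = existing;
            }
            node.IsMatch = true;
        }

        // Step 2: breadth-first pass that sets the failure links.
        var queue = new Queue<Node>();
        foreach (Node first in root.Next.Values)
        {
            first.Fail = root;
            queue.Enqueue(first);
        }
        while (queue.Count > 0)
        {
            Node node = queue.Dequeue();
            foreach (KeyValuePair<char, Node> edge in node.Next)
            {
                Node fail = node.Fail;
                while (fail != root && !fail.Next.ContainsKey(edge.Key))
                    fail = fail.Fail;
                Node target;
                edge.Value.Fail = fail.Next.TryGetValue(edge.Key, out target) ? target : root;
                // Inherit matches from the suffix so that one flag check suffices.
                edge.Value.IsMatch |= edge.Value.Fail.IsMatch;
                queue.Enqueue(edge.Value);
            }
        }
    }

    public bool ContainsAny(string text)
    {
        if (root.IsMatch) return true;  // an empty keyword matches every text
        Node node = root;
        foreach (char c in text)
        {
            Node next;
            while (!node.Next.TryGetValue(c, out next) && node != root)
                node = node.Fail;
            node = next ?? root;
            if (node.IsMatch) return true;  // stop at the first hit
        }
        return false;
    }
}
```

The matcher would be built once from the 50 keywords and then reused for all 50000 strings, for example `new KeywordMatcher(keywords).ContainsAny("abcdef")`. Matching is ordinal and case-sensitive in this sketch.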

You could implement the Knuth-Morris-Pratt algorithm.
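As a rough sketch of what that could look like: Knuth-Morris-Pratt searches for one pattern at a time, so the loop over the keywords remains. The names `KmpSearch`, `BuildFailureTable`, `Contains` and `ContainsAny` below are made up for illustration, and whether this beats `string.Contains` in practice would need measuring.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post).
public static class KmpSearch
{
    // table[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] BuildFailureTable(string pattern)
    {
        var table = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k]) k = table[k - 1];
            if (pattern[i] == pattern[k]) k++;
            table[i] = k;
        }
        return table;
    }

    // Returns true if pattern occurs in text; never re-reads a text character.
    public static bool Contains(string text, string pattern)
    {
        if (pattern.Length == 0) return true;
        int[] table = BuildFailureTable(pattern);
        int k = 0;  // number of pattern characters currently matched
        foreach (char c in text)
        {
            while (k > 0 && c != pattern[k]) k = table[k - 1];
            if (c == pattern[k]) k++;
            if (k == pattern.Length) return true;
        }
        return false;
    }

    // KMP is a single-pattern algorithm, so each keyword is tried in turn.
    public static bool ContainsAny(string text, IEnumerable<string> keywords)
    {
        foreach (string keyword in keywords)
            if (Contains(text, keyword)) return true;
        return false;
    }
}
```

For 50000 strings it would make sense to compute each keyword's failure table once and reuse it, rather than rebuilding it on every call as this sketch does.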

A quick analysis shows that you are searching for the keywords one at a time. If you could search for all of the keywords in one pass, you should see an overall improvement in your algorithm. A regular expression can do that, and if you couple it with the "Compiled" option you should begin to see a performance increase (because it makes a single pass over the string for all keywords). But it will only benefit you if you have several keywords. Here's a quick idea to help you along, but note that I have not actually tested its performance against your algorithm.

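        // Requires: using System.Text.RegularExpressions;
        string[] keywords = { "ac", "bd", "cd" };
        string[] tosearch = { "abcdef" };
        // One alternation pattern ("ac|bd|cd") covers all keywords.
        string pattern = String.Join("|", keywords);
        Regex regex = new Regex(pattern, RegexOptions.Compiled);
        // Joining the search strings tests them all with a single call.
        bool foundAny = regex.IsMatch(String.Join("|", tosearch));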

Also note that this works only as long as your keywords do not contain any Regex special characters (and your search strings do not contain the pipe symbol). However, the special characters can be handled with escape sequences, and the search strings do not have to be joined as I've done.
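Both caveats can be addressed along these lines: `Regex.Escape` turns each keyword into a literal, and each search string is tested on its own instead of being joined. This is a sketch, not a measured improvement; the class name `RegexKeywordSearch` and the method `BuildMatcher` are hypothetical, and it assumes at least one non-empty keyword (an empty pattern would match every string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original post).
public static class RegexKeywordSearch
{
    // Builds one compiled alternation of the keywords.
    // Regex.Escape makes characters such as '.' or '|' match literally.
    public static Regex BuildMatcher(IEnumerable<string> keywords)
    {
        string pattern = string.Join("|", keywords.Select(Regex.Escape));
        return new Regex(pattern, RegexOptions.Compiled);
    }
}
```

Build the matcher once, for example `Regex matcher = RegexKeywordSearch.BuildMatcher(keywords);`, and then call `matcher.IsMatch(s)` for each of the strings to get a per-string true or false.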

