Fastest way to search for contents of files

I need to search a set of web files for text that the user enters. There are about 850 files to search. The code below does what I want, but it takes about 11-13 seconds. It runs in a web service that I call from a web page using $.ajax GET. Is there a way to improve the code so the search runs faster? Or should I be looking at areas other than my code?

I do the replacements on the document text because of how the files are created (they create the web files using MS Word... another battle), and doing so improves my search results.

var searchResults = new StringBuilder();

var parameters = searchParameters.Split('|');

var searchOnCompletePhrase = bool.Parse(parameters[1]);

var completePhrasePattern = @"\b(?:" + Regex.Escape(parameters[0].ToString()) + @")\b";

var files = Directory.GetFiles(path, "*.htm");

if (searchOnCompletePhrase && searchPhrase.Length > 1)
{
    foreach (var currentFile in files)
    {
        document.Load(currentFile);

        contents = document.DocumentNode.InnerText.Replace("\r", string.Empty)
            .Replace("\n", string.Empty)
            .Replace("&nbsp;", string.Empty)
            .Replace("  ", " ");

        if (contents.ToLower().IndexOf(searchPhrase.ToLower()) > -1)
        {
            searchResults.AppendLine(currentFile);

            searchResults.Append("|");
        }
    }
}
else
{
    var keywords = parameters[0].Split(' ');

    foreach (var currentFile in files)
    {
        document.Load(currentFile);

        contents = document.DocumentNode.InnerText.Replace("\r", string.Empty)
            .Replace("\n", string.Empty)
            .Replace("&nbsp;", string.Empty)
            .Replace("  ", " ");

        var found = true;

        foreach (var word in keywords)
        {
            if (!SearchCurrentWord(word.ToString()))
            {
                found = false;

                break;
            }
        }

        if (found)
        {
            searchResults.AppendLine(currentFile);

            searchResults.Append("|");
        }
    }
}

Maybe you should try using Parallel.ForEach instead of the foreach loop, so that the code does not wait on each file from disk one after another.
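A minimal sketch of what that suggestion could look like for the complete-phrase case. The class name `ParallelFileSearch`, the method `FindFilesContaining`, and the `extractText` hook are hypothetical names introduced for illustration; they are not part of the question's code. The sketch assumes that each file can be processed independently. Two things change compared with the sequential loop: results go into a `ConcurrentBag<string>` because `StringBuilder` is not safe to append to from several threads, and the text extraction is passed in as a function so that every call can create its own HTML document object instead of sharing the single `document` instance. By default the hook just reads the raw file text.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelFileSearch
{
    // Returns the files whose extracted text contains searchPhrase (case-insensitive).
    // extractText is a hypothetical hook: pass a function that loads the file into a
    // NEW HTML document object on every call and returns its inner text.
    // If it is omitted, the raw file contents are searched instead.
    public static List<string> FindFilesContaining(
        IEnumerable<string> files,
        string searchPhrase,
        Func<string, string> extractText = null)
    {
        if (extractText == null)
        {
            extractText = path => File.ReadAllText(path);
        }

        // Thread-safe collection; several iterations may add matches at the same time.
        var matches = new ConcurrentBag<string>();

        Parallel.ForEach(files, currentFile =>
        {
            var contents = extractText(currentFile);

            // Case-insensitive search without allocating lower-cased copies.
            if (contents.IndexOf(searchPhrase, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                matches.Add(currentFile);
            }
        });

        // ConcurrentBag has no defined order, so sort for a stable result.
        return matches.OrderBy(f => f, StringComparer.Ordinal).ToList();
    }
}
```

It could be called as `ParallelFileSearch.FindFilesContaining(Directory.GetFiles(path, "*.htm"), searchPhrase)`, and the returned list joined with "|" to build the response. Whether this helps depends on where the 11-13 seconds actually go: if the time is dominated by HTML parsing, several cores should help, while a slow disk may limit the gain, so it is worth measuring before and after.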
