
Fastest way to search ASCII files in C# for simple keywords?

Right now, I search ASCII files for simple keywords like this:

int SearchInFile(string file, string searchString)
{
    int num = 0;

    // "using" disposes the reader even if an exception is thrown while reading.
    using (StreamReader reader = File.OpenText(file))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // CountSubstrings (not shown) returns the number of matches in one line.
            num += CountSubstrings(line, searchString);
        }
    }

    return num;
}

Is this the fastest, most memory-efficient way to do it? Returning the count is optional: I would give it up if that made a substantially faster search approach possible, but not otherwise.

I use it like:

SearchInFile ( "C:\\text.txt", "cool" );

In unmanaged code, the most effective approach in terms of performance is to use memory-mapped files instead of reading the file into a buffer. I am sure the best results can only be achieved this way, especially if the file you want to scan is on remote storage (for example, a file on a server).

I am not sure that the corresponding .NET 4.0 classes (in System.IO.MemoryMappedFiles) would be equally effective in your case.
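For reference, a minimal sketch of what the managed version might look like with the .NET 4.0 MemoryMappedFile class. It assumes the file and the keyword are plain ASCII, counts non-overlapping matches with a naive byte comparison, and uses a hypothetical class name (MappedSearch). It makes no claim about speed: reading one byte at a time through the accessor has its own overhead, so it would need to be measured against the StreamReader version.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedSearch
{
    // Counts non-overlapping occurrences of an ASCII keyword in a file
    // by scanning a read-only memory-mapped view of it.
    public static int CountInFile(string path, string keyword)
    {
        byte[] needle = Encoding.ASCII.GetBytes(keyword);
        long length = new FileInfo(path).Length;

        // Also covers the empty file, which cannot be mapped with capacity 0.
        if (needle.Length == 0 || length < needle.Length) return 0;

        int count = 0;
        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            long i = 0;
            while (i <= length - needle.Length)
            {
                int j = 0;
                while (j < needle.Length && view.ReadByte(i + j) == needle[j]) j++;

                if (j == needle.Length) { count++; i += needle.Length; } // skip past the match
                else i++;
            }
        }
        return count;
    }
}
```

Usage would mirror the original call: MappedSearch.CountInFile("C:\\text.txt", "cool").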

Just load the text file into a large string using StreamReader's ReadToEnd method and use string.IndexOf():

string test = reader.ReadToEnd();

test.IndexOf("keyword")
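A single IndexOf call only finds the first occurrence. If a count is needed, the call can be repeated from the end of each match. The following is a sketch under a few assumptions: the class and method names are made up for illustration, matches are counted without overlap, and StringComparison.Ordinal is used because the default IndexOf(string) overload is culture-sensitive, which is unnecessary for ASCII keywords. The same loop could also serve as the CountSubstrings helper that the question does not show.

```csharp
using System;
using System.IO;

static class WholeFileSearch
{
    // Counts non-overlapping, case-sensitive occurrences of keyword in text.
    public static int CountOccurrences(string text, string keyword)
    {
        if (string.IsNullOrEmpty(keyword)) return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(keyword, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += keyword.Length; // continue searching after this match
        }
        return count;
    }

    // Loads the whole file into one string, then counts matches in it.
    public static int CountInFile(string path, string keyword)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            return CountOccurrences(reader.ReadToEnd(), keyword);
        }
    }
}
```

Note that this holds the entire file in memory as a string, so it trades memory for simplicity.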

If you really want more performance (processing files on the order of hundreds of MB or several GB), then instead of searching line by line, you should read the file in fixed-size blocks of perhaps 1 KB and search each block. You then have to handle boundary conditions, such as a keyword that straddles two blocks, but this should still prove faster.
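One way the block-based approach could handle the boundary condition is to carry the unscanned tail of each block (at most keyword.Length - 1 characters) over to the front of the next one. The sketch below assumes an ASCII file, counts non-overlapping matches, and uses hypothetical names (BlockSearch, blockSize); the 1 KB default simply follows the figure above and is not a tuned value.

```csharp
using System;
using System.IO;
using System.Text;

static class BlockSearch
{
    // Counts non-overlapping occurrences of keyword, reading the file in blocks.
    public static int CountInFile(string path, string keyword, int blockSize = 1024)
    {
        if (string.IsNullOrEmpty(keyword)) return 0;

        // Extra room for the tail carried over from the previous block.
        char[] buffer = new char[blockSize + keyword.Length - 1];
        int carried = 0;
        int count = 0;

        using (var reader = new StreamReader(path, Encoding.ASCII))
        {
            int read;
            while ((read = reader.Read(buffer, carried, blockSize)) > 0)
            {
                int total = carried + read;
                int pos = 0;
                while (pos <= total - keyword.Length)
                {
                    if (MatchesAt(buffer, pos, keyword)) { count++; pos += keyword.Length; }
                    else pos++;
                }

                // Everything from pos onward is unscanned and may start a match
                // that finishes in the next block, so move it to the front.
                carried = total - pos;
                Array.Copy(buffer, pos, buffer, 0, carried);
            }
        }
        return count;
    }

    // True if keyword occurs in buffer starting at offset (ordinal comparison).
    private static bool MatchesAt(char[] buffer, int offset, string keyword)
    {
        for (int i = 0; i < keyword.Length; i++)
            if (buffer[offset + i] != keyword[i]) return false;
        return true;
    }
}
```

Because only the unscanned tail is carried over, a match is never counted twice, and memory use stays bounded by the block size regardless of file size.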

That being said, you should apply a profiler like ANTS to see if this is actually your bottleneck.
