Fastest way to search ASCII files in C# for simple keywords?
Right now, I search ASCII files for simple keywords like this:
// CountSubstrings is a helper that is not shown in this post.
int SearchInFile (string file, string searchString)
{
    int num = 0;
    // "using" closes the reader even if reading throws.
    using (StreamReader reader = File.OpenText (file))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            num += CountSubstrings(line, searchString);
        }
    }
    return num;
}
Is this the fastest, most memory-efficient way to do it? Returning the count is optional if dropping it would make a huge difference to how the search is done, but I would not drop it otherwise.
I use it like this:
SearchInFile ( "C:\\text.txt", "cool" );
In unmanaged code, the most effective approach in terms of performance is to use memory-mapped files instead of reading the file into a buffer. I am sure that the best results can be achieved only this way, especially if the file you want to scan may reside on remote storage (a file on a server).
I am not sure that the corresponding .NET 4.0 classes will be just as effective in your case.
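For reference, the .NET 4.0 classes in question live in the System.IO.MemoryMappedFiles namespace. The following is only a minimal sketch of how a keyword count over a mapped file might look. The names MappedSearch and CountInMappedFile are made up for illustration, the count is assumed to be non-overlapping, and the byte-by-byte comparison is a naive scan rather than a tuned search, so whether it beats the line-by-line version would need measuring.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

static class MappedSearch
{
    // Hypothetical helper: counts non-overlapping occurrences of an ASCII
    // keyword by scanning a read-only memory-mapped view of the file.
    public static int CountInMappedFile(string path, string keyword)
    {
        byte[] needle = Encoding.ASCII.GetBytes(keyword);
        long length = new FileInfo(path).Length;

        // Also avoids mapping an empty file, which CreateFromFile rejects.
        if (needle.Length == 0 || length < needle.Length) return 0;

        using (var mmf = MemoryMappedFile.CreateFromFile(
                   path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
        {
            int count = 0;
            long i = 0;
            long last = length - needle.Length;
            while (i <= last)
            {
                // Compare the keyword against the bytes starting at position i.
                int j = 0;
                while (j < needle.Length && view.ReadByte(i + j) == needle[j]) j++;

                if (j == needle.Length) { count++; i += needle.Length; }
                else i++;
            }
            return count;
        }
    }
}
```

It would be called the same way as the original method, for example MappedSearch.CountInMappedFile("C:\\text.txt", "cool"). Unlike the line-by-line version, this scan treats the file as one byte sequence, so line breaks are just ordinary bytes.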
If you really want more performance (processing files on the order of hundreds of MB or GB), then instead of searching line by line, you should read the file in blocks of perhaps 1 KB and search within those. Despite having to deal with some boundary conditions (such as a keyword that straddles two blocks), this should prove faster.
That being said, you should run a profiler such as ANTS to see whether this is actually your bottleneck.
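The block-wise approach described above could be sketched roughly as follows. This is an illustration under assumptions, not a benchmarked implementation: BlockSearch and CountInBlocks are hypothetical names, the default block size of 1024 bytes simply mirrors the "perhaps 1k" suggestion, and matches are counted without overlap. The boundary condition is handled by carrying the unscanned tail of each block (at most keyword length minus one bytes) to the front of the buffer before the next read.

```csharp
using System;
using System.IO;
using System.Text;

static class BlockSearch
{
    // Hypothetical helper: counts non-overlapping occurrences of an ASCII
    // keyword by reading the file in fixed-size blocks.
    public static int CountInBlocks(string path, string keyword, int blockSize = 1024)
    {
        byte[] needle = Encoding.ASCII.GetBytes(keyword);
        if (needle.Length == 0) return 0;

        // Room for one block plus the tail carried over from the previous block.
        byte[] buffer = new byte[blockSize + needle.Length - 1];
        int carried = 0;
        int count = 0;

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                   FileShare.Read, blockSize, FileOptions.SequentialScan))
        {
            int read;
            while ((read = stream.Read(buffer, carried, blockSize)) > 0)
            {
                int filled = carried + read;
                int i = 0;

                // Scan every position where the whole keyword still fits.
                while (i + needle.Length <= filled)
                {
                    int j = 0;
                    while (j < needle.Length && buffer[i + j] == needle[j]) j++;

                    if (j == needle.Length) { count++; i += needle.Length; }
                    else i++;
                }

                // Keep the bytes that were not fully scanned so that a keyword
                // split across two blocks is still found on the next pass.
                carried = filled - i;
                Buffer.BlockCopy(buffer, i, buffer, 0, carried);
            }
        }
        return count;
    }
}
```

A call such as BlockSearch.CountInBlocks("C:\\text.txt", "cool") would replace the original SearchInFile call. Larger blocks may well perform better than 1 KB, which is something the profiler run would show.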