
Faster string comparison in C#

I have a program that compares two files. I ran the Visual Studio analysis and found that my comparison time is large. Is there a quicker way to compare two strings than this? (I can't use a parallel foreach because it might cause errors.) Right now I'm using a concurrent dictionary, but I'm open to other options. :)

var metapath = new ConcurrentDictionary<string, string>();
foreach(var me in metapath)
{
 if (line.StartsWith(me.Key.ToString()))
 {...}
}

First of all, drop the ToString() from me.Key.ToString(). The key is already a string, so the call does nothing useful.

Next, use the ordinal string comparison (provided that this doesn't impact correctness):

line.StartsWith(me.Key, StringComparison.Ordinal);

This is beneficial because the default, culture-sensitive string comparison follows various Unicode rules on what counts as equal. For example, normalized and denormalized sequences must be treated as equal. Ordinal just compares the raw character data, ignoring Unicode equality rules. There is more detail on this here, for example, or here (which claims it's faster but without quoting any numbers).
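The difference between the two modes shows up with a composed and a decomposed "é". The following is only a minimal sketch: the class name PrefixDemo and the helper IndexOfPrefixOrdinal are made up for illustration, and the culture-sensitive result depends on the runtime's globalization settings, so no output is claimed for it.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixDemo
{
    // Hypothetical helper: returns the index of the first key that `line`
    // starts with under ordinal comparison, or -1 if no key matches.
    public static int IndexOfPrefixOrdinal(string line, IReadOnlyList<string> keys)
    {
        for (int i = 0; i < keys.Count; i++)
        {
            if (line.StartsWith(keys[i], StringComparison.Ordinal))
                return i;
        }
        return -1;
    }

    public static void Main()
    {
        string composed = "\u00e9";     // 'é' as a single code point
        string decomposed = "e\u0301";  // 'e' followed by a combining acute accent

        // Ordinal compares the UTF-16 code units, so the two strings differ.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal)); // False

        // A culture-sensitive comparison may treat them as equivalent;
        // the result depends on the platform's globalization support.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.InvariantCulture));

        Console.WriteLine(IndexOfPrefixOrdinal("meta/path/file", new[] { "data/", "meta/" })); // 1
    }
}
```

If the keys and lines are plain identifiers or paths, ordinal matching is usually what is wanted anyway; for text shown to users, check first that the culture-sensitive behaviour isn't being relied on.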

Last, profile the code. You may be surprised: most of the time the slow part is not at all what you think it is. For example, it could be the part where you add things to the dictionary.
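For a quick check before reaching for the profiler, a Stopwatch around the loop gives a rough number. This is a crude sketch on synthetic data, not a proper benchmark (it ignores JIT warm-up and run-to-run noise); StartsWithTiming and CountMatches are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class StartsWithTiming
{
    // Counts how many lines start with `prefix` under the given comparison
    // mode and reports the elapsed wall-clock time through `elapsed`.
    public static int CountMatches(string[] lines, string prefix,
                                   StringComparison mode, out TimeSpan elapsed)
    {
        var watch = Stopwatch.StartNew();
        int count = 0;
        foreach (var line in lines)
        {
            if (line.StartsWith(prefix, mode))
                count++;
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return count;
    }

    public static void Main()
    {
        // Synthetic data, purely for illustration.
        var lines = new string[1_000_000];
        for (int i = 0; i < lines.Length; i++)
            lines[i] = (i % 2 == 0 ? "meta/" : "data/") + i;

        foreach (var mode in new[] { StringComparison.CurrentCulture, StringComparison.Ordinal })
        {
            int count = CountMatches(lines, "meta/", mode, out TimeSpan elapsed);
            Console.WriteLine($"{mode}: {count} matches in {elapsed.TotalMilliseconds:F1} ms");
        }
    }
}
```

The timings vary by machine and runtime, so treat them only as a hint about where to point the real profiler.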

If you are comparing strings for exact equality, String.Equals is quite good:

String.Equals(line, me.Key)

Have you seen this: What is the fastest (built-in) comparison for string-types in C#?

It's not clear exactly what you mean by "comparison", but if you don't mean "sort", i.e. you want to check for plagiarism or something similar, then what about hashing the lines first and comparing the hashes?

Whether there is any benefit depends on the size of your data set. "Large" and "small" are highly subjective terms.
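One way to get the effect of hashing the lines is to put the lines of one file into a HashSet<string> and look up the lines of the other. This is a sketch under the assumption that whole lines are compared for exact equality (it does not cover the prefix matching in the question); LineHashing and CommonLines are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class LineHashing
{
    // Returns the lines of `second` that also occur somewhere in `first`.
    // HashSet<string> hashes each line, so a lookup is O(1) on average
    // instead of a scan over every line of the first file.
    public static List<string> CommonLines(IEnumerable<string> first, IEnumerable<string> second)
    {
        var seen = new HashSet<string>(first, StringComparer.Ordinal);
        var common = new List<string>();
        foreach (var line in second)
        {
            if (seen.Contains(line))
                common.Add(line);
        }
        return common;
    }

    public static void Main()
    {
        var a = new[] { "alpha", "beta", "gamma" };
        var b = new[] { "gamma", "delta", "alpha" };
        Console.WriteLine(string.Join(", ", CommonLines(a, b)));  // gamma, alpha
    }
}
```

For real files the two sequences could come from File.ReadLines. Building the set costs time and memory up front, which is why the benefit depends on how much data there is.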
