
Most efficient way to determine first character in string?

Which of these methods is the most efficient, or is there a better way to do it?

this.returnList[i].Title[0].ToString()

or

this.returnList[i].Title.Substring(0, 1)

They're both very fast:

Char Index

using System;
using System.Diagnostics;

var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
    clock.Start();
    for (var j = 0; j < 10000000; j++)
    {
        var first = sample[0].ToString();
    }
    clock.Stop();
    Console.WriteLine(clock.Elapsed); // one line per run of 10 million calls
    clock.Reset();
}

// Results
00:00:00.2012243
00:00:00.2207168
00:00:00.2184807
00:00:00.2258847
00:00:00.2296456
00:00:00.2261465
00:00:00.2120131
00:00:00.2221702
00:00:00.2346083
00:00:00.2330840

Substring

using System;
using System.Diagnostics;

var sample = "sample";
var clock = new Stopwatch();
for (var i = 0; i < 10; i++)
{
    clock.Start();
    for (var j = 0; j < 10000000; j++)
    {
        var first = sample.Substring(0, 1);
    }
    clock.Stop();
    Console.WriteLine(clock.Elapsed); // one line per run of 10 million calls
    clock.Reset();
}

// Results
00:00:00.3268155
00:00:00.3337077
00:00:00.3439908
00:00:00.3273090
00:00:00.3380794
00:00:00.3400650
00:00:00.3280275
00:00:00.3333719
00:00:00.3295982
00:00:00.3368425

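I also agree with BrokenGlass that using the char index is a cleaner way of writing it. Plus, in the runs above it was roughly a third faster (about 0.22 s versus 0.33 s per 10 million calls), although that only adds up if you do it a very large number of times.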

There is a significant pitfall in your code that may cause problems, depending on what you mean by "first character" and what returnList contains.

C# strings are UTF-16, which is a variable-length encoding, so Title[0] may be only one half of a surrogate pair, that is, only one char of a single Unicode code point. If you want to return the first full code point of a string:

string s = returnList[i].Title;
if (string.IsNullOrEmpty(s))
    return s;

// A surrogate pair is two chars that together encode one code point.
int charsInCodePoint = char.IsSurrogatePair(s, 0) ? 2 : 1;
return s.Substring(0, charsInCodePoint);
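On .NET Core 3.0 and later, the System.Text.Rune type can do the surrogate-pair check for you. This is a minimal sketch, assuming a modern .NET SDK with top-level statements (C# 9 or later); FirstCodePoint is a hypothetical helper name, not a framework API. It also falls back to a single char when the string starts with an unpaired surrogate.

```csharp
using System;
using System.Text;

// Hypothetical helper: returns the first Unicode code point of s as a string.
static string FirstCodePoint(string s)
{
    if (string.IsNullOrEmpty(s))
        return s;

    // TryGetRuneAt decodes a surrogate pair into one Rune. It returns false
    // for an unpaired surrogate, in which case we keep just that one char.
    return Rune.TryGetRuneAt(s, 0, out Rune rune)
        ? rune.ToString()
        : s.Substring(0, 1);
}

Console.WriteLine(FirstCodePoint("\U0001F600 smile").Length); // 2 (one emoji, two chars)
Console.WriteLine(FirstCodePoint("abc"));                     // a
```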

You can run into similar problems with BOMs, tag characters, and combining characters: these are all valid code points, but on their own they are not meaningful when displayed to a user.

If you want code points or graphemes rather than chars, you must return a string, because a code point can take two chars and a grapheme can take several.
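For graphemes, System.Globalization.StringInfo can return the first text element of a string. This is a sketch, with FirstTextElement as a hypothetical helper name. Note that .NET 5 changed StringInfo to follow the Unicode extended grapheme cluster rules, so results for complex emoji sequences can differ on older runtimes; a base letter plus a combining accent, as below, is one text element either way.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: returns the first user-perceived character
// (text element) of s, which may span several chars.
static string FirstTextElement(string s)
{
    if (string.IsNullOrEmpty(s))
        return s;

    // Returns the text element that starts at index 0.
    return StringInfo.GetNextTextElement(s);
}

string decomposed = "e\u0301tude"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
Console.WriteLine(decomposed[0]);                       // e (the accent is dropped)
Console.WriteLine(FirstTextElement(decomposed).Length); // 2
```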

I don't think it matters much efficiency-wise, but in my opinion the clearer, more idiomatic, and hence more maintainable way of getting the first character is the index operator:

char c = returnList[i].Title[0];

This of course assumes there is at least one character; if that's not guaranteed, you have to check for it first.
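One way to make that check explicit is to return a nullable char. This is a sketch with a hypothetical helper name (FirstCharOrNull), assuming top-level statements (C# 9 or later); callers then have to handle the "no character" case instead of hitting an exception.

```csharp
using System;

// Hypothetical helper: the first char of s, or null when s is null or empty.
static char? FirstCharOrNull(string s) =>
    string.IsNullOrEmpty(s) ? (char?)null : s[0];

char? first = FirstCharOrNull("Title");
Console.WriteLine(first.HasValue ? first.Value.ToString() : "(empty)"); // T
Console.WriteLine(FirstCharOrNull("") is null);                         // True
```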

Those should be close in performance, since both create a new string.

The expensive part of the operation is creating that string, and neither method has a more efficient way of doing it.

Unless, of course, you want to pre-create strings for all possible characters and store them in a dictionary, but that would use a lot of memory for such a trivial task.
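A pre-created cache does not have to cover every possible char, though. This is a minimal sketch, assuming you only care about ASCII; asciiStrings and FirstAsString are hypothetical names, FirstAsString assumes a non-empty string, and whether this beats plain ToString() at all depends on your runtime, so measure before adopting it.

```csharp
using System;

// Hypothetical cache: one pre-built string per ASCII char (128 entries),
// instead of one for each of the 65,536 possible char values.
string[] asciiStrings = new string[128];
for (int c = 0; c < asciiStrings.Length; c++)
    asciiStrings[c] = ((char)c).ToString();

// Returns the cached string for ASCII chars and allocates a new one otherwise.
// Assumes s is non-empty (s[0] throws IndexOutOfRangeException otherwise).
string FirstAsString(string s) =>
    s[0] < asciiStrings.Length ? asciiStrings[s[0]] : s[0].ToString();

string a = FirstAsString("sample");
string b = FirstAsString("sun");
Console.WriteLine(ReferenceEquals(a, b)); // True: both are the same cached "s"
```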

returnList[i].Title[0] is much faster because it doesn't create a new string; it only reads a char from the original one. Of course, it throws an IndexOutOfRangeException if the string is empty, so you should check for that first.

As a rule of thumb, never use strings with a fixed length of 1; that's what char is for.
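Applied to the common case of testing the first character, that means comparing chars rather than one-character strings. This is a sketch with a made-up title value; string.StartsWith(char) is available on .NET Core 2.0 and later (not on .NET Framework) and returns false for an empty string.

```csharp
using System;

string title = "Apple"; // made-up example value

// Compare chars: no one-character string is created.
bool viaIndex = title.Length > 0 && title[0] == 'A';

// StartsWith(char) performs the empty check for you.
bool viaStartsWith = title.StartsWith('A');

// The pattern the rule of thumb argues against: builds a 1-char string first.
bool viaSubstring = title.Length > 0 && title.Substring(0, 1) == "A";

Console.WriteLine($"{viaIndex} {viaStartsWith} {viaSubstring}"); // True True True
```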

The performance difference is unlikely to matter, though; the better readability certainly will.
