
Get Index of First non-Whitespace Character in C# String

Is there a means to get the index of the first non-whitespace character in a string (or more generally, the index of the first character matching a condition) in C# without writing my own looping code?

EDIT

By "writing my own looping code", I really meant that I'm looking for a compact expression that solves the problem without cluttering the logic I'm working on.

I apologize for any confusion on that point.

A string is of course an IEnumerable<char>, so you can use LINQ:

int offset = someString.TakeWhile(c => char.IsWhiteSpace(c)).Count();
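
One thing to watch with the TakeWhile approach: for an empty or all-whitespace string, the count equals the string length rather than -1. Below is a minimal sketch of a wrapper that normalizes this case. The class and method names (WhitespaceIndex, FirstNonWhitespaceIndex) are hypothetical and not part of the original answer.

```csharp
using System;
using System.Linq;

public static class WhitespaceIndex
{
    // Hypothetical helper: returns the index of the first non-whitespace
    // character, or -1 when the string contains none.
    public static int FirstNonWhitespaceIndex(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Count the leading whitespace characters.
        int offset = text.TakeWhile(c => char.IsWhiteSpace(c)).Count();

        // If TakeWhile consumed the whole string, there is no match.
        return offset == text.Length ? -1 : offset;
    }
}
```

Returning -1 is only one possible convention; callers that want "skip past everything" semantics may prefer the raw count.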

I like to define my own extension method for returning the index of the first element that satisfies a custom predicate in a sequence.

/// <summary>
/// Returns the index of the first element in the sequence 
/// that satisfies a condition.
/// </summary>
/// <typeparam name="TSource">
/// The type of the elements of <paramref name="source"/>.
/// </typeparam>
/// <param name="source">
/// An <see cref="IEnumerable{T}"/> that contains
/// the elements to apply the predicate to.
/// </param>
/// <param name="predicate">
/// A function to test each element for a condition.
/// </param>
/// <returns>
/// The zero-based index position of the first element of <paramref name="source"/>
/// for which <paramref name="predicate"/> returns <see langword="true"/>;
/// or -1 if <paramref name="source"/> is empty
/// or no element satisfies the condition.
/// </returns>
public static int IndexOf<TSource>(this IEnumerable<TSource> source, 
    Func<TSource, bool> predicate)
{
    int i = 0;

    foreach (TSource element in source)
    {
        if (predicate(element))
            return i;

        i++;
    }

    return -1;
}

You could then use this extension method to address your original problem:

string str = "   Hello World";
int i = str.IndexOf<char>(c => !char.IsWhiteSpace(c));

string s = "   \t  Test";
Array.FindIndex(s.ToCharArray(), x => !char.IsWhiteSpace(x));

returns 6

To add a condition just do ...

Array.FindIndex(s.ToCharArray(), x => !char.IsWhiteSpace(x) && your condition);

You can use the String.IndexOfAny function, which returns the index of the first occurrence of any character in a specified array of Unicode characters.

Alternatively, you can use the String.TrimStart function, which removes all whitespace characters from the beginning of the string. The index of the first non-whitespace character is the difference between the length of the original string and the length of the trimmed one.

You can even pick a set of characters to trim :)

Basically, if you are looking for a limited set of chars (let's say digits) you should go with the first method.

If you are trying to ignore a limited set of characters (like white spaces) you should go with the second method.
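
The two methods can be sketched side by side as follows. This is only an illustration under assumptions: the class name, the method names, and the choice of ASCII digits and space/tab as character sets are hypothetical and not taken from the original answer.

```csharp
using System;

public static class IndexOfAnyDemo
{
    // The limited set of characters to look for (ASCII digits).
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // First method: search for a limited set of characters.
    // String.IndexOfAny returns -1 when none of them occurs.
    public static int FirstDigitIndex(string text)
    {
        return text.IndexOfAny(Digits);
    }

    // Second method: skip a limited set of characters (space and tab here)
    // and measure how many characters TrimStart removed.
    // Returns text.Length when every character was trimmed.
    public static int FirstNonBlankIndex(string text)
    {
        return text.Length - text.TrimStart(' ', '\t').Length;
    }
}
```

Note that the two helpers report "not found" differently: the first yields -1, the second yields the string length.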

A last method would be to use the LINQ methods:

string s = "        qsdmlkqmlsdkm";
Console.WriteLine(s.TrimStart());
Console.WriteLine(s.Length - s.TrimStart().Length);
Console.WriteLine(s.FirstOrDefault(c => !Char.IsWhiteSpace(c)));
Console.WriteLine(s.IndexOf(s.FirstOrDefault(c => !Char.IsWhiteSpace(c))));

Output:

qsdmlkqmlsdkm
8
q
8

var match = Regex.Match(" \t test  ", @"\S"); // \S means all characters that are not whitespace
if (match.Success)
{
    int index = match.Index;
    //do something with index
}
else
{
    //there were no non-whitespace characters, handle appropriately
}

If you'll be doing this often, for performance reasons you should cache the Regex instance for this pattern (optionally constructed with RegexOptions.Compiled), e.g.:

static readonly Regex nonWhitespace = new Regex(@"\S");

Then use it like:

nonWhitespace.Match(" \t test  ");
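
Putting the cached instance and the Success check together, a helper might look like the sketch below. The class and method names are hypothetical, and RegexOptions.Compiled is an optional choice that trades construction cost for faster matching.

```csharp
using System.Text.RegularExpressions;

public static class NonWhitespaceFinder
{
    // Created once and reused for every call.
    private static readonly Regex NonWhitespace =
        new Regex(@"\S", RegexOptions.Compiled);

    // Hypothetical helper: index of the first non-whitespace character,
    // or -1 when the input contains only whitespace (or is empty).
    public static int IndexOfFirstNonWhitespace(string text)
    {
        Match match = NonWhitespace.Match(text);
        return match.Success ? match.Index : -1;
    }
}
```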

Since there were several solutions here, I decided to run some performance tests to see how each performs, and to share the results for those interested.

    int iterations = 1000000;
    int result = 0;
    string s= "   \t  Test";

    System.Diagnostics.Stopwatch watch = new Stopwatch();

    // Convert to char array and use FindIndex
    watch.Start();
    for (int i = 0; i < iterations; i++)
        result = Array.FindIndex(s.ToCharArray(), x => !char.IsWhiteSpace(x)); 
    watch.Stop();
    Console.WriteLine("Convert to char array and use FindIndex: " + watch.ElapsedMilliseconds);

    // Trim spaces and get index of first character
    watch.Restart();
    for (int i = 0; i < iterations; i++)
        result = s.IndexOf(s.TrimStart().Substring(0,1));
    watch.Stop();
    Console.WriteLine("Trim spaces and get index of first character: " + watch.ElapsedMilliseconds);

    // Use extension method
    watch.Restart();
    for (int i = 0; i < iterations; i++)
        result = s.IndexOf<char>(c => !char.IsWhiteSpace(c));
    watch.Stop();
    Console.WriteLine("Use extension method: " + watch.ElapsedMilliseconds);

    // Loop
    watch.Restart();
    for (int i = 0; i < iterations; i++)
    {   
        result = 0;
        foreach (char c in s)
        {
            if (!char.IsWhiteSpace(c))
                break;
            result++;
        }
    }
    watch.Stop();
    Console.WriteLine("Loop: " + watch.ElapsedMilliseconds);

Results are in milliseconds.

Where s = "   \t  Test"
Convert to char array and use FindIndex: 154
Trim spaces and get index of first character: 189
Use extension method: 234
Loop: 146

Where s = "Test"
Convert to char array and use FindIndex: 39
Trim spaces and get index of first character: 155
Use extension method: 57
Loop: 15

Where s = (1000 character string with no spaces)
Convert to char array and use FindIndex: 506
Trim spaces and get index of first character: 534
Use extension method: 51
Loop: 15

Where s = (1000 character string that starts with "   \t  Test")
Convert to char array and use FindIndex: 609
Trim spaces and get index of first character: 1103
Use extension method: 226
Loop: 146

Draw your own conclusions, but my conclusion is to use whichever one you like best, because the performance differences are insignificant in real-world scenarios.

Inspired by this solution of trimming the string, but much more efficient by using ReadOnlySpan:

string s = "   xyz";
int index = s.Length - s.AsSpan().TrimStart().Length;
// index is 3

Neither .AsSpan() nor .TrimStart() creates a copy of the string; each just stores a reference to a string character and a length.

  • .AsSpan() is an extension method of String that creates a span pointing to the first character of the string. Its length is the total string length.
  • .TrimStart() is an extension method of ReadOnlySpan<char> that creates a span pointing to the first non-whitespace character. Its length is the total string length minus the position of the first non-whitespace character.

This pattern can be used in general to skip over any list of given characters:

string s = "foobar";
int index = s.Length - s.AsSpan().TrimStart("fo").Length;
// index is 3
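
Note that for an empty or all-whitespace string the expression above evaluates to s.Length, not -1. If a "not found" value is needed, the pattern could be wrapped as in this sketch. The class and method names are hypothetical; it assumes a runtime where MemoryExtensions.TrimStart for ReadOnlySpan<char> is available (.NET Core 2.1 or later).

```csharp
using System;

public static class SpanTrimIndex
{
    // Hypothetical helper: index of the first non-whitespace character,
    // or -1 when the string is empty or contains only whitespace.
    public static int FirstNonWhitespaceIndex(string text)
    {
        // No string copy is made; the span is a view over the original string.
        ReadOnlySpan<char> trimmed = text.AsSpan().TrimStart();

        // An empty span means every character was trimmed away.
        return trimmed.IsEmpty ? -1 : text.Length - trimmed.Length;
    }
}
```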

I did a benchmark of this method and several others from this Q&A, using BenchmarkDotNet (my benchmark code):

Method                             Mean        Error      StdDev
Regex_Compiled                     45.05 us    0.043 us   0.034 us
ReadOnlySpan_Trim (this answer)    50.24 us    0.073 us   0.061 us
String_Trim                        94.64 us    0.458 us   0.428 us
Regex_Interpreted                  114.41 us   0.224 us   0.210 us
Regex_StaticMethod (read below!)   114.19 us   0.056 us   0.046 us
FirstNonMatch                      150.58 us   0.214 us   0.190 us
Array_FindIndex                    200.40 us   1.951 us   1.730 us
StringExt_IndexOfPredicate         336.31 us   0.896 us   0.838 us
Linq_TakeWhile                     490.97 us   0.994 us   0.930 us

I didn't expect that Regex_Compiled would be the fastest. Actually, Regex_StaticMethod should perform the same as Regex_Compiled (because the static Regex methods cache compiled patterns), but since BenchmarkDotNet creates a new process per test run, that cache doesn't have any effect.

The String_Trim benchmark depends on how many characters follow the first non-whitespace character, because it copies the substring. For short texts, performance could be close to ReadOnlySpan_Trim, but for longer texts performance will be much worse. The input text of this benchmark contains 50k non-whitespace characters, so there is already a significant difference.

You can trim the string, take the first character, and use IndexOf.
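
A minimal sketch of that idea follows. The class and method names are hypothetical, and the guard for an empty trimmed string is an addition, since indexing into an empty string would throw.

```csharp
using System;

public static class TrimAndIndexOf
{
    // Hypothetical helper: index of the first non-whitespace character,
    // or -1 when the string is empty or contains only whitespace.
    public static int FirstNonWhitespaceIndex(string text)
    {
        string trimmed = text.TrimStart();

        // Nothing left after trimming: there is no non-whitespace character.
        if (trimmed.Length == 0)
            return -1;

        // Everything before the first non-whitespace character is whitespace,
        // so the first occurrence of trimmed[0] in text is the wanted position.
        return text.IndexOf(trimmed[0]);
    }
}
```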

There is a very simple solution

string test = "    hello world";
int pos = test.ToList<char>().FindIndex(x => char.IsWhiteSpace(x) == false);

pos will be 4

You can have more complex conditions, like:

pos = test.ToList<char>().FindIndex((x) =>
                {
                    if (x == 's') //Your complex conditions go here
                        return true;
                    else 
                        return false;
                }
            );

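Yes, you can try this:

string stg = "   xyz";
// Use TrimStart rather than Trim: Trim also removes trailing whitespace,
// which would inflate the result for strings such as "   xyz ".
int indx = stg.Length - stg.TrimStart().Length;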

Something is going to be looping somewhere. For full control over what is and isn't whitespace, you could use LINQ to Objects to do your loop:

int index = Array.FindIndex(
               s.ToCharArray(), 
               x => !(new [] { '\t', '\r', '\n', ' '}.Any(c => c == x)));

There are a lot of solutions here that convert the string to an array. That is not necessary; individual characters in a string can be accessed just like items in an array.

This is my solution that should be very efficient:

private static int FirstNonMatch(string s, Func<char, bool> predicate, int startPosition = 0)
{
    for (var i = startPosition; i < s.Length; i++)
        if (!predicate(s[i])) return i;

    return -1;
}

private static int LastNonMatch(string s, Func<char, bool> predicate, int startPosition)
{
    for (var i = startPosition; i >= 0; i--)
        if (!predicate(s[i])) return i;

    return -1;
}

And to use these, do the following:

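var x = FirstNonMatch(" asdf ", char.IsWhiteSpace);                    // x is 1
// Start at the last valid index; passing Length would index past the end of the string.
var y = LastNonMatch(" asdf ", char.IsWhiteSpace, " asdf ".Length - 1); // y is 4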
