Get the index of the nth occurrence of a string?

Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?

I realize that I could loop the IndexOf method by updating its start index on each iteration of the loop. But doing it this way seems wasteful to me.

You really could use the regular expression /((s).*?){n}/ to search for the n-th occurrence of the substring s.

public static class StringExtender
{
    public static int NthIndexOf(this string target, string value, int n)
    {
        Match m = Regex.Match(target, "((" + Regex.Escape(value) + ").*?){" + n + "}");

        if (m.Success)
            return m.Groups[2].Captures[n - 1].Index;
        else
            return -1;
    }
}
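
One caveat worth illustrating: in .NET regular expressions, "." does not match a newline unless RegexOptions.Singleline is set, so the pattern above cannot bridge occurrences that sit on different lines. The sketch below reuses the same pattern; the class name RegexNthExample and the extra options parameter are additions made for this illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexNthExample
{
    // Same pattern as in the answer above. The 'options' parameter is an
    // assumption added for this sketch so that both behaviours can be compared.
    public static int NthIndexOf(string target, string value, int n,
                                 RegexOptions options = RegexOptions.None)
    {
        string pattern = "((" + Regex.Escape(value) + ").*?){" + n + "}";
        Match m = Regex.Match(target, pattern, options);

        // Group 2 is captured once per repetition of the outer group;
        // Captures keeps every one of those captures, in order.
        return m.Success ? m.Groups[2].Captures[n - 1].Index : -1;
    }
}
```

With this sketch, NthIndexOf("z\nz", "z", 2) returns -1, while NthIndexOf("z\nz", "z", 2, RegexOptions.Singleline) returns 2.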

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)
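
A minimal sketch of that loop, assuming a 1-based n and an ordinal comparison. The names NthOccurrence and IndexOfNthOccurrence are made up for illustration and are not framework APIs.

```csharp
using System;

public static class NthOccurrence
{
    // Hypothetical helper: returns the index of the n-th (1-based) occurrence
    // of 'value' in 'source', or -1 if there are fewer than n occurrences.
    public static int IndexOfNthOccurrence(string source, string value, int n)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty.", nameof(value));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous match: overlapping
            // occurrences are counted, and no start position is tried twice.
            index = source.IndexOf(value, index + 1, StringComparison.Ordinal);
            if (index < 0) return -1;
        }
        return index;
    }
}
```

Each call to IndexOf picks up where the previous one stopped, so the string is scanned at most once from the first match start to the n-th.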

Here is a recursive implementation of the above idea as an extension method, mimicking the format of the framework methods:

public static int IndexOfNth(this string input,
                             string value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    if (nth == 1)
        return input.IndexOf(value, startIndex);
    var idx = input.IndexOf(value, startIndex);
    if (idx == -1)
        return -1;
    return input.IndexOfNth(value, idx + 1, --nth);
}

Also, here are some MbUnit unit tests that might help you check that it is correct:

using System;
using MbUnit.Framework;

namespace IndexOfNthTest
{
    [TestFixture]
    public class Tests
    {
        // contains 2 instances of the token
        private const string Input = "TestTest";
        private const string Token = "Test";

        /* Test for 0th index */

        [Test]
        public void TestZero()
        {
            Assert.Throws<NotSupportedException>(
                () => Input.IndexOfNth(Token, 0, 0));
        }

        /* Test the two standard cases (1st and 2nd) */

        [Test]
        public void TestFirst()
        {
            Assert.AreEqual(0, Input.IndexOfNth("Test", 0, 1));
        }

        [Test]
        public void TestSecond()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 0, 2));
        }

        /* Test the 'out of bounds' case */

        [Test]
        public void TestThird()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 0, 3));
        }

        /* Test the offset case (in and out of bounds) */

        [Test]
        public void TestFirstWithOneOffset()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 4, 1));
        }

        [Test]
        public void TestFirstWithTwoOffsets()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 8, 1));
        }
    }
}

private int IndexOfOccurence(string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start before the string so a match at position 0 is found

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

Or, in C#, as an extension method:

public static int IndexOfOccurence(this string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start before the string so a match at position 0 is found

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

After some benchmarking, this seems to be the simplest and most efficient solution:

public static int IndexOfNthSB(string input,
                               char value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    var nResult = 0;
    for (int i = startIndex; i < input.Length; i++)
    {
        if (input[i] == value)
            nResult++;
        if (nResult == nth)
            return i;
    }
    return -1;
}

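Perhaps it would also be worth using the String.Split() method and checking whether the requested occurrence is in the array, if you don't need the index but the value at that index.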
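
A rough sketch of the String.Split() idea, assuming what you want is the text that follows the n-th separator rather than its index. TryGetSegmentAfterNth is a hypothetical name, and n is taken to be 1-based.

```csharp
using System;

public static class SplitExample
{
    // Hypothetical helper: gets the segment that follows the n-th (1-based)
    // occurrence of 'separator'. Returns false if there are fewer than n
    // occurrences. Note that Split treats separators as non-overlapping.
    public static bool TryGetSegmentAfterNth(string source, string separator,
                                             int n, out string segment)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // k separators produce k + 1 parts, so parts[n] exists only when
        // the separator occurs at least n times.
        string[] parts = source.Split(new[] { separator }, StringSplitOptions.None);
        if (n < parts.Length)
        {
            segment = parts[n];
            return true;
        }

        segment = string.Empty;
        return false;
    }
}
```

This allocates an array for the whole string, so it only makes sense when the segments themselves are needed.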

System.ValueTuple ftw:

var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;

Writing a function from that is left as homework.
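
For completeness, one way the one-liner could be wrapped in a function. This is only a sketch: IndexOfNthChar is an invented name, n is 1-based here (unlike ElementAt, which is zero-based), and -1 is returned when there are too few matches instead of throwing.

```csharp
using System;
using System.Linq;

public static class LinqExample
{
    // Hypothetical helper: index of the n-th (1-based) occurrence of 'value',
    // or -1 if 'source' contains fewer than n occurrences.
    public static int IndexOfNthChar(string source, char value, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        return source
            .Select((c, i) => (Ch: c, Index: i)) // pair each char with its position
            .Where(p => p.Ch == value)           // keep only the matches
            .Select(p => p.Index)
            .Skip(n - 1)                         // drop the first n - 1 matches
            .DefaultIfEmpty(-1)                  // nothing left means "not found"
            .First();
    }
}
```

Because LINQ evaluates lazily, enumeration stops as soon as the n-th match is produced.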

Tod's answer can be simplified somewhat.

using System;

static class MainClass {
    private static int IndexOfNth(this string target, string substring,
                                       int seqNr, int startIdx = 0)
    {
        if (seqNr < 1)
        {
            throw new IndexOutOfRangeException("Parameter 'seqNr' must be greater than 0.");
        }

        var idx = target.IndexOf(substring, startIdx);

        if (idx < 0 || seqNr == 1) { return idx; }

        return target.IndexOfNth(substring, --seqNr, ++idx); // resume one character past this match
    }

    static void Main () {
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 1));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 2));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 3));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 4));
    }
}

Or something like this, with a do-while loop:

private static int OrdinalIndexOf(string str, string substr, int n)
{
    int pos = -1;
    do
    {
        pos = str.IndexOf(substr, pos + 1);
    } while (--n > 0 && pos != -1); // n is 1-based: n = 1 returns the first occurrence
    return pos;
}

Here I go again: another benchmark answer from yours truly :-) Once again based on the fantastic BenchmarkDotNet package (if you're serious about benchmarking .NET code, please, please use this package).

The motivation for this post is twofold. First, PeteT (who originally asked the question) felt that calling String.IndexOf in a loop with a varying startIndex to find the nth occurrence of a character seems wasteful, while in fact it is the fastest method. Second, some answers use regular expressions, which are an order of magnitude slower (and add no benefit, in my opinion not even readability, in this specific case).

Here is the code I've ended up using in my string extensions library (it's not a new answer to this question, since others have already posted semantically identical code here, I'm not taking credit for it). This is the fastest method (even, possibly, including unsafe variations - more on that later):

public static int IndexOfNth(this string str, char ch, int nth, int startIndex = 0) {
    if (str == null)
        throw new ArgumentNullException("str");
    var idx = str.IndexOf(ch, startIndex);
    while (idx >= 0 && --nth > 0)
        idx = str.IndexOf(ch, idx + 1); // idx is already an absolute position
    return idx;
}

I've benchmarked this code against two other methods and the results follow:

[Image: benchmark results]

The benchmarked methods were:

[Benchmark]
public int FindNthRegex() {
    Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
    return (m.Success)
        ? m.Groups[2].Captures[Nth - 1].Index
        : -1;
}
[Benchmark]
public int FindNthCharByChar() {
    var occurrence = 0;
    for (int i = 0; i < text.Length; i++) {
        if (text[i] == 'z')
            occurrence++;
        if (Nth == occurrence)
            return i;
    }
    return -1;
}
[Benchmark]
public int FindNthIndexOfStartIdx() {
    var idx = text.IndexOf('z', 0);
    var nth = Nth;
    while (idx >= 0 && --nth > 0)
        idx = text.IndexOf('z', idx + 1);
    return idx;
}

The FindNthRegex method is the slowest of the bunch, taking an order of magnitude (or two) more time than the fastest. FindNthCharByChar loops over each char in the string and counts each match until it finds the nth occurrence. FindNthIndexOfStartIdx uses the method suggested by the asker of this question, which is indeed the same one I've been using for ages to accomplish this, and it is the fastest of them all.

Why is it so much faster than FindNthCharByChar? Because Microsoft went to great lengths to make string manipulation as fast as possible in the .NET framework. And they've accomplished that: they did an amazing job! I've done a deeper investigation of string manipulation in .NET in a CodeProject article that tries to find the fastest method to remove all whitespace from a string:

Fastest method to remove all whitespace from Strings in .NET

There you'll find why string manipulations in .NET are so fast, and why it's next to useless to try to squeeze out more speed by writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.).

The full benchmark code I've used follows (it's a .NET 6 console program):

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text;
using System.Text.RegularExpressions;

var summary = BenchmarkRunner.Run<BenchmarkFindNthChar>();

public class BenchmarkFindNthChar
{
    const string BaseText = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    [Params(100, 1000)]
    public int BaseTextRepeatCount { get; set; }
    [Params(500)]
    public int Nth { get; set; }
    private string text;
    [GlobalSetup]
    public void BuildTestData() {
        var sb = new StringBuilder();
        for (int i = 0; i < BaseTextRepeatCount; i++)
            sb.AppendLine(BaseText);
        text = sb.ToString();
    }
    [Benchmark]
    public int FindNthRegex() {
        Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
        return (m.Success)
            ? m.Groups[2].Captures[Nth - 1].Index
            : -1;
    }
    [Benchmark]
    public int FindNthCharByChar() {
        var occurrence = 0;
        for (int i = 0; i < text.Length; i++) {
            if (text[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthIndexOfStartIdx() {
        var idx = text.IndexOf('z', 0);
        var nth = Nth;
        while (idx >= 0 && --nth > 0)
            idx = text.IndexOf('z', idx + 1);
        return idx;
    }

}

This might do it:

Console.WriteLine(str.IndexOf((@"\")+2)+1);
