Get the index of the nth occurrence of a string?

Question

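Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?

I realize that I could loop the IndexOf method, updating its start index on each iteration of the loop. But doing it that way seems wasteful to me.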

Answer 1

You really could use the regular expression /((s).*?){n}/ to search for the nth occurrence of the substring s.

In C# it might look like this:

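using System.Text.RegularExpressions;

public static class StringExtender
{
    // Returns the index of the nth (1-based) occurrence of value, or -1 if not found.
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Singleline lets '.' match newlines, so multi-line targets also work.
        Match m = Regex.Match(target, "((" + Regex.Escape(value) + ").*?){" + n + "}",
                              RegexOptions.Singleline);

        if (m.Success)
            return m.Groups[2].Captures[n - 1].Index;
        else
            return -1;
    }
}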

Answer 2

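That is basically what you need to do, or at least it is the easiest solution. All you would be "wasting" is the cost of n method invocations; you won't actually be checking any case twice, if you think about it. (IndexOf returns as soon as it finds the match, and you keep going from where it left off.)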
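The loop described above can be sketched roughly as follows. This is only an illustration, not code from the answer: the class name `NthOccurrence` and the method name `NthIndexOfLoop` are made up, and it assumes ordinal comparison and that overlapping matches should count.

```csharp
using System;

public static class NthOccurrence
{
    // Hypothetical helper: returns the zero-based index of the nth (1-based)
    // occurrence of value in source, or -1 if there are fewer than n occurrences.
    public static int NthIndexOfLoop(string source, string value, int n)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty.", nameof(value));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous match; no character
            // before that position is ever examined again.
            index = source.IndexOf(value, index + 1, StringComparison.Ordinal);
            if (index < 0) return -1;
        }
        return index;
    }
}
```

With this sketch, `NthOccurrence.NthIndexOfLoop("abcbcbcd", "bc", 2)` returns 3, and asking for a fourth occurrence returns -1.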

Answer 3

Here is a recursive implementation of the idea from the accepted answer (call IndexOf repeatedly, each time continuing from where the last match left off) as an extension method, mimicking the format of the framework methods:

public static int IndexOfNth(this string input,
                             string value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    if (nth == 1)
        return input.IndexOf(value, startIndex);
    var idx = input.IndexOf(value, startIndex);
    if (idx == -1)
        return -1;
    return input.IndexOfNth(value, idx + 1, --nth);
}

Also, here are some (MbUnit) unit tests that might help you (to show that it is correct):

using System;
using MbUnit.Framework;

namespace IndexOfNthTest
{
    [TestFixture]
    public class Tests
    {
        // "TestTest" contains 2 instances of the token "Test"
        private const string Input = "TestTest";
        private const string Token = "Test";

        /* Test for 0th index */

        [Test]
        public void TestZero()
        {
            Assert.Throws<NotSupportedException>(
                () => Input.IndexOfNth(Token, 0, 0));
        }

        /* Test the two standard cases (1st and 2nd) */

        [Test]
        public void TestFirst()
        {
            Assert.AreEqual(0, Input.IndexOfNth("Test", 0, 1));
        }

        [Test]
        public void TestSecond()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 0, 2));
        }

        /* Test the 'out of bounds' case */

        [Test]
        public void TestThird()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 0, 3));
        }

        /* Test the offset case (in and out of bounds) */

        [Test]
        public void TestFirstWithOneOffset()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 4, 1));
        }

        [Test]
        public void TestFirstWithTwoOffsets()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 8, 1));
        }
    }
}

Answer 4

private int IndexOfOccurence(string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at position 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

Or, in C#, as an extension method:

public static int IndexOfOccurence(this string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at position 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

Answer 5

After some benchmarking, this seems to be the simplest and most efficient solution:

public static int IndexOfNthSB(string input,
                               char value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    var nResult = 0;
    for (int i = startIndex; i < input.Length; i++)
    {
        if (input[i] == value)
            nResult++;
        if (nResult == nth)
            return i;
    }
    return -1;
}

Answer 6

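Perhaps it would also be nice to work with the String.Split() method and check whether the requested occurrence is in the array, if you don't need the index but the value at that index.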
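A minimal sketch of what that could look like, assuming a non-empty separator and non-overlapping matches (which is how Split behaves). The names `SplitLookup` and `TryGetSegmentAfterNth` are hypothetical and only serve the illustration.

```csharp
using System;

public static class SplitLookup
{
    // Hypothetical helper: yields the text that follows the nth (1-based)
    // occurrence of separator, up to the next occurrence or the end of the string.
    // Returns false when source contains fewer than n occurrences.
    public static bool TryGetSegmentAfterNth(string source, string separator,
                                             int n, out string segment)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("separator must be non-empty.", nameof(separator));

        // k occurrences of the separator produce k + 1 array entries,
        // so the nth occurrence exists exactly when parts has an entry at index n.
        string[] parts = source.Split(new[] { separator }, StringSplitOptions.None);
        bool found = n >= 1 && n < parts.Length;
        segment = found ? parts[n] : string.Empty;
        return found;
    }
}
```

Note that this allocates an array for the whole string, so it only makes sense when the segment itself is what you are after, not the position.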

Answer 7

System.ValueTuple ftw:

var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;

Writing a function out of that is left as homework.
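One possible way to do that homework is sketched below. The names `LinqNthIndex` and `NthIndexOfChar` are made up for illustration. Unlike the one-liner above (where ElementAt(5) is zero-based, so it picks the sixth match and throws if there are fewer), this version takes a 1-based n and returns -1 when the occurrence does not exist.

```csharp
using System;
using System.Linq;

public static class LinqNthIndex
{
    // Hypothetical helper: index of the nth (1-based) occurrence of value
    // in source, or -1 if source has fewer than n occurrences.
    public static int NthIndexOfChar(string source, char value, int n)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        return source
            .Select((c, i) => (Char: c, Index: i)) // pair each char with its position
            .Where(pair => pair.Char == value)     // keep only the matches
            .Select(pair => pair.Index)
            .Skip(n - 1)                           // drop the first n - 1 matches
            .DefaultIfEmpty(-1)                    // nothing left means "not found"
            .First();
    }
}
```

For example, `LinqNthIndex.NthIndexOfChar("a-b-c", '-', 2)` returns 3. It reads nicely, but expect it to be slower than a plain IndexOf loop.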

Answer 8

Tod's answer can be simplified somewhat.

using System;

static class MainClass {
    private static int IndexOfNth(this string target, string substring,
                                       int seqNr, int startIdx = 0)
    {
        if (seqNr < 1)
        {
            throw new IndexOutOfRangeException("Parameter 'seqNr' must be greater than 0.");
        }

        var idx = target.IndexOf(substring, startIdx);

        if (idx < 0 || seqNr == 1) { return idx; }

        return target.IndexOfNth(substring, --seqNr, ++idx); // skip
    }

    static void Main () {
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 1));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 2));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 3));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 4));
    }
}

Answer 9

Or something like this, with a do-while loop:

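private static int OrdinalIndexOf(string str, string substr, int n)
{
    // n is 1-based: n == 1 returns the first occurrence.
    int pos = -1;
    do
    {
        pos = str.IndexOf(substr, pos + 1);
    } while (--n > 0 && pos != -1); // pre-decrement, so the loop runs exactly n searches
    return pos;
}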

Answer 10

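Here I go again! Another benchmark answer from yours truly :-) Once again it is based on the fantastic BenchmarkDotNet package (if you are serious about benchmarking .NET code, please use this package).

The motivation for this post is twofold. PeteT (who originally asked the question) wondered whether using String.IndexOf in a loop with a changing startIndex parameter to find the nth occurrence of a character is wasteful, when in fact it is the fastest method. And some answers use regular expressions, which are an order of magnitude slower (and, in my opinion, do not add any readability in this specific case).

Here is the code I ended up using in my string extensions library (it is not a new answer to this question, since others have already posted semantically identical code here, and I am not taking credit for it). This is the fastest method (possibly even including unsafe variations; more on that later):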

public static int IndexOfNth(this string str, char ch, int nth, int startIndex = 0) {
    if (str == null)
        throw new ArgumentNullException(nameof(str));
    var idx = str.IndexOf(ch, startIndex);
    // idx is already an absolute position, so continue from idx + 1
    // (adding startIndex again would skip characters).
    while (idx >= 0 && --nth > 0)
        idx = str.IndexOf(ch, idx + 1);
    return idx;
}

I have benchmarked this code against two other methods; the outcome is summarized after the code.

The benchmark methods are:

[Benchmark]
public int FindNthRegex() {
    Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
    return (m.Success)
        ? m.Groups[2].Captures[Nth - 1].Index
        : -1;
}
[Benchmark]
public int FindNthCharByChar() {
    var occurrence = 0;
    for (int i = 0; i < text.Length; i++) {
        if (text[i] == 'z')
            occurrence++;
        if (Nth == occurrence)
            return i;
    }
    return -1;
}
[Benchmark]
public int FindNthIndexOfStartIdx() {
    var idx = text.IndexOf('z', 0);
    var nth = Nth;
    while (idx >= 0 && --nth > 0)
        idx = text.IndexOf('z', idx + 1);
    return idx;
}

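The FindNthRegex method is the slowest of them, taking one (or two) orders of magnitude more time than the fastest. FindNthCharByChar loops over each char in the string and counts each match until it finds the nth occurrence. FindNthIndexOfStartIdx uses the method suggested by the opener of this question, which is in fact the same one I have been using for years, and it is the fastest of them all.

Why is it faster than FindNthCharByChar? Because Microsoft went to great lengths to make string manipulation as fast as possible in the .NET framework. And they have accomplished that: they did an amazing job! I did a deeper investigation of string manipulation in .NET in a CodeProject article that tries to find the fastest way to remove all whitespace from a string:

Fastest method to remove all whitespace from a string in .NET

There you will find out why string manipulation in .NET is so fast, and why there is little to gain from writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.).

The full benchmark code I used follows (it is a .NET 6 console program):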

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text;
using System.Text.RegularExpressions;

var summary = BenchmarkRunner.Run<BenchmarkFindNthChar>();

public class BenchmarkFindNthChar
{
    const string BaseText = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    [Params(100, 1000)]
    public int BaseTextRepeatCount { get; set; }
    [Params(500)]
    public int Nth { get; set; }
    private string text;
    [GlobalSetup]
    public void BuildTestData() {
        var sb = new StringBuilder();
        for (int i = 0; i < BaseTextRepeatCount; i++)
            sb.AppendLine(BaseText);
        text = sb.ToString();
    }
    [Benchmark]
    public int FindNthRegex() {
        Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
        return (m.Success)
            ? m.Groups[2].Captures[Nth - 1].Index
            : -1;
    }
    [Benchmark]
    public int FindNthCharByChar() {
        var occurrence = 0;
        for (int i = 0; i < text.Length; i++) {
            if (text[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthIndexOfStartIdx() {
        var idx = text.IndexOf('z', 0);
        var nth = Nth;
        while (idx >= 0 && --nth > 0)
            idx = text.IndexOf('z', idx + 1);
        return idx;
    }

}

Answer 11

This might do it:

Console.WriteLine(str.IndexOf((@"\")+2)+1);

11 answers

Answer 1: score 110, 2008-10-09 14:01:58
Answer 2: score 53, accepted, 2008-10-09 10:26:51
Answer 3: score 20, 2011-03-22 03:34:26
Answer 4: score 15, 2009-10-07 04:39:44
Answer 5: score 2, 2018-08-08 19:58:14
Answer 6: score 1, 2014-05-13 09:08:10
Answer 7: score 1, 2018-08-19 00:07:26
Answer 8: score 0, 2019-08-16 15:39:55
Answer 9: score 0, 2019-12-21 21:48:47
Answer 10: score 0, 2022-12-30 18:30:34
Answer 11: score -4, 2012-07-08 07:05:54
