Get the index of the nth occurrence of a string?

Unless I am missing an obvious built-in method, what is the quickest way to get the nth occurrence of a string within a string?

I realize that I could loop the IndexOf method by updating its start index on each iteration of the loop. But doing it this way seems wasteful to me.

You could use the regular expression /((s).*?){n}/ to search for the n-th occurrence of substring s.

In C# it might look like this:

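using System.Text.RegularExpressions;

public static class StringExtender
{
    public static int NthIndexOf(this string target, string value, int n)
    {
        // Singleline lets '.' match newlines, so occurrences on different lines still count.
        Match m = Regex.Match(target,
            "((" + Regex.Escape(value) + ").*?){" + n + "}", RegexOptions.Singleline);

        if (m.Success)
            return m.Groups[2].Captures[n - 1].Index;
        else
            return -1;
    }
}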

That's basically what you need to do - or at least, it's the easiest solution. All you'd be "wasting" is the cost of n method invocations - you won't actually be checking any case twice, if you think about it. (IndexOf will return as soon as it finds the match, and you'll keep going from where it left off.)
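
The loop described above might look like the following minimal sketch. The class name NthOccurrence, the method name IndexOfNth and the choice of ordinal comparison are assumptions made for illustration, not something taken from the framework.

```csharp
using System;

public static class NthOccurrence
{
    // Hypothetical helper: zero-based index of the nth (1-based) occurrence
    // of value in source, or -1 if source contains fewer than n occurrences.
    public static int IndexOfNth(string source, string value, int n)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty", nameof(value));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        int index = -1;
        for (int found = 0; found < n; found++)
        {
            // Resume one character past the previous hit, so the search never
            // restarts from the beginning and overlapping matches are counted.
            index = source.IndexOf(value, index + 1, StringComparison.Ordinal);
            if (index < 0) return -1;
        }
        return index;
    }
}
```

For example, NthOccurrence.IndexOfNth("abcbcbcd", "bc", 2) looks for the second "bc"; to count only non-overlapping matches, one could advance by value.Length instead of 1.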

Here is a recursive implementation (of the above idea) as an extension method, mimicking the format of the framework method(s):

public static int IndexOfNth(this string input,
                             string value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    if (nth == 1)
        return input.IndexOf(value, startIndex);
    var idx = input.IndexOf(value, startIndex);
    if (idx == -1)
        return -1;
    return input.IndexOfNth(value, idx + 1, --nth);
}

Also, here are some (MbUnit) unit tests that might help you (to prove it is correct):

using System;
using MbUnit.Framework;

namespace IndexOfNthTest
{
    [TestFixture]
    public class Tests
    {
        // Input contains 2 instances of Token
        private const string Input = "TestTest";
        private const string Token = "Test";

        /* Test for 0th index */

        [Test]
        public void TestZero()
        {
            Assert.Throws<NotSupportedException>(
                () => Input.IndexOfNth(Token, 0, 0));
        }

        /* Test the two standard cases (1st and 2nd) */

        [Test]
        public void TestFirst()
        {
            Assert.AreEqual(0, Input.IndexOfNth("Test", 0, 1));
        }

        [Test]
        public void TestSecond()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 0, 2));
        }

        /* Test the 'out of bounds' case */

        [Test]
        public void TestThird()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 0, 3));
        }

        /* Test the offset case (in and out of bounds) */

        [Test]
        public void TestFirstWithOneOffset()
        {
            Assert.AreEqual(4, Input.IndexOfNth("Test", 4, 1));
        }

        [Test]
        public void TestFirstWithTwoOffsets()
        {
            Assert.AreEqual(-1, Input.IndexOfNth("Test", 8, 1));
        }
    }
}

private int IndexOfOccurence(string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at position 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

or in C# with extension methods

public static int IndexOfOccurence(this string s, string match, int occurence)
{
    int i = 1;
    int index = -1; // start at -1 so the first search begins at position 0

    while (i <= occurence && (index = s.IndexOf(match, index + 1)) != -1)
    {
        if (i == occurence)
            return index;

        i++;
    }

    return -1;
}

After some benchmarking, this seems to be the simplest and most efficient solution

public static int IndexOfNthSB(string input,
                               char value, int startIndex, int nth)
{
    if (nth < 1)
        throw new NotSupportedException("Param 'nth' must be greater than 0!");
    var nResult = 0;
    for (int i = startIndex; i < input.Length; i++)
    {
        if (input[i] == value)
            nResult++;
        if (nResult == nth)
            return i;
    }
    return -1;
}

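Maybe it would also be nice to work with the String.Split() method and check whether the requested occurrence is in the array, if you don't need the index but the value at that index.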
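
One way to read the Split suggestion is sketched below. The names SplitLookup and SegmentAfterNth are hypothetical, and the sketch assumes that "the value at the index" means the text between the nth separator and the next one.

```csharp
using System;

public static class SplitLookup
{
    // Hypothetical helper: returns the text that follows the nth (1-based)
    // occurrence of separator, up to the next separator, or null if the
    // input contains fewer than n separators.
    public static string SegmentAfterNth(string input, string separator, int n)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        string[] parts = input.Split(new[] { separator }, StringSplitOptions.None);

        // k separators yield k + 1 parts, so parts[n] exists only if k >= n.
        return n < parts.Length ? parts[n] : null;
    }
}
```

Note that Split allocates an array holding every segment, so this only makes sense when the segment itself is wanted rather than its position.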

System.ValueTuple ftw:

var index = line.Select((x, i) => (x, i)).Where(x => x.Item1 == '"').ElementAt(5).Item2;

Writing a function from that is homework.
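
For reference, one possible way to wrap that one-liner in a function is sketched here. The names LinqNth and IndexOfNth are assumptions; unlike ElementAt, which is zero-based and throws when there are too few matches, this version takes a 1-based n and returns -1.

```csharp
using System;
using System.Linq;

public static class LinqNth
{
    // Hypothetical wrapper: zero-based index of the nth (1-based) occurrence
    // of ch in text, or -1 when text contains fewer than n occurrences.
    public static int IndexOfNth(string text, char ch, int n)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        return text
            .Select((c, i) => (Char: c, Index: i)) // pair each char with its index
            .Where(p => p.Char == ch)              // keep only the matches
            .Skip(n - 1)                           // drop the first n - 1 matches
            .Select(p => p.Index)
            .DefaultIfEmpty(-1)                    // no nth match: yield -1
            .First();
    }
}
```

Because LINQ evaluates lazily, the scan stops at the nth match, although the per-character delegate calls will likely make it slower than the IndexOf loop.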

Tod's answer can be simplified somewhat.

using System;

static class MainClass {
    private static int IndexOfNth(this string target, string substring,
                                       int seqNr, int startIdx = 0)
    {
        if (seqNr < 1)
        {
            throw new IndexOutOfRangeException("Parameter 'seqNr' must be greater than 0.");
        }

        var idx = target.IndexOf(substring, startIdx);

        if (idx < 0 || seqNr == 1) { return idx; }

        return target.IndexOfNth(substring, --seqNr, ++idx); // skip
    }

    static void Main () {
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 1));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 2));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 3));
        Console.WriteLine ("abcbcbcd".IndexOfNth("bc", 4));
    }
}

Or something like this with a do-while loop

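private static int OrdinalIndexOf(string str, string substr, int n)
{
    // n is 1-based: n == 1 returns the first occurrence.
    int pos = -1;
    do
    {
        pos = str.IndexOf(substr, pos + 1);
    } while (--n > 0 && pos != -1); // pre-decrement, so exactly n searches are made
    return pos;
}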

Here I go again: another benchmark answer from yours truly :-) Once again based on the fantastic BenchmarkDotNet package (if you're serious about benchmarking .NET code, please, please use this package).

The motivation for this post is twofold: PeteT (who asked it originally) felt it was wasteful to call String.IndexOf in a loop, varying the startIndex parameter, to find the nth occurrence of a character, while in fact it's the fastest method; and some answers use regular expressions, which are an order of magnitude slower (and do not add any benefits, in my opinion not even readability, in this specific case).

Here is the code I've ended up using in my string extensions library (it's not a new answer to this question; others have already posted semantically identical code here, so I'm not taking credit for it). This is the fastest method (even, possibly, including unsafe variations - more on that later):

public static int IndexOfNth(this string str, char ch, int nth, int startIndex = 0) {
    if (str == null)
        throw new ArgumentNullException(nameof(str));
    var idx = str.IndexOf(ch, startIndex);
    while (idx >= 0 && --nth > 0)
        // idx is already an absolute position, so startIndex must not be added again
        idx = str.IndexOf(ch, idx + 1);
    return idx;
}

I've benchmarked this code against two other methods and the results follow:

(Image: benchmark results)

The benchmarked methods were:

[Benchmark]
public int FindNthRegex() {
    Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
    return (m.Success)
        ? m.Groups[2].Captures[Nth - 1].Index
        : -1;
}
[Benchmark]
public int FindNthCharByChar() {
    var occurrence = 0;
    for (int i = 0; i < text.Length; i++) {
        if (text[i] == 'z')
            occurrence++;
        if (Nth == occurrence)
            return i;
    }
    return -1;
}
[Benchmark]
public int FindNthIndexOfStartIdx() {
    var idx = text.IndexOf('z', 0);
    var nth = Nth;
    while (idx >= 0 && --nth > 0)
        idx = text.IndexOf('z', idx + 1);
    return idx;
}

The FindNthRegex method is the slowest of the bunch, taking an order (or two) of magnitude more time than the fastest. FindNthCharByChar loops over each char of the string and counts each match until it finds the nth occurrence. FindNthIndexOfStartIdx uses the method suggested by the opener of this question which, indeed, is the same one I've been using for ages to accomplish this, and it is the fastest of them all.

Why is it so much faster than FindNthCharByChar? It's because Microsoft went to great lengths to make string manipulation as fast as possible in the .NET framework. And they've accomplished that: they did an amazing job! I've done a deeper investigation of string manipulation in .NET in a CodeProject article which tries to find the fastest method to remove all whitespace from a string:

Fastest method to remove all whitespace from Strings in .NET

There you'll find why string manipulations in .NET are so fast, and why it's next to useless trying to squeeze out more speed by writing our own versions of the framework's string manipulation code (the likes of string.IndexOf, string.Split, string.Replace, etc.).

The full benchmark code I've used follows (it's a .NET 6 console program):

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Text;
using System.Text.RegularExpressions;

var summary = BenchmarkRunner.Run<BenchmarkFindNthChar>();

public class BenchmarkFindNthChar
{
    const string BaseText = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    [Params(100, 1000)]
    public int BaseTextRepeatCount { get; set; }
    [Params(500)]
    public int Nth { get; set; }
    private string text;
    [GlobalSetup]
    public void BuildTestData() {
        var sb = new StringBuilder();
        for (int i = 0; i < BaseTextRepeatCount; i++)
            sb.AppendLine(BaseText);
        text = sb.ToString();
    }
    [Benchmark]
    public int FindNthRegex() {
        Match m = Regex.Match(text, "((" + Regex.Escape("z") + ").*?){" + Nth + "}");
        return (m.Success)
            ? m.Groups[2].Captures[Nth - 1].Index
            : -1;
    }
    [Benchmark]
    public int FindNthCharByChar() {
        var occurrence = 0;
        for (int i = 0; i < text.Length; i++) {
            if (text[i] == 'z')
                occurrence++;
            if (Nth == occurrence)
                return i;
        }
        return -1;
    }
    [Benchmark]
    public int FindNthIndexOfStartIdx() {
        var idx = text.IndexOf('z', 0);
        var nth = Nth;
        while (idx >= 0 && --nth > 0)
            idx = text.IndexOf('z', idx + 1);
        return idx;
    }

}

This might do it:

Console.WriteLine(str.IndexOf((@"\")+2)+1);
