StackOverflowException in non-infinite, recursive string search

Background. My script encounters a StackOverflowException while recursively searching for specific text in a large string. The loop is not infinite; for one specific search, the problem occurs after 9,000-10,000 legitimate searches, and I need it to keep going. I'm using tail recursion (I think), and that may be part of my problem, since I gather that C# does not handle it well. However, I'm not sure how to avoid tail recursion in my case.

Question(s). Why is the StackOverflowException occurring? Does my overall approach make sense? If the design sucks, I'd rather start there than just avoid an exception. But if the design is acceptable, what can I do about the StackOverflowException?

Code. The class I've written searches for contacts (500+ from a specified list) in a large amount of text (about 6MB). The strategy I'm using is to search for the last name, then look for the first name shortly before or after it. I need to find each instance of each contact within the given text. The StringSearcher class has a recursive method that continues to search for contacts, returning a result whenever one is found while keeping track of where the search left off.

I use this class in the following manner:

StringSearcher searcher = new StringSearcher(
    File.ReadAllText(FilePath),
    "lastname",
    "firstname",
    30
);

string searchResult = null;
while ((searchResult = searcher.NextInstance()) != null)
{
    // do something with each searchResult
}

On the whole, the script seems to work. Most contacts return the results I expect. However, the problem seems to occur when the primary search string is extremely common (thousands of hits) and the secondary search string never or rarely occurs. I know it's not getting stuck because CurrentIndex is advancing normally.

Here's the recursive method I'm talking about.

public string NextInstance()
{
    // Advance this.CurrentIndex to the next location of the primary search string
    this.SearchForNext();

    // Look a little before and after the primary search string
    this.CurrentContext = this.GetContextAtCurrentIndex();

    // Primary search string found?
    if (this.AnotherInstanceFound)
    {
        // If there is a valid secondary search string, is that found near the
        // primary search string? If not, look for the next instance of the primary
        // search string
        if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
            !this.IsSecondaryFoundInContext())
        {
            return this.NextInstance();
        }
        // 
        else
        {
            return this.CurrentContext;
        }
    }
    // No more instances of the primary search string
    else
    {
        return null;
    }
}

The StackOverflowException occurs on this.CurrentIndex = ... in the following method:

private void SearchForNext()
{
    // If we've already searched once, 
    // increment the current index before searching further.
    if (0 != this.CurrentIndex)
    {
        this.CurrentIndex++;
        this.NumberOfSearches++;
    }

    this.CurrentIndex = this.Source.IndexOf(
            this.PrimarySearchString,
            ValidIndex(this.CurrentIndex),
            StringComparison.OrdinalIgnoreCase
    );

    this.AnotherInstanceFound = !(this.CurrentIndex >= 0) ? false : true;
}

I can include more code if needed. Let me know if one of those methods or variables is questionable.

*Performance is not really a concern because this will likely run at night as a scheduled task.

You have a 1MB stack. When that stack space runs out and you still need more, a StackOverflowException is thrown. This may or may not be the result of infinite recursion; the runtime has no idea. Infinite recursion is simply one effective way of using more stack space than is available (by using an infinite amount). You can use a finite amount that just happens to be more than is available, and you'll get the same exception.

While there are other ways to use up lots of stack space, recursion is one of the most effective. Each method call adds a stack frame whose size depends on the signature and locals of that method. Deep recursion can use a lot of stack space, so if you expect a depth of more than a few hundred levels (and even that is a lot), you should probably not use recursion. Note that any code using recursion can be rewritten iteratively, or rewritten to use an explicit Stack.
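To illustrate the point about an explicit Stack, here is a minimal sketch that counts the nodes of a tree two ways. The Node and TreeWalker types are hypothetical and exist only for this example; they are not part of the question's code. The recursive version consumes one stack frame per level of nesting, while the iterative version keeps its pending work in a heap-allocated Stack<T>, so its depth is limited by available memory rather than by the thread's stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node, used only for illustration.
public sealed class Node
{
    public List<Node> Children { get; } = new List<Node>();
}

public static class TreeWalker
{
    // Recursive version: every level of nesting costs one stack frame,
    // so a sufficiently deep tree can exhaust the thread's stack.
    public static int CountRecursive(Node node)
    {
        int count = 1;
        foreach (Node child in node.Children)
        {
            count += CountRecursive(child);
        }
        return count;
    }

    // Iterative version: nodes still to be visited are kept in an explicit
    // Stack<T> on the heap, so the call depth stays constant.
    public static int CountIterative(Node root)
    {
        int count = 0;
        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node node = pending.Pop();
            count++;
            foreach (Node child in node.Children)
            {
                pending.Push(child);
            }
        }
        return count;
    }
}
```

Both methods visit every node exactly once; only the place where the bookkeeping lives differs.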

It's hard to say, as a complete implementation isn't shown, but based on what I can see you are more or less writing an iterator without using the C# constructs for one (namely IEnumerable).

My guess is that "iterator blocks" will make this algorithm easier to write, easier to write non-recursively, and more convenient to use from the caller's side.

Here is a high-level look at how you might structure this method as an iterator block:

public static IEnumerable<string> SearchString(string text
    , string firstString, string secondString, int unknown)
{
    int lastIndexFound = text.IndexOf(firstString);

    while (lastIndexFound >= 0)
    {
        if (secondStringNearFirst(text, firstString, secondString, lastIndexFound))
        {
            yield return lastIndexFound.ToString();
        }

        // Move on to the next occurrence; without this the loop never terminates
        lastIndexFound = text.IndexOf(firstString, lastIndexFound + 1);
    }
}

private static bool secondStringNearFirst(string text
    , string firstString, string secondString, int lastIndexFound)
{
    throw new NotImplementedException();
}
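As one possible way to fill in the stub, here is a self-contained sketch of the whole search as an iterator block. It rests on assumptions: the ContactSearch class and FindNear method are hypothetical names, the int parameter is taken to be the number of characters of context to inspect on each side of the primary hit, and matching is case-insensitive as in the question's SearchForNext. It returns the surrounding context rather than the index, which is closer to what the question's NextInstance returns.

```csharp
using System;
using System.Collections.Generic;

public static class ContactSearch
{
    // Hypothetical helper: lazily yields the context around each occurrence of
    // `primary` that has `secondary` within `radius` characters on either side.
    // If `secondary` is null or empty, every occurrence of `primary` is yielded.
    public static IEnumerable<string> FindNear(
        string text, string primary, string secondary, int radius)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(primary))
        {
            yield break; // nothing sensible to search for
        }

        int index = text.IndexOf(primary, StringComparison.OrdinalIgnoreCase);

        while (index >= 0)
        {
            // Clamp the context window to the bounds of the text.
            int start = Math.Max(0, index - radius);
            int end = Math.Min(text.Length, index + primary.Length + radius);
            string context = text.Substring(start, end - start);

            if (string.IsNullOrEmpty(secondary) ||
                context.IndexOf(secondary, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                yield return context;
            }

            // Resume one character past the current hit. This is a plain loop,
            // so thousands of rejected hits add no stack depth.
            index = text.IndexOf(primary, index + 1, StringComparison.OrdinalIgnoreCase);
        }
    }
}
```

A caller would then write foreach (string hit in ContactSearch.FindNear(text, "lastname", "firstname", 30)) { ... } instead of polling NextInstance() until it returns null.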

It doesn't seem like recursion is the right solution here. Normally with recursive problems you have some state that you pass to the recursive step. In this case, you really have a plain while loop. Below I put your method body in a loop and changed the recursive step to continue. See if that works...

public string NextInstance()
{
    while (true)
    {
        // Advance this.CurrentIndex to the next location of the primary search string
        this.SearchForNext();

        // Look a little before and after the primary search string
        this.CurrentContext = this.GetContextAtCurrentIndex();

        // Primary search string found?
        if (this.AnotherInstanceFound)
        {
            // If there is a valid secondary search string, is that found near the
            // primary search string? If not, look for the next instance of the primary
            // search string
            if (!string.IsNullOrEmpty(this.SecondarySearchString) &&
                !this.IsSecondaryFoundInContext())
            {
                continue; // Start searching again...
            }
            // Otherwise the current context is a valid result
            else
            {
                return this.CurrentContext;
            }
        }
        // No more instances of the primary search string
        else
        {
            return null;
        }
    }
}

