
How to find the first longest consecutive equal substring using C++?

The program asks the user for a string to evaluate, then outputs the longest consecutive equal substring.


Sample input 1:

String to evaluate: abcabc

Sample output 1:

Longest Consecutive Equal Substring Found: abc


Sample input 2:

String to evaluate: HeyThereHeyThere123JJJ123JJJ

Sample output 2:

Longest Consecutive Equal Substring Found: HeyThere

There are several possible design approaches. Here are the three main ones:

  1. Brute force. This needs a triple nested loop and has a complexity of O(n^3). It may work for small strings, but quickly becomes too slow as the strings grow.
  2. Dynamic programming. The complexity is only quadratic, O(n^2), but it also needs a two-dimensional DP table with one row and one column per character of the source string.
  3. Suffix trees, with linear runtime. They are hard to implement and hard to understand; read up on suffix trees and on Ukkonen's algorithm for the details. As said, this is heavy stuff.
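For comparison, approach 1 can be sketched as follows. This is only an illustration and not part of the solution further down: the function name longestRepeatBruteForce is made up here, and the sketch assumes that the two copies of the substring must not overlap.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: brute-force search, O(n^3) in the worst case.
// Assumption: the two occurrences are not allowed to overlap.
std::string longestRepeatBruteForce(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t bestStart = 0;
    std::size_t bestLength = 0;

    // First loop: start of the first occurrence
    for (std::size_t i = 0; i < n; ++i) {
        // Second loop: start of the second occurrence
        for (std::size_t j = i + 1; j < n; ++j) {
            // Third loop: extend the match while the characters agree, the second
            // occurrence stays inside the string and the first one does not reach j
            std::size_t k = 0;
            while (j + k < n && i + k < j && s[i + k] == s[j + k]) ++k;

            // Strict comparison keeps the first substring among equally long ones
            if (k > bestLength) {
                bestLength = k;
                bestStart = i;
            }
        }
    }
    return s.substr(bestStart, bestLength);
}
```

Calling longestRepeatBruteForce("abcabc") gives "abc". The three nested loops are exactly what makes this approach impractical for long inputs.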

The compromise is number 2, dynamic programming. How does it work? First we iterate over all characters in an outer loop and then, with a nested loop, over all characters following the one currently selected by the outer loop. So, if we have xabcyabcz, we start with the 'x' and compare it with 'a', 'b', 'c', 'y', 'a', 'b', 'c', 'z'. We find no match.

Next, the outer loop advances by one character, in this example to 'a'. We compare it with 'b', 'c', 'y', 'a', 'b', 'c', 'z'. We find one match and start processing it.

First, we check whether there is a potential overlap. We see this by comparing the distance between the two indices of the matching 'a's with the substring length stored for the preceding pair of characters. If the values are equal, the overlap starts. We can decide whether we want to allow overlaps or not.
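The overlap test can be isolated into a tiny function. This is a sketch under the reading that "equal" already counts as an overlap; the name wouldOverlap and its parameters are made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper for illustration. Precondition: indexSecond > indexFirst.
// previousLength is the match length stored for the preceding pair of characters.
// If the distance between the two characters is not larger than that length,
// adding one more character would make the two copies share a character.
bool wouldOverlap(std::size_t indexFirst, std::size_t indexSecond, std::size_t previousLength) {
    return (indexSecond - indexFirst) <= previousLength;
}
```

For "aaaa", the pair at indices 1 and 2 with a previous length of 1 overlaps, because "aa" at 0..1 and "aa" at 1..2 share the character at index 1. The pair at indices 1 and 3 does not.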

Now for counting the length of an equal substring. We found the 'a' and a matching 'a'. We take the length that we found for the previous character and add one. That was 0, so for this 'a' we store a length of 0+1=1. Next, when we evaluate the 'b' given by the outer loop and find a match later in the string, we take the length from the previous character 'a', which was 1, and store 1+1=2 for 'b'.

This continues until all characters are checked.
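To make the counting visible, the table can be filled on its own and inspected. The following is a sketch only: the function name buildLengthTable is an assumption for this illustration, and overlaps are rejected as soon as the distance equals the previous length.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration: returns the complete length table.
// table[i][j] is the length of the equal run that ends at index i and at index j.
std::vector<std::vector<unsigned int>> buildLengthTable(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::vector<unsigned int>> table(n, std::vector<unsigned int>(n, 0u));

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            // Length stored for the preceding pair; 0 for the first row
            const unsigned int previous = (i > 0) ? table[i - 1][j - 1] : 0u;
            // Assumption: a distance equal to the previous length is already an overlap
            const bool overlap = (j - i) <= previous;
            if (s[i] == s[j] && !overlap) table[i][j] = previous + 1;
        }
    }
    return table;
}
```

For xabcyabcz the entries for the pairs (1,5), (2,6) and (3,7) are 1, 2 and 3: the run "abc" grows by one along the diagonal, exactly as described above.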

We remember the maximum length found, together with an index into the given string that marks where the maximum-length substring ends.

At the end, we simply build a substring of the input string from these two values.
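The last step, turning an end index and a length into a substring, could look like this. The helper name substringEndingAt is hypothetical, and the guard for impossible parameters is an addition made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: extracts the substring of the given
// length whose last character sits at indexLastLetter.
std::string substringEndingAt(const std::string& text, std::size_t indexLastLetter, std::size_t length) {
    // Nothing to extract, or parameters that do not fit the text
    if (length == 0 || indexLastLetter >= text.size() || length > indexLastLetter + 1) return {};
    // The start index is the end index moved back by length - 1
    return text.substr(indexLastLetter + 1 - length, length);
}
```

With xabcyabcz, an end index of 3 and a length of 3, the start index is 3 + 1 - 3 = 1 and the result is "abc".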

All this (and this is one potential solution) could be done as shown below:

#include <iostream>
#include <string>
#include <vector>

std::string getLongestConsecutiveEqualSubstring(const std::string& stringToCheck) {

    // The string's length is needed several times, so read it only once.
    const size_t testStringLength = stringToCheck.length();

    // Table to store length values for sub strings. 2d vector. Initialize with 0
    std::vector< std::vector<unsigned int>> commonSubStringLength(testStringLength, std::vector<unsigned int>(testStringLength, 0u));

    // This will store the maximum length of an equal substring
    unsigned int maxLength = 0u; 
    // And this the index in the string to check. Index points to the last character of the found substring
    size_t indexLastLetterOfMax = 0u;

    // Temporary to handle negative indices
    unsigned int dummyLengthNull = 0u;

    // Iterate over string to check. Each character will be compared with all following characters
    for (size_t indexFirst = 0; indexFirst < testStringLength; ++indexFirst) {
        for (size_t indexSecond = indexFirst + 1; indexSecond < testStringLength; ++indexSecond) {

            // These are just abbreviations with a pointer for easier access within the 2d vector
            unsigned int* const currentLength = &commonSubStringLength[indexFirst][indexSecond];
            // Since we start at index 0, there would be a problem with the "previous" length element
            // Therefore we simply point to a 0-element in such a case
            const unsigned int*  previousLength{};
            if (indexFirst > 0)
                previousLength = &commonSubStringLength[indexFirst - 1][indexSecond - 1];
            else
                previousLength = &dummyLengthNull;

            // This will prevent overlapping findings like for ababababab
            // The overlap starts as soon as the distance equals the previous length, hence "<="
            // You may set this to false, and the algorithm will work with overlaps
            const bool overlap = ((indexSecond - indexFirst) <= *previousLength);

            // So, if we have 2 identical characters in the string
            if (stringToCheck[indexFirst] == stringToCheck[indexSecond] and not overlap) {

                // Then we increase the length for this substring, with that from the character before
                *currentLength = *previousLength + 1;

                // And if we found new max sub string length
                if (*currentLength > maxLength) {

                    // Then we memorize the new length
                    maxLength = *currentLength;
                    // and the end-index in the given text for this substring
                    // (indexFirst never decreases, so no extra comparison is needed)
                    indexLastLetterOfMax = indexFirst;
                }
            }
            else
                // Else, the letters differ (or the match would overlap). So, length of substring will be zero
                *currentLength = 0;
        }
    }
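    // Without any repeat (this includes an empty input) there is nothing to extract.
    // substr() would throw std::out_of_range for an empty string here, so return early
    if (maxLength == 0u) return {};
    // Build and return substring from given parameters
    return stringToCheck.substr(indexLastLetterOfMax - maxLength + 1, maxLength);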
}
// Test / Driver code
int main()
{
    std::string testString{ "HeyThereHeyThere123JJJ123JJJ" };
    std::cout << "Result: " << getLongestConsecutiveEqualSubstring(testString) << '\n';
}


 