简体   繁体   中英

Substring recursive algorithm not working

I'm a programming student in my first C++ class, and recently we were encouraged to write a simple recursive function to find the first occurrence of a substring in a given string. If found, it returns the index. If the substring is not found, the index_of() function should return -1. We are encouraged to use a helper function that takes the index as one of its parameters, and this is what I've tried.

For example:

int index_of("Mississippi", "sip"); // this would return a 6

This is supposed to be a simple exercise to help us understand recursion and won't be turned in. My professor stated that our actual assignment with recursion will be much more involved, which is why I really want to understand this simple use of recursion.

I've done this successfully using C-style strings and pointers, but not with C++ std::string objects. What am I doing wrong in my program? My professor stated we should easily be able to write this in 5 mins, but I've been struggling with it for two hours. Here's what I've done so far:

int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)
        return -1;
    else if (starts_with(s, t, ++index))
    {
        return index;
    }
    else 
        return index;
}

bool starts_with(string s, string t, int index)
{
    if (t[index] == NULL)
        return true;
    if ( s[index] == NULL || t[0] != s[index])
        return false;
    return starts_with(s, t, ++index);
}

As written, this function always returns an index of 1.

int index_of(string s, string t)
{
    int index = 0;

    if (s[index] == NULL)

Full stop. This isn't how C++'s strings work and you must fix this if you want to use them. Even with C-style strings, don't use NULL to mean the ASCII null character. They share a name but have different purposes, and you should not use NULL to mean integer zero (chars are integer types and the null character is their zero value). Use '\\0' or just if (s[index]) .

However, you aren't allowed to index a std::string unless you know the index is valid. To do that, compare the index against s.size() (and make sure it's greater than or equal to 0). Even so, what you are really testing here is if s is empty, and it has a special method to do that:

    if (s.empty())

Continuing:

    else if (starts_with(s, t, ++index))

Increment and decrement inside expressions , especially as here, can be confusing for the beginner with no advantage. The main advantage of them is code that is succinct and clear, but you have to already understand the main part of the code first, and even then experienced programmers sometimes benefit from being a tiny bit more verbose.

Anecdotally, Go's creators, who were also involved in early C history, even turned increment from an expression into a statement , and I believe clarity is a large part of the reason.


From the beginning

You want to implement a function with this signature:

int index_of(string haystack, string needle);
// returns the index of needle in haystack, if found
// otherwise returns -1

I include those comments with the signature on purpose: they are part of the public interface for this function. Better parameter names also increase clarity.

Identify the cases you need to consider:

  • needle is empty (you can handle this in multiple ways)
  • haystack is empty: return -1
  • at this point we know both haystack and needle are not empty
  • that leaves the two cases that are the crux of the algorithm:
    • first character of haystack does not match the first character of needle
    • there is a match of the first character

And when there is a match of the first character, you have two sub-cases:

  • there are no more characters in needle: match found
  • there are more characters: continue checking

I've written these as a recursive algorithm which receives "new copies" of each string (and substring) instead of using indices. However, you can transform to use indices by changing "first character" to "current character", and similarly for the "empty" conditions. You will want to use two indices in that case (and trying to only use one may have been a major stumbling block for you so far), unless you have a helping function to compare substrings (though I'm unsure if your professor had a separate intention with this comment).

A direct translation of the above prose into code:

int index_of(string haystack, string needle) {
  if (needle.empty()) return 0;
  // this implementation considers empty substrings to occur at the start of any
  // string, even an empty haystack; you could also make it an error to call
  // index_of when needle is empty, or just return -1

  if (haystack.empty()) return -1;

  assert(!needle.empty() && !haystack.empty()); // I wouldn't normally include
  // this, since we just checked these conditions, but this is the "at this
  // point we know both haystack and needle are not empty" that I mentioned

  if (haystack[0] != needle[0]) {
    // mark A, see below
    int index = index_of(haystack.substr(1), needle);
    return index != -1 ? index + 1 : index;
  }

  if (needle.length() == 1) return 0; // found complete match
  // note the way I chose to handle needle.empty() above makes this unnecessary

  // mark B, see below    
  // partial match (of the first character), continue matching
  int index = index_of(haystack.substr(1), needle.substr(1)); // strip first
  return index == 0 ? 0 : -1;
  // must check index == 0 exactly, if -1 then we must return that, and if not 0
  // then we've found a "broken" needle, which isn't a real match
}

The broken needle comment hints at how inefficient that code is, as it bifurcates the recursive calls into two categories: must match at 1 (which is 0 after slicing into substrings), at mark B, and can match anywhere, at mark A. We can improve this with a helper function, and I'll use std::string's operator== overload (operating on a substring of haystack) for that. This yields the recursive equivalent of the classical "naive strstr ":

int index_of(string haystack, string needle) {
  if (needle.empty()) return 0;
  if (haystack.empty()) return -1;
  if (haystack.substr(0, needle.length()) == needle()) {
    return 0;
  }
  int index = index_of(haystack.substr(1), needle);
  if (index != -1) index++;
  return index;
}

And when using an index for haystack with string::compare as the helper so a needle index isn't required:

// might not be exposed publicly, but could be
int index_of(string const& haystack, int haystack_pos, string const& needle) {
  // would normally use string const& for all the string parameters in this
  // answer, but I've mostly stuck to the prototype you already have

  // shorter local name, keep parameter name the same for interface clarity
  int& h = haystack_pos;

  // preconditions:
  assert(0 <= h && h <= haystack.length());

  if (needle.empty()) return h;
  if (h == haystack.length()) return -1;
  if (haystack.compare(h, needle.length(), needle) == 0) {
    return h;
  }
  return index_of(haystack, h+1, needle);
}

int index_of(string haystack, string needle) {
  // sets up initial values or the "context" for the common case
  return index_of(haystack, 0, needle);
}

Notice this version is tail-recursive, but this is still a naive algorithm and more advanced ones exist .


If I had more time, I would have written a shorter letter.
— Cicero

You've said this is helped you a lot, but, even with the additional examples I just included, it seems lacking to me. Substring-search is not a good recursion exercise, in my opinion, and that could be why.

Your code boils down to this

int index_of(string s, string t)
{
    int index = 0;

    //if (s[index] == NULL)
    //    return -1;
    ++index // from this: else if (starts_with(s, t, ++index))
    //{
    //    return index;
   // }
   //else 
        return index;
}

so, yes, it always returns 1

index continues to increment inside the recursive starts_with function, but the changed values never make it back out to be your return value.

You are not updating the index variable in the index_of() function. When you pass it to starts_with() it makes a copy on the stack. You need to return the updated value somehow - ether take it by reference/pointer or return it instead of bool .

I don't think 5 minutes was at all a reasonable estimate for an intro class student. To help you get even help you understand the issue, I've written what I think is the final answer...

#include <string>
#include <iostream>
using namespace std;

int starts_with(string s, string sub, unsigned int i) {
  if (i >= s.length())
    return -1;
  if (s.compare(i, sub.length(), sub) == 0)
    return i;
  return starts_with(s, sub, i + 1);
}
int index_of(string s, string sub) { return starts_with(s, sub, 0U); }
int main(void) { cout << index_of("Mississippi", "sip") << "\n"; }

(note - edited for simplicity)

Not sure if this'll help as I'm not sure to what extent recursion was explained to you by your teacher, but here goes...

This is how you need to think of it - A recursive function contains two main components:

1) a Base Case and

2) a Recursive Case

The key is that on each call, you are determining which of the two cases is true for this run of the function (based on the input parameter(s)).

The base case is where the function is not called again, and the recursive case always calls the function, and is usually more complicated in its logic than the base case.


So I'd suggest starting over. Think about what the input should be such that on a function call with that input the function does not call itself ---> that's your base case .

If on a function call it isn't the base case, it's the recursive case. So your function needs to look like this:

function index_of(params...){
    if base case
        do something simple and return
    else
        do something and call index_of again
}

Hint: In your function, there is no recursive case.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM