
Removing a substring from a string in C

I already have code that removes a substring from a string (word) in C, but I don't understand it. Can someone explain it to me? It doesn't use functions from the standard library. I tried to analyze it myself, but I still don't understand certain parts, which I have marked in the comments. I just need to figure out how this all works.

Thanks!

#include <stdio.h>
#include <stdlib.h>
void remove(char *s1, char *s2);

int main()
{
   char s1[101], s2[101];
   printf("First word: ");
   scanf("%s", s1);
   printf("Second word: ");
   scanf("%s", s2);
   remove(s1, s2);
   printf("The first word after removing is '%s'.", s1);

   return 0;
}
void remove(char *s1, char *s2)
{
   int i = 0, j, k;
   while (s1[i])       // ITERATES THROUGH THE FIRST STRING s1?
   {
       for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);   // WHAT DOES THIS LINE DO?
          if (!s2[j])           // IF WE'RE AT THE END OF STRING s2? 
             {
                 for (k = i; s1[k + j]; k++)   //WHAT DOES THIS ENTIRE BLOCK DO?
                    s1[k] = s1[k + j];
                    s1[k] = 0;
              }
          else
              i++;    // ???
    }
}

The function works like this:

- It skips the part of s1 that matches s2 and overwrites it with the rest of s1.

while (s1[i])       // Yes, it iterates through the first string s1
       {
           for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);   // Counts how many characters of s2 match s1 starting at s1[i]

This loop only advances the index j over the matching part; it stores nothing in s1.

if (!s2[j])           // Yes, we're at the end of s2, so all of s2 matched
{
 for (k = i; s1[k + j]; k++)   // Copies the rest of s1 over the matched part
 s1[k] = s1[k + j];
 s1[k] = 0;
}
else
 i++;    // No match at position i, so move on to the next character of s1
}

The first while (s1[i]) iterates through s1. Yes, you are right.

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++); 

The for loop above checks whether the substring s2 is present in s1 starting at s1[i]. If it matches, s2 is iterated completely and s2[j] is the null character. If not, s2[j] will not be the null character when the loop ends. Example: if s1 = ITERATE and s2 = RAT, the loop runs to the end of s2 only when i = 3.
So if the condition if (!s2[j]) holds, we have found the substring, and i is its starting position in s1.

         for (k = i; s1[k + j]; k++)   //WHAT DOES THIS ENTIRE BLOCK DO?
            s1[k] = s1[k + j];
            s1[k] = 0;

The above block removes the substring. For the ITERATE and RAT example, it does so by copying E and the null character to the positions where R and A were, which leaves ITEE. The for loop copies the E, and the statement after it writes the null character. If s2[j] is not the null character after the matching loop, i is incremented to check for the substring from the next position of s1.
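The ITERATE and RAT walk-through can be checked with a small sketch. It is the same algorithm as in the question with the braces written out. The name remove_substring is my own choice (an assumption, not from the question), because <stdio.h> already declares a function called remove. Like the original, it assumes s2 is not empty.

```c
#include <assert.h>

/* Same algorithm as in the question, with explicit braces.
   remove_substring is a hypothetical name chosen to avoid
   clashing with remove() from <stdio.h>. */
void remove_substring(char *s1, const char *s2)
{
    int i = 0, j, k;

    assert(s1 && s2 && s2[0]);   /* assumption: valid, non-empty pattern */
    while (s1[i]) {
        /* Count how many leading characters of s2 match s1 from index i. */
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++) {
        }

        if (!s2[j]) {
            /* All of s2 matched: shift the tail of s1 left by j places. */
            for (k = i; s1[k + j]; k++) {
                s1[k] = s1[k + j];
            }
            s1[k] = 0;           /* terminate the shortened string */
        } else {
            i++;                 /* no match here, try the next position */
        }
    }
}
```

Calling remove_substring on a writable buffer holding ITERATE with RAT as the second argument should leave ITEE in the buffer.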

Here is the function with its behavior summarized in the comments:

void remove(char *s1, char *s2)
{
   int i = 0, j, k;
   while (s1[i])       // Iterates through s1 (until it finds a zero)
   {
       for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);   // Advances j while s2 has not ended and s2[j] equals s1[i + j] (on a full match, j ends at the terminating zero of s2)
          if (!s2[j])           // If j points to the end of s2, we've found a match
             {
                 for (k = i; s1[k + j]; k++)   // Remove the matched substring
                    s1[k] = s1[k + j];
                    s1[k] = 0;
              }
          else
              i++;    // No match here, so we continue with the next character of s1
    }
}

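Note: I have also noticed that this code may be easy to exploit. The function stays within s1 only if both strings are null-terminated, and the scanf("%s") calls in main do not limit the input length, so a word longer than 100 characters overflows the buffer.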
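In the spirit of the note above, here is a minimal sketch of a more defensive variant. The name remove_all and the returned count are my own additions, not part of the original code. It also guards against an empty s2: as far as I can tell, the original never advances i in that case, because an empty s2 "matches" at every position, so the while loop would not terminate for a non-empty s1.

```c
#include <stddef.h>

/* Hypothetical defensive variant: removes every occurrence of s2
   from s1 and returns how many occurrences were removed. */
size_t remove_all(char *s1, const char *s2)
{
    size_t i = 0, j, k, removed = 0;

    /* Reject null pointers and the empty pattern up front. */
    if (s1 == NULL || s2 == NULL || s2[0] == '\0') {
        return 0;
    }

    while (s1[i]) {
        for (j = 0; s2[j] && s2[j] == s1[i + j]; j++) {
        }
        if (!s2[j]) {
            for (k = i; s1[k + j]; k++) {
                s1[k] = s1[k + j];
            }
            s1[k] = '\0';
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```

On the input side, reading with scanf("%100s", s1) instead of scanf("%s", s1) would keep the input within the 101-byte buffers used in main.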

Let's break it down. You have

while (s1[i])
{
    // Code
}

This iterates through s1. Once you get to the end of the string, you reach \0, the null terminator. When evaluated in a condition, it evaluates to 0 (false), so the loop ends. It may have been better to use a for loop here.

You then have

for (j = 0; s2[j] && s2[j] == s1[i + j]; j++);

This does nothing but increment j. Note that the loop has no braces and is terminated by a semicolon, so its body is empty and the if/else after it is not executed inside the loop, despite the indentation. The loop keeps running while s2[j] is not the null terminator and s2[j] == s1[i + j], that is, while the character at position j of s2 equals the character at offset i + j in s1. When it ends, j is the number of characters of s2 that matched s1 starting at position i. This part could likely be improved to remove unnecessary iterations.
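To see what that one-line loop computes, it can be pulled out into a function of its own. This is only an illustrative sketch: match_length is a hypothetical helper name, and it assumes i is not past the end of s1.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many leading characters of s2
   match s1 starting at s1[i]. Assumes i <= length of s1. */
size_t match_length(const char *s1, size_t i, const char *s2)
{
    size_t j = 0;

    /* Stops at the end of s2 or at the first mismatch. The terminator
       of s1 never equals a non-zero character of s2, so the scan
       cannot run past the end of s1. */
    while (s2[j] != '\0' && s2[j] == s1[i + j]) {
        j++;
    }
    return j;
}
```

A full match at position i is then the case where s2[match_length(s1, i, s2)] is the null terminator, which is what if (!s2[j]) tests in the original.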

Then there's

if (!s2[j])
{
}
else
{
}

This checks whether the loop above reached the end of s2, which means all of s2 matched at position i. If so, it removes the substring; otherwise it increments i. It could be improved by returning in the else branch once s2 can no longer fit in the remainder of s1.
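The improvement suggested above could look roughly like this. It is a sketch under my own assumptions: length_of and remove_substring_bounded are hypothetical names, and the lengths are counted by hand to stay within the question's "no standard library" constraint.

```c
#include <stddef.h>

/* Hypothetical helper: length of a null-terminated string. */
static size_t length_of(const char *s)
{
    size_t n = 0;
    while (s[n]) {
        n++;
    }
    return n;
}

/* Hypothetical variant that stops as soon as s2 cannot fit
   in what is left of s1. */
void remove_substring_bounded(char *s1, const char *s2)
{
    size_t len1 = length_of(s1);
    size_t len2 = length_of(s2);
    size_t i = 0, j, k;

    if (len2 == 0) {
        return;                      /* nothing to remove */
    }
    while (i + len2 <= len1) {       /* s2 still fits at position i */
        for (j = 0; j < len2 && s2[j] == s1[i + j]; j++) {
        }
        if (j == len2) {
            /* Shift the tail left; the last copy moves the terminator. */
            for (k = i; k + len2 <= len1; k++) {
                s1[k] = s1[k + len2];
            }
            len1 -= len2;
        } else {
            i++;
        }
    }
}
```

The trade-off is one extra pass over each string to measure it, in exchange for skipping start positions near the end of s1 where no match is possible.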

for (k = i; s1[k + j]; k++)
    s1[k] = s1[k + j];
    s1[k] = 0;

This is another somewhat strange-looking loop: because there are no braces, s1[k] = 0 runs after the loop, not inside it, despite the indentation. The loop compacts the string by shifting each character at k + j down to k, which overwrites the occurrence of s2. After the loop, s1[k] = 0 writes the null terminator so that the shortened string ends properly.
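For comparison, if the standard library were allowed, the same shift could be written with memmove. This swaps in a library call the original deliberately avoids, and delete_range is a hypothetical helper name of my own.

```c
#include <string.h>

/* Hypothetical helper: deletes n characters of s starting at index pos.
   Assumes pos + n does not exceed the length of s. */
void delete_range(char *s, size_t pos, size_t n)
{
    /* Characters after the deleted range, plus the null terminator. */
    size_t tail = strlen(s + pos + n) + 1;

    /* Source and destination overlap, so memmove is used, not memcpy. */
    memmove(s + pos, s + pos + n, tail);
}
```

With a buffer holding ITERATE, delete_range(buf, 3, 3) does the same job as the loop for i = 3 and j = 3, and moving the terminator along with the tail replaces the separate s1[k] = 0 step.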

If you want a deeper understanding, it may be worth trying to write your own code to do the same thing and then comparing the two afterwards. I have found that this generally helps more than reading a bunch of texts.


 