
How can I remove certain characters from a string in C?

I have strings that contain HTML tags (e.g. "<p>sample_text</p>"). I would like to remove these tags from the strings, as in the pseudo-code below:

string(string input_string)
{
    int i = 0
    bool is_deleting = False
    
    while(i < length(input_string))
    {
         if(input_string[i] == "<")
         {
             is_deleting = True
         }
         
         if(is_deleting == True)
         {
             if(input_string[i] == ">")
             {
                 is_deleting = False
             }
             input_string[i] = ""
         }
         i += 1
     }
return input_string
}

How could I make this work?

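You are thinking in the right direction, but the deletion logic is confused. You cannot delete a character from a C string by assigning "" to it; instead, copy each character you want to keep toward the front of the string and skip the ones inside a tag. In other words, copy characters only while is_deleting is false.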

Rather than asking whether you are is_deleting, consider tracking whether you are intag. When iterating over characters, being either in a tag (ignoring characters) or not in a tag (copying characters) is a bit more descriptive.

Regardless, there are three cases for the current character: (1) it is a '<', which opens a tag, so you set the intag flag true; (2) the intag flag is true, so the character is skipped, and if it is '>' the tag closes and intag is set false; or (3) intag is false and you copy the character. That logic maps directly onto an if / else if / else chain.
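
The three cases can be sketched as a loop that copies into a separate buffer and leaves the input untouched. This is only an illustration: strip_tags_copy, dst and dstsize are hypothetical names that are not part of the answer's code, and truncating output that does not fit is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical helper (not from the answer): copy src to dst, skipping
 * everything from '<' through the next '>'. dst must hold at least
 * dstsize bytes; output that does not fit is truncated. */
char *strip_tags_copy (char *dst, size_t dstsize, const char *src)
{
    int intag = 0;                      /* flag in-tag 0/1 (false/true) */
    size_t write = 0;                   /* write index into dst */

    if (dstsize == 0)                   /* no room even for the nul */
        return dst;

    for (size_t i = 0; src[i]; i++) {
        if (src[i] == '<')              /* case 1: tag opening */
            intag = 1;
        else if (intag) {               /* case 2: inside a tag */
            if (src[i] == '>')          /* tag close */
                intag = 0;
        }
        else if (write + 1 < dstsize)   /* case 3: outside a tag, room left */
            dst[write++] = src[i];
    }
    dst[write] = 0;                     /* nul-terminate dst */

    return dst;
}
```

Called with a 32-byte buffer and "<p>sample_text</p>", it should leave sample_text in the buffer.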

When looping over the characters in any string, there is no need to call strlen(). The nul-terminating character marks the end of the string for you.
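
As a small illustration of looping up to the nul-terminator, the two functions below count '<' characters without calling strlen(), once with an index and once with a pointer. The names count_open_idx and count_open_ptr are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: index form, stops when s[i] is the nul (0) */
size_t count_open_idx (const char *s)
{
    size_t n = 0;

    for (size_t i = 0; s[i]; i++)       /* s[i] is false at the terminator */
        if (s[i] == '<')
            n++;

    return n;
}

/* Hypothetical helper: pointer form of the same loop */
size_t count_open_ptr (const char *s)
{
    size_t n = 0;

    for (; *s; s++)                     /* *s is false at the terminator */
        if (*s == '<')
            n++;

    return n;
}
```

Both return 2 for "<p>sample_text</p>". By contrast, a loop condition such as i < strlen(s) may rescan the whole string on every iteration unless the compiler hoists the call.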

If you put that together, you could do:

#include <stdio.h>

char *rmtags (char *s)
{
    int intag = 0,                      /* flag in-tag 0/1 (false/true) */
        write = 0;                      /* write index */
    
    for (int i = 0; s[i]; i++) {        /* loop over each char in s */
        if (s[i] == '<')                /* tag opening? */
            intag = 1;                  /* set intag flag true */
        else if (intag) {               /* if inside a tag */
            if (s[i] == '>')            /* tag close */
                intag = 0;              /* set intag false */
        }
        else                            /* not opening & not in tag */
            s[write++] = s[i];          /* copy to write index, increment */
    }
    s[write] = 0;                       /* nul-terminate s */
    
    return s;                           /* convenience return of s */
}

int main (void) {
    
    char s[] = "<p>sample_text</p>";
    
    printf ("text: '%s'\n", rmtags (s));
}

(Note: you don't want to reinvent the wheel to parse HTML. See "Parse html using C", and gumbo-parser in particular. In this limited example the task is trivial, but nested tags spanning multiple lines complicate it quickly. Use a library that validates HTML.)
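
To illustrate how quickly this gets complicated, here is a hypothetical variant, rmtags_state, that keeps the in-tag flag in a caller-supplied int so that a tag split across two lines is still skipped. It is a sketch under that one assumption, not a parser, and neither the name nor the extra parameter comes from the answer above.

```c
/* Hypothetical variant of rmtags (not from the answer): the in-tag flag
 * lives in *intag so it survives from one line to the next. The caller
 * sets *intag to 0 before the first line. */
char *rmtags_state (char *s, int *intag)
{
    int write = 0;                      /* write index */

    for (int i = 0; s[i]; i++) {        /* loop over each char in s */
        if (s[i] == '<')                /* tag opening */
            *intag = 1;
        else if (*intag) {              /* in a tag, maybe from a prior line */
            if (s[i] == '>')            /* tag close */
                *intag = 0;
        }
        else                            /* outside a tag: keep the char */
            s[write++] = s[i];
    }
    s[write] = 0;                       /* nul-terminate s */

    return s;
}
```

With the flag initialized to 0, feeding it the two lines <a href="x" and >link</a> in turn leaves an empty string and then link. A '>' inside a quoted attribute still defeats it: <a title="x>y">link</a> becomes y">link, which is the kind of case a real HTML parser handles.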

Example Use/Output

$ ./bin/html_rmtags
text: 'sample_text'
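
A pointer-based alternative that takes the opening and closing characters as parameters:

char *removetags(char *str, char opentag, char closetag)
{
    char *write = str, *read = str;     /* both start at the beginning */
    int remove = 0;                     /* 1 while inside a tag */

    while(*read)
    {
        if(remove)                      /* inside a tag: skip the char */
        {
            if(*read == closetag)       /* tag ends after this char */
            {
                remove = 0;
            }
            read++;
        }
        else if(*read == opentag)       /* tag starts: skip and set flag */
        {
            remove = 1;
            read++;
        }
        else                            /* outside a tag: keep the char */
        {
            *write++ = *read++;
        }
    }
    *write = 0;                         /* nul-terminate the result */
    return str;
}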
