How can I remove certain characters from a string in C?
I have strings that contain HTML tags (e.g. "<p>sample_text</p>"). I would like to remove these tags from the strings, as shown in the pseudo-code below:
string(string input_string)
{
    int i = 0
    bool is_deleting = False
    while(i < length(input_string))
    {
        if(input_string[i] == "<")
        {
            is_deleting = True
        }
        if(is_deleting == True)
        {
            if(input_string[i] == ">")
            {
                is_deleting = False
            }
            input_string[i] = ""
        }
        i += 1
    }
    return input_string
}
How could I make this work?
You are thinking in the right direction; you have just confused the logic for deleting. Since is_deleting is true while you are inside a tag, you only want to copy characters when you are not deleting.
Rather than asking whether you are is_deleting, why not ask whether you are intag? When iterating over characters, being either in a tag (ignoring characters) or not in a tag (copying characters) seems a bit more descriptive.
Regardless, there are three cases for the current character. Either (1) it is '<', indicating a tag opening, where you set your intag flag true; or (2) the intag flag is true, in which case the character is skipped and, if it is '>', marking the close of the tag, intag is set false; or (3) intag is false and you copy the character. These three cases map directly onto an if / else if / else chain.
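As a minimal sketch of just that branching, the hypothetical helper count_text_chars below walks a string through the same three cases but only counts the characters that would be kept, leaving the string untouched. The function name and the counting behavior are assumptions made for illustration; they are not part of the answer's final function.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that lie outside <...> tags. */
size_t count_text_chars (const char *s)
{
    int intag = 0;                  /* 1 while between '<' and '>' */
    size_t kept = 0;                /* characters that would be copied */

    for (size_t i = 0; s[i]; i++) {
        if (s[i] == '<')            /* case 1: tag opening */
            intag = 1;
        else if (intag) {           /* case 2: inside a tag, skip */
            if (s[i] == '>')        /* tag close ends the skipping */
                intag = 0;
        }
        else                        /* case 3: ordinary text */
            kept++;
    }
    return kept;
}
```

Called on "<p>sample_text</p>", this counts only the characters of sample_text.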
When looping over the characters in any string, there is no need to call strlen(). The nul-terminating character marks the end of the string for you.
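For illustration, a minimal sketch of that idiom: the hypothetical function count_char (a name assumed here, not taken from the answer) scans a string until it reaches the nul terminator, with no call to strlen().

```c
#include <stddef.h>

/* Hypothetical helper: count occurrences of c in s without calling strlen(). */
size_t count_char (const char *s, char c)
{
    size_t n = 0;

    for (; *s; s++)     /* *s is 0 (false) at the terminating nul */
        if (*s == c)
            n++;

    return n;
}
```

The loop condition is simply the current character itself, so the scan stops exactly at the end of the string.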
If you put that together, you could do:
#include <stdio.h>

char *rmtags (char *s)
{
    int intag = 0,                      /* flag in-tag 0/1 (false/true) */
        write = 0;                      /* write index */

    for (int i = 0; s[i]; i++) {        /* loop over each char in s */
        if (s[i] == '<')                /* tag opening? */
            intag = 1;                  /* set intag flag true */
        else if (intag) {               /* if inside a tag */
            if (s[i] == '>')            /* tag close */
                intag = 0;              /* set intag false */
        }
        else                            /* not opening & not in tag */
            s[write++] = s[i];          /* copy to write index, increment */
    }
    s[write] = 0;                       /* nul-terminate s */

    return s;                           /* convenience return of s */
}

int main (void) {

    char s[] = "<p>sample_text</p>";

    printf ("text: '%s'\n", rmtags (s));
}
(note: You don't want to reinvent the wheel to parse HTML. See "Parse html using C", and particularly gumbo-parser. In this limited, simple example it is trivial, but nested tags spanning multiple lines quickly complicate the job. Use a library that validates HTML.)
Example Use/Output
$ ./bin/html_rmtags
text: 'sample_text'
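char *removetags(char *str, char opentag, char closetag)
{
    char *write = str, *read = str;     /* both start at the beginning */
    int remove = 0;                     /* 1 while inside a tag */

    while(*read)
    {
        if(*read == closetag && remove)
        {
            read++;
            remove = 0;
            continue;   /* re-test *read: it may now be the terminating nul */
        }
        if(*read == opentag || remove)
        {
            read++;                     /* skip the tag character */
            remove = 1;
        }
        else
        {
            *write++ = *read++;         /* keep ordinary text */
        }
    }
    *write = 0;                         /* nul-terminate the result */
    return str;
}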
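Both functions above modify the string in place, so they must not be called on a string literal. As a hedged sketch building on that observation, a variant could write the result into a caller-supplied buffer instead. The name striptags_copy and its size-limited, truncating behavior are assumptions for illustration, not part of either answer.

```c
#include <stddef.h>

/* Hypothetical variant: copy src to dst, skipping everything between
 * opentag and closetag. Writes at most dstsize bytes, including the
 * terminating nul, and truncates if dst is too small. Returns dst. */
char *striptags_copy (char *dst, size_t dstsize, const char *src,
                      char opentag, char closetag)
{
    size_t w = 0;                   /* write index into dst */
    int intag = 0;                  /* 1 while inside a tag */

    if (dstsize == 0)               /* no room even for the nul */
        return dst;

    for (; *src; src++) {
        if (*src == opentag)
            intag = 1;
        else if (intag) {
            if (*src == closetag)
                intag = 0;
        }
        else if (w + 1 < dstsize)   /* keep one byte for the nul */
            dst[w++] = *src;
    }
    dst[w] = '\0';

    return dst;
}
```

Because src is const, a call such as striptags_copy (buf, sizeof buf, "<p>sample_text</p>", '<', '>') is safe even though the input is a literal.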