Using strstr to find all instances of a substring results in weird string formatting

Question

I'm making a web scraper and I'm at the point where I need to parse the incoming data. Everything was going fine until I had to find all instances of a substring in a string. I was able to get something working, but it doesn't give me the full string I want (which is a full <p> tag).

done = 0;

while (done == 0) {
    if ((findSpan = strstr(serverResp, "<p")) != NULL) {
        printf("%s\n", findSpan);
        if ((findSpanEnd = strstr(findSpan, "</p>")) != NULL) {
            strcpy(serverResp, findSpanEnd);
            strcpy(findSpanEnd+4, "");
            printf("after end tag formatting %s\n", findSpan);
        }
    } else {
        done = 1;
    }
}

The "after end tag formatting" line should give me a result along the lines of <p>insert text here</p>, but instead I get something like this:

        <p>This should be printed</p>
        <h3>ignore</h3>
        <p>and so should this</p>
    </body>
</html>

after end tag formatting <p>This should be printed</p>
        <h3>ignore</h3>
        <p>and so should this</p>
    </body>
</html>

after end tag formatting dy>
</html>

The site's code looks like this:

<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <h1>ignore this</h1>
        <p>This should be printed</p>
        <h3>ignore</h3>
        <p>and so should this</p>
    </body>
</html>

Answer 1

        if ((findSpanEnd = strstr(findSpan, "</p>")) != NULL) {
            strcpy(serverResp, findSpanEnd);

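This makes no sense. strstr finds "</p>" as requested; however, you can't pass the result to strcpy like that. strstr doesn't allocate a new string at all; it only returns the location of the match within the old one. findSpanEnd therefore points into serverResp itself, so this strcpy copies between overlapping regions, which is undefined behavior, and it overwrites the very buffer that findSpan still points into.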
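To make the point about strstr concrete, here is a minimal sketch. The helper names match_offset and drop_prefix are hypothetical and exist only for this illustration; they are not part of the question's code. The first shows that the returned pointer lies inside the original buffer, and the second shows one way to shift the tail of a buffer to its front when source and destination overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first match of needle inside haystack,
 * or -1 if there is none. The subtraction is only meaningful because strstr
 * returns a pointer into haystack itself, not into a fresh copy. */
ptrdiff_t match_offset(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? hit - haystack : -1;
}

/* Hypothetical helper: discard everything before the first match of needle,
 * shifting the remainder to the front of buf in place. Source and
 * destination overlap, so memmove is used; strcpy has undefined behavior
 * for overlapping objects. Returns 1 on a match, 0 otherwise. */
int drop_prefix(char *buf, const char *needle)
{
    assert(buf != NULL && needle != NULL);
    char *hit = strstr(buf, needle);
    if (hit == NULL)
        return 0;
    memmove(buf, hit, strlen(hit) + 1);   /* +1 copies the terminating '\0' */
    return 1;
}
```

Note that after drop_prefix runs, any pointer obtained earlier from strstr on the same buffer (such as findSpan in the question) no longer refers to the text it used to, which is part of why the question's output looks scrambled.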

A routine to print out the contents of all <p> tags would look like this (note that this assumes no nested <p> tags):

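    /* Needs <stdio.h> and <string.h>. ptr always points into serverResp. */
    for (char *ptr = serverResp; (ptr = strstr(ptr, "<p")) != NULL;)
    {
        char *finger = strchr(ptr, '>');    /* end of the opening tag */
        if (!finger) break;
        ++finger;                           /* first character of the content */
        ptr = strstr(finger, "</p>");
        if (!ptr) {
            /* No closing tag: print the rest and stop, because ptr is NULL
               and must not be passed back to strstr. */
            fwrite(finger, 1, strlen(finger), stdout);
            fputs("\r\n", stdout);
            break;
        }
        fwrite(finger, 1, ptr - finger, stdout);
        fputs("\r\n", stdout);
    }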

The technique: the call to strstr in the for loop locates the next <p> tag, strchr finds the end of it, then another strstr finds the closing </p>. Because the returned pointers point into the original string, the content is not null-terminated where the closing tag begins, so we use fwrite with an explicit length instead of printf to produce output.
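If the extracted text needs to be kept rather than printed, the same walk can copy each piece into a caller-supplied buffer. The following is a minimal sketch under the same no-nesting assumption; the function name next_paragraph, its cursor parameter, and its truncation policy are hypothetical choices made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the content of the next <p>...</p> element found
 * at or after *cursor into out (at most outsize - 1 bytes, always
 * null-terminated) and advance *cursor past the closing tag.
 * Returns 1 if an element was found, 0 otherwise.
 * Assumes no nested <p> tags; "<p" also matches tags such as <pre>. */
int next_paragraph(const char **cursor, char *out, size_t outsize)
{
    assert(cursor != NULL && *cursor != NULL && out != NULL && outsize > 0);

    const char *open = strstr(*cursor, "<p");
    if (open == NULL)
        return 0;

    const char *start = strchr(open, '>');      /* end of the opening tag */
    if (start == NULL)
        return 0;
    ++start;                                     /* first content character */

    const char *close = strstr(start, "</p>");
    size_t len = close ? (size_t)(close - start) : strlen(start);
    if (len >= outsize)
        len = outsize - 1;                       /* truncate to fit */

    memcpy(out, start, len);
    out[len] = '\0';

    /* Resume after "</p>", or at the end of the input if it was missing. */
    *cursor = close ? close + 4 : start + strlen(start);
    return 1;
}
```

A caller would set a const char * cursor to the start of serverResp and loop with while (next_paragraph(&cursor, text, sizeof text)) puts(text);. Since the input is only read and never modified, no earlier pointer into it is invalidated between iterations.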

Answer 1: 1 · accepted · 2022-02-26 01:26:24
