I'm trying to write a string parser, but something is going wrong

Question

I'm trying to write a text parser that splits a string into words on the space character. However, something is going wrong.

#include <stdio.h>
#include <string.h>

int main() {
    //the string should end with a space to count the all words
    char name[30] = "hello world from jordan ";
    int start = 0;
    int end = strlen(name);
    int end_word = start;
    char full[20][20];

    memset(full, 0, 400);

    int number_of_words = 0;

    for (int w = 0; w < end; w++) {
        if (name[w] == ' ') {
            number_of_words++;
        }
    }

    int counter = 0;

    while (counter < number_of_words) {
        for (int i = start; i < end; i++) {
            if (name[i] == ' ') {
                start = i;
                break;
            }
        }

        for (int j = end_word; j < start; j++) {
            full[counter][j] = name[j];
        }

        end_word = start;
        start++;
        counter++;
    }

    for (int x = 0; x < 20; x++) {
        for (int y = 0; y < 20; y++) {
            printf("%c", full[x][y]);
        }

        printf("%d", x);
    }

    return 0;
}

This is the odd output I get when I run the code:

 hello0 world1 from2 jor3dan45678910111213141516171819

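The first three words are stored correctly, but the fourth is not, and I don't know why.

I'd like an explanation of the problem and, if possible, a more efficient way to write this code without using pointers.

Note: I'm a beginner, which is why I'm asking for a solution without pointers.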

Answer 1

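First of all, trying to avoid pointers in C is going to be (very) difficult. By their nature, arrays are converted to pointers as soon as you want to do anything useful with them. Array subscripting is syntactic sugar for pointer arithmetic (foo[2] is the same as *(foo + 2)). Passing an array to a function causes it to decay to a pointer to its first element.

Whether you realize it or not, you are already using pointers several times in your code.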
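A minimal sketch of both claims, using two hypothetical helper functions (same_element and size_seen_by_callee are names made up for this illustration, not part of the code in the question):

```c
#include <stddef.h>

/* Checks that subscript syntax and explicit pointer arithmetic refer to
 * the same element: same address, same value. Returns 1 if they agree. */
int same_element(const char *foo, size_t i) {
    return &foo[i] == foo + i && foo[i] == *(foo + i);
}

/* An array passed as an argument arrives as a pointer to its first
 * element, so sizeof here measures the pointer, not the array. */
size_t size_seen_by_callee(const char *foo) {
    return sizeof foo;
}
```

Called with the char name[30] array from the question, same_element(name, 2) returns 1, and size_seen_by_callee(name) returns sizeof(char *) rather than 30, even though sizeof name is 30 in the caller.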

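As for the code...

A quick note: size_t, not int, is the appropriate type for memory sizes and indices. I'll use it in the corrected versions of the code, and you should try to use it in general going forward.

The output is a little confusing because everything is printed on one line. Let's clean that up and add some debugging information, such as the length of each string you've stored.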

for (size_t x = 0; x < 20; x++) {
    printf("%zu [length: %zu]: ", x, strlen(full[x]));

    for (size_t y = 0; y < 20; y++)
        printf("%c", full[x][y]);

    putchar('\n');
}

Now we get output that spans several lines (with some repetition collapsed for brevity):

0 [length: 5]: hello
1 [length: 0]:  world
2 [length: 0]:  from
3 [length: 0]:  jor
4 [length: 3]: dan
5 [length: 0]: 
...
19 [length: 0]:

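A few things stand out here.

We have an extra fifth "string" when we only expected four.
Our first and fifth "strings" have lengths that match what is printed.
Our second through fourth "strings" have an apparent length of 0 and seem to contain a leading space.

A length of zero means some of our arrays begin with the null terminating byte ('\0'); we only see output because we manually walk the entirety of each array.

Note that most terminals will "do nothing" when asked to print a null character, which makes it look as if we skip directly to our "strings". We can better visualize what is happening by always printing something: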

printf("%c", full[x][y] ? full[x][y] : '*');

Here we print an asterisk whenever we encounter a null character, which gives us the output:

0 [length: 5]: hello***************
1 [length: 0]: ***** world*********
2 [length: 0]: *********** from****
3 [length: 0]: **************** jor
4 [length: 3]: dan*****************
5 [length: 0]: ********************
...
19 [length: 0]: ********************

This shows very clearly where our characters sit in memory.
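The same visualization can be packaged as a small helper, which also makes it easy to confirm why strlen reports 0 for a row that still contains text. This is only a sketch: render_row and ROW_SIZE are names invented here, and the row width of 20 is taken from the question's full[20][20].

```c
#include <stddef.h>
#include <string.h>

#define ROW_SIZE 20

/* Copies one row into out, replacing every null byte with '*'.
 * out must have room for ROW_SIZE characters plus a terminator. */
void render_row(const char row[ROW_SIZE], char out[ROW_SIZE + 1]) {
    for (size_t y = 0; y < ROW_SIZE; y++)
        out[y] = row[y] ? row[y] : '*';
    out[ROW_SIZE] = '\0';
}
```

For a zeroed row with " world" stored at columns 5 through 10, strlen returns 0 because the very first byte is the terminator, while render_row produces ***** world*********.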

The main problem is in this loop:

for (int j = end_word; j < start; j++) {
    full[counter][j] = name[j];
}

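j is initialized to a position relative to the start of name, but it is also used as the offset into the memory of full[counter]. Except for our first substring, where end_word is 0, this pushes us further and further from the zeroth index of each subarray, eventually crossing the boundary between arrays.

This only appears to work because 2D arrays in C are laid out contiguously in memory; strictly speaking, indexing past the end of a row is undefined behavior.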
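To make the spill-over concrete, here is a rough sketch of the arithmetic, assuming the rows sit back to back as described. The helpers landing_row and landing_col are hypothetical names; they only compute indices and never touch the array, so they do not rely on the out-of-range access themselves.

```c
#include <stddef.h>

#define COLS 20 /* row width of full[20][20] in the question */

/* Row that a write to full[row][j] effectively lands in when j may
 * run past the end of the row. */
size_t landing_row(size_t row, size_t j) {
    return row + j / COLS;
}

/* Column within that row. */
size_t landing_col(size_t row, size_t j) {
    (void)row; /* the column depends only on j */
    return j % COLS;
}
```

For the fourth word, counter is 3 and j runs from 16 to 22: indices 16 to 19 stay in row 3 (the " jor" at the end of that row), while 20 to 22 land in row 4, columns 0 to 2 (the "dan" at the start of the next row), which matches the output above.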

To fix this, we must use a separate index for copying our characters, one that starts from zero for each subarray.

for (size_t j = end_word, k = 0; j < start; j++, k++) {
    full[counter][k] = name[j];
}

Now, when we print the arrays, we can restrict ourselves to the known number_of_words (for (size_t x = 0; x < number_of_words; x++)), giving us the output:

0 [length: 5]: hello***************
1 [length: 6]:  world**************
2 [length: 5]:  from***************
3 [length: 7]:  jordan*************

This looks roughly right, but the "words" include the preceding space. We can skip those spaces by setting end_word to the character after the delimiter:

start++;
end_word = start;
counter++;

Now our output looks correctly split:

0 [length: 5]: hello***************
1 [length: 5]: world***************
2 [length: 4]: from****************
3 [length: 6]: jordan**************

Note that these are (now properly formed) null-terminated strings, and can be printed with the %s specifier, like so:

for (size_t x = 0; x < number_of_words; x++)  
    printf("%zu [length: %zu]: %s\n", x, strlen(full[x]), full[x]);
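Putting the two fixes together, the splitting logic might look roughly like the sketch below. It is written as a function so it can be called from main; split_on_spaces, MAX_WORDS and WORD_SIZE are names introduced here for illustration, and the limits on word count and word length are additions that the original code does not have.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 20
#define WORD_SIZE 20

/* Splits name on single spaces into full, using array indexing only.
 * Like the original, it expects every word, including the last one,
 * to be followed by a space. Returns the number of words stored. */
size_t split_on_spaces(const char name[], char full[MAX_WORDS][WORD_SIZE]) {
    size_t end = strlen(name);
    size_t start = 0;    /* scans forward to the next space */
    size_t end_word = 0; /* index in name where the current word begins */
    size_t counter = 0;

    memset(full, 0, MAX_WORDS * WORD_SIZE);

    while (counter < MAX_WORDS) {
        /* find the next space at or after start */
        size_t i = start;
        while (i < end && name[i] != ' ')
            i++;
        if (i == end)
            break; /* no delimiter left, so no further word is stored */
        start = i;

        /* fix 1: k is a separate destination index that restarts at 0.
         * The k < WORD_SIZE - 1 limit is an added safety check. */
        for (size_t j = end_word, k = 0; j < start && k < WORD_SIZE - 1; j++, k++)
            full[counter][k] = name[j];

        /* fix 2: step past the space before recording the next word start */
        start++;
        end_word = start;
        counter++;
    }

    return counter;
}
```

With "hello world from jordan " this stores four words. Without the trailing space, the final word is not stored, which is the same limitation the original approach has.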

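Overall, this is somewhat fragile: it requires the trailing delimiting space to work, and it creates an empty string every time a delimiting space is repeated (or if the source string begins with a space).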
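The fragility comes from treating "number of spaces" as "number of words". A small sketch of the difference, with two hypothetical helpers (count_spaces mirrors the counting loop in the question, count_words counts runs of non-space characters instead):

```c
#include <stddef.h>

/* What the original program counts: one "word" per space. */
size_t count_spaces(const char s[]) {
    size_t n = 0;
    for (size_t i = 0; s[i] != '\0'; i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* Counts runs of non-space characters instead. */
size_t count_words(const char s[]) {
    size_t n = 0;
    int in_word = 0; /* 1 while we are inside a run of non-spaces */
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] == ' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}
```

The two agree only for well-behaved input such as "hello world from jordan ". Drop the trailing space and count_spaces gives 3 instead of 4; for " hello  world " it gives 4 while there are only 2 words.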

As an aside, this similar example shows a straightforward way to tokenize a string while skipping all delimiters, with some important comments included.

#include <stdio.h>
#include <string.h>

int main(void) {
    char name[30] = "hello world from jordan";
    char copies[20][30] = { 0 };
    size_t length_of_copies = 0;

    size_t hold_position = 0;
    size_t substring_span = 0;
    size_t i = 0;

    do {
        /* our substring delimiters */
        if (name[i] == ' ' || name[i] == '\0') {
            /* only copy non-zero spans of non-delimiters */
            if (substring_span) {
                /* `strncpy` will not insert a null terminating character
                 * into the destination if it is not found within the span
                 * of characters of the source string...
                 */
                strncpy(
                    copies[length_of_copies],
                    name + hold_position,
                    substring_span
                );

                /* ...so we must manually insert a null terminating character
                 * (or otherwise rely on our memory being initialized to all-zeroes)
                 * */
                copies[length_of_copies++][substring_span] = '\0';
                substring_span = 0;
            }

            /* let's assume our next position will be the start of a substring */
            hold_position = i + 1;
        } else
            substring_span++;

        /* checking our character at the end of the loop,
         * and incrementing after the fact,
         * lets us include the null terminating character as a delimiter,
         * as we will only fail to enter the loop after processing it
         */
    } while (name[i++] != '\0');

    for (size_t i = 0; i < length_of_copies; i++)
        printf("%zu: [%s]\n", i + 1, copies[i]);
}
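For comparison, the same skip-then-copy idea can be expressed with the standard library functions strspn and strcspn, which measure runs of delimiter and non-delimiter characters. This is an alternative sketch, not the approach used in the answer above; tokenize, MAX_TOKENS and TOKEN_SIZE are names assumed for this example, and truncating over-long tokens is an arbitrary choice made here.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 20
#define TOKEN_SIZE 30

/* Copies each run of non-delimiter characters from source into copies.
 * Returns the number of tokens stored (at most MAX_TOKENS). */
size_t tokenize(const char source[], const char delimiters[],
                char copies[MAX_TOKENS][TOKEN_SIZE]) {
    size_t count = 0;
    size_t i = 0;

    while (count < MAX_TOKENS) {
        /* skip any run of delimiters */
        i += strspn(source + i, delimiters);
        if (source[i] == '\0')
            break;

        /* length of the run of non-delimiters starting at i */
        size_t span = strcspn(source + i, delimiters);

        /* assumption: tokens longer than the buffer are truncated */
        size_t n = span < TOKEN_SIZE - 1 ? span : TOKEN_SIZE - 1;
        memcpy(copies[count], source + i, n);
        copies[count][n] = '\0';
        count++;

        i += span;
    }

    return count;
}
```

Because runs of delimiters are skipped as a unit, leading, trailing, and repeated spaces produce no empty tokens, and no trailing space is required.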

Solution 1
0 2022-01-09 23:47:45
