Why does malloc() allocate 2 more bytes than it should?

Question

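I'm writing a C compiler. Flex recognizes my string token and sends it to a function that stores it in a struct{} holding information about it, but first the string needs its escape characters (i.e. the backslash, "\") removed. Here is my code: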

char* removeEscapeChars(char* svalue)
{
    char* processedString; //will be the string with escape characters removed
    int svalLen = strlen(svalue);
    printf("svalLen (size of string passed in): %d\n", svalLen);
    printf("svalue (string passed in): %s\n", svalue);
    int foundEscapedChars = 0;
    for (int i = 0; i < svalLen;) 
    {
        if (svalue[i] == '\\') {
            //Found escaped character
            if (svalue[i+1] == 'n') {
                //Found newline character
                svalue[i] = int('\n');
            }
            else if (svalue[i+1] == '0') {
                //Found null character
                svalue[i] = int('\0');
            }
            else {
                //Any other character
                svalue[i] = svalue[i+1];
            }
            i++;
            foundEscapedChars++;
            for (int j = i; j < svalLen + 1; j++) {
                svalue[j] = svalue[j+1];
            }
        }
        else {
            i++;
        }
    }
    int newSize = svalLen - foundEscapedChars;
    processedString = (char*) malloc(newSize * sizeof(char));
    memcpy(processedString, svalue, newSize * sizeof(char));
    printf("newSize: %d\n", newSize);
    printf("processedString: %s\n", processedString);
    printf("processedString Size: %d\n", strlen(processedString));
    
    free(svalue);
    return processedString;
}

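It works 99% of the time, but when it is tested on this particular string (or similar strings of 40 characters), "-//W3C//DTD XHTML 1.0 Transitional//EN", malloc() appears to allocate 2 bytes too many for the string. The output is below. Note that I call malloc() with int newSize, whose value is reported as 40, and then strlen() returns 42. sizeof(char) is also == 1. The main problem is that garbage characters are being inserted at the end of the string. What gives?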

"-//W3C//DTD XHTML 1.0 Transitional//EN"
svalLen (size of string passed in): 40
svalue (string passed in) "-//W3C//DTD XHTML 1.0 Transitional//EN"
newSize: 40
processedString: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z
processedString Size: 42
Line 47 Token: STRINGCONST Value: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z Len: 40 Input: "-//W3C//DTD XHTML 1.0 Transitional//EN"

Answer 1

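The code has at least this problem: it tries to print a "string" that is not a string, because it lacks a terminating null character and the space to store one.

This results in undefined behavior. That UB may show up as extra characters being printed, because printf("%s") and strlen() keep reading past the 40 allocated bytes until they happen to hit a zero byte.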

// processedString = (char*) malloc(newSize * sizeof(char));
// memcpy(processedString, svalue, newSize * sizeof(char));
processedString = malloc(newSize + 1);   // +1 for the terminating null character
memcpy(processedString, svalue, newSize);
processedString[newSize] = 0;            // terminate the string explicitly

There may be other problems.
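The "+1 and terminate" pattern above can be wrapped in a small helper. This is only a sketch: the name dupBytesAsString is hypothetical and does not appear in the original code, and it assumes the caller knows how many bytes of the source are meaningful.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the first n bytes of src into a fresh,
   NUL-terminated heap string. Returns NULL if allocation fails. */
char *dupBytesAsString(const char *src, size_t n)
{
    assert(src != NULL);

    char *copy = malloc(n + 1);   /* n payload bytes + 1 for the terminator */
    if (copy == NULL) {
        return NULL;
    }

    memcpy(copy, src, n);         /* copies exactly n bytes, no terminator */
    copy[n] = '\0';               /* now strlen(copy) stops at n */
    return copy;
}
```

With a helper like this, strlen() on the result is bounded by n, because the byte at index n is always zero. The caller still owns the returned buffer and must free() it.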

Answer 2

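Here is a reworked version of your code that takes a different, more conventional approach to string handling. Start with a function that counts the escape characters, since that will be useful in the next step: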

int escapeCount(char* str) {
    int c = 0;

    // Can just increment and work through the string using the given pointer
    while (*str) {
        // Backslash something here
        if (*str == '\\') {
            ++str;
            ++c;
        }

        if (*str) {
            // Handle unmatched \ at end of string
            ++str;
        }
    }

    return c;
}

Now, using that information, you can allocate the correct buffer size:

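#include <stdlib.h>
#include <string.h>

char* removeEscapeChars(char* str)
{
    // IMPORTANT: Allocate strlen() + 1 for the NUL byte not counted
    char* result = malloc(strlen(str) - escapeCount(str) + 1);
    if (!result) {
        return NULL; // Allocation failed
    }

    char* r = result;

    while (*str) {
        if (*str == '\\') {
            ++str;

            if (!*str) {
                // Unmatched \ at end of string: nothing left to unescape
                break;
            }

            switch (*str) {
                case 'n':
                    *r = '\n';
                    break;
                case 'r':
                    *r = '\r';
                    break;
                case 't':
                    *r = '\t';
                    break;
                default:
                    *r = *str;
                    break;
            }
        }
        else {
            *r = *str;
        }

        ++str;
        ++r;
    }

    // The loop never copies the NUL byte, so terminate the result here
    *r = '\0';

    return result;
}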
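For comparison with the allocating version above, here is a sketch of an in-place variant. It relies on the observation that the unescaped text is never longer than the input, so no second buffer or size arithmetic is needed. The name unescapeInPlace and the choice to return the new length are assumptions made for illustration, not part of either answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-place variant: rewrites s so that each escape sequence
   is replaced by the character it denotes. Returns the new length. */
size_t unescapeInPlace(char *s)
{
    assert(s != NULL);

    char *src = s;   /* next character to read */
    char *dst = s;   /* next position to write; never ahead of src */

    while (*src != '\0') {
        if (*src == '\\') {
            ++src;
            if (*src == '\0') {
                break;              /* unmatched backslash at end: drop it */
            }
            switch (*src) {
                case 'n': *dst = '\n'; break;
                case 'r': *dst = '\r'; break;
                case 't': *dst = '\t'; break;
                default:  *dst = *src; break;
            }
        } else {
            *dst = *src;
        }
        ++src;
        ++dst;
    }

    *dst = '\0';                    /* terminate the shortened string */
    return (size_t)(dst - s);
}
```

This variant only works on a writable buffer, so it must not be called on a string literal. Whether modifying the token text in place is acceptable depends on how the lexer owns that buffer.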


Answer 1
3 2021-01-28 00:23:07

Answer 2
1 Accepted 2021-01-28 00:23:22
