Why does malloc() allocate 2 more bytes than it should?

Question

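I'm writing a C compiler. Flex recognizes my string token and sends it to a function that stores it in a struct containing information about it, but first the string needs its escape character removed, i.e. the "\". Here is my code: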

char* removeEscapeChars(char* svalue)
{
    char* processedString; //will be the string with escape characters removed
    int svalLen = strlen(svalue);
    printf("svalLen (size of string passed in): %d\n", svalLen);
    printf("svalue (string passed in): %s\n", svalue);
    int foundEscapedChars = 0;
    for (int i = 0; i < svalLen;) 
    {
        if (svalue[i] == '\\') {
            //Found escaped character
            if (svalue[i+1] == 'n') {
                //Found newline character
                svalue[i] = int('\n');
            }
            else if (svalue[i+1] == '0') {
                //Found null character
                svalue[i] = int('\0');
            }
            else {
                //Any other character
                svalue[i] = svalue[i+1];
            }
            i++;
            foundEscapedChars++;
            for (int j = i; j < svalLen + 1; j++) {
                svalue[j] = svalue[j+1];
            }
        }
        else {
            i++;
        }
    }
    int newSize = svalLen - foundEscapedChars;
    processedString = (char*) malloc(newSize * sizeof(char));
    memcpy(processedString, svalue, newSize * sizeof(char));
    printf("newSize: %d\n", newSize);
    printf("processedString: %s\n", processedString);
    printf("processedString Size: %d\n", strlen(processedString));
    
    free(svalue);
    return processedString;
}

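It works 99% of the time, but when it is tested on this particular string (or a similar string with 40 characters), "-//W3C//DTD XHTML 1.0 Transitional//EN", malloc() seems to allocate 2 bytes too many for the string. The output is below. Note that I call malloc() with int newSize, which is reported as 40, and then strlen() returns 42. sizeof(char) is also == 1. The main problem is that garbage characters get inserted at the end of the string. What gives?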

"-//W3C//DTD XHTML 1.0 Transitional//EN"
svalLen (size of string passed in): 40
svalue (string passed in) "-//W3C//DTD XHTML 1.0 Transitional//EN"
newSize: 40
processedString: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z
processedString Size: 42
Line 47 Token: STRINGCONST Value: "-//W3C//DTD XHTML 1.0 Transitional//EN"Z Len: 40 Input: "-//W3C//DTD XHTML 1.0 Transitional//EN"

Answer 1

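The code has at least this problem: it tries to print a "string" that is not a string, because it lacks the terminating null character and the space to store it.

This leads to undefined behavior. That UB may show up as extra characters being printed.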
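
To make the "extra characters" symptom concrete without actually invoking UB, here is a minimal sketch that emulates it in a well-defined way. The name apparentLength and the two stand-in garbage bytes are made up for illustration; what a real heap block contains past the copied bytes is unpredictable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: emulate a block whose bytes just past the copied
   payload happen to be non-zero, then ask strlen() how long the
   "string" looks. Everything here stays inside one local array. */
size_t apparentLength(const char *payload, size_t n)
{
    char block[64];

    assert(payload != NULL);
    assert(n + 3 <= sizeof block);

    memset(block, 0, sizeof block);
    memcpy(block, payload, n);   /* the n bytes the program copied */
    block[n] = '\x01';           /* stand-in for leftover bytes */
    block[n + 1] = 'Z';
    /* block[n + 2] is still 0: the first zero byte strlen() finds */

    return strlen(block);
}
```

With the 40-character string from the question, this returns 42: strlen() keeps counting until it reaches a zero byte, and malloc() makes no promise about where that byte is.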

// processedString = (char*) malloc(newSize * sizeof(char));
// memcpy(processedString, svalue, newSize * sizeof(char));
processedString = malloc(newSize + 1);
memcpy(processedString, svalue, newSize);
processedString[newSize] = 0;
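
The same fix, pulled out into a small self-contained helper, might look like the sketch below. The name dupBytesAsString is hypothetical and not part of the original code; the point is only the "+ 1" in the allocation and the explicit terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the first n bytes of src into a freshly
   allocated, NUL-terminated buffer. Returns NULL if malloc() fails.
   The caller owns the result and must free() it. */
char *dupBytesAsString(const char *src, size_t n)
{
    assert(src != NULL);

    char *out = malloc(n + 1);   /* + 1 leaves room for the terminator */
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, src, n);
    out[n] = '\0';               /* now strlen() and "%s" are well defined */
    return out;
}
```

In removeEscapeChars() this would replace the malloc()/memcpy() pair with something like processedString = dupBytesAsString(svalue, newSize);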

There may be other problems.

Answer 2

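Here's a reworked version of your code that takes a different, more conventional approach to string handling. It starts with a function that counts the escape characters, since that will be useful in the next step: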

int escapeCount(char* str) {
    int c = 0;

    // Can just increment and work through the string using the given pointer
    while (*str) {
        // Backslash something here
        if (*str == '\\') {
            ++str;
            ++c;
        }

        if (*str) {
          // Handle unmatched \ at end of string
          ++str;
        }
    }

    return c;
}

Now, using that information, you can allocate the correct buffer size:

#include <stdlib.h>
#include <string.h>

char* removeEscapeChars(char* str)
{
    // IMPORTANT: Allocate strlen() + 1 for the NUL byte not counted
    char* result = malloc(strlen(str) - escapeCount(str) + 1);
    if (result == NULL) {
        return NULL;
    }
    char* r = result;

    while (*str) {
        if (*str == '\\') {
            ++str;

            if (*str == '\0') {
                // Unmatched \ at end of string: drop it
                break;
            }

            switch (*str) {
                case 'n':
                    *r = '\n';
                    break;
                case 'r':
                    *r = '\r';
                    break;
                case 't':
                    *r = '\t';
                    break;
                default:
                    *r = *str;
                    break;
            }
        }
        else {
            *r = *str;
        }

        ++str;
        ++r;
    }

    // The loop never copies the NUL byte, so terminate explicitly
    *r = '\0';

    return result;
}
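
As a design alternative, the counting pass can be skipped: the unescaped text is never longer than the input, so strlen(str) + 1 bytes is always enough, at the cost of a slightly oversized buffer when escapes are present. The sketch below assumes that trade-off is acceptable; unescapeCopy is a made-up name, and it handles the same escapes as the function above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-pass variant. Returns a newly allocated,
   NUL-terminated copy of str with \n, \r, \t translated and any other
   \x replaced by x. Returns NULL if malloc() fails. */
char *unescapeCopy(const char *str)
{
    assert(str != NULL);

    /* Output can only be as long as the input, never longer */
    char *result = malloc(strlen(str) + 1);
    if (result == NULL) {
        return NULL;
    }

    char *r = result;
    while (*str != '\0') {
        if (*str == '\\' && str[1] != '\0') {
            ++str;                          /* skip the backslash */
            switch (*str) {
                case 'n': *r++ = '\n'; break;
                case 'r': *r++ = '\r'; break;
                case 't': *r++ = '\t'; break;
                default:  *r++ = *str; break;   /* e.g. \\ or \" */
            }
        } else if (*str != '\\') {
            *r++ = *str;                    /* ordinary character */
        }
        /* a lone backslash at the very end is simply dropped */
        ++str;
    }

    *r = '\0';
    return result;
}
```

Unlike the original removeEscapeChars(), this does not modify or free its argument, so the caller stays responsible for both buffers.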

Answer 1
3 2021-01-28 00:23:07

Answer 2
1 Accepted 2021-01-28 00:23:22
