Unknown C String Truncation/Overwrite

I am having an interesting memory problem with a simple string manipulation. The problem itself isn't actually in reading the string but right before it, when I am trying to use the string.

char *removeInvalid(char *token){
    fprintf(stderr," Before: %s \n", token);
    char *newToken = malloc(sizeof(100) + 1);
    fprintf(stderr," After: %s \n", token);
}

Whenever I run this, the string gets truncated right after char *newToken is malloc'd. So the printout of this results in:

Before: Willy Wanka's Chochlate Factory
After: Will Wanka's Chochlate F!

Does anyone have any clue what this is? I looked at other examples of malloc, but can't figure out how it is going wrong here.

EDIT: FULL CODE BELOW. Take note that I am a college student who just began C, so it isn't perfect by any means. But it works up until this error.

The function calls go as follows: main -> initialReadAVL (this part works perfectly). Then commandReadAVL is called, which goes commandReadAVL -> readHelper (again works fine here), then cleanUpString -> removeSpaces (works fine), then cleanUpString -> removeInvalid (THIS IS WHERE IT ERRORS).

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <ctype.h>
#include "node.h"
#include "avl.h"
#include "scanner.h"
#include "bst.h"

/* Options */
int avlSwitch = 0;
int bstSwitch = 0;
int insertSwitch = 0;
int deleteSwitch = 0;
int frequencySwitch = 0;
int displaySwitch = 0;
int statisticSwitch = 0;

int ProcessOptions(int argc, char **argv);
char *cleanUpString(char *token);
char *turnToLowerCase(char *token);
char *removeSpaces(char *token);
char *removeInvalid(char *token);
char *readHelper(FILE *in);
void Fatal(char *fmt, ...);
void preOrder(struct node *root);
void initialReadAVL(avl *mainAVL, FILE *in);
void initialReadBST(bst *mainBST, FILE *in);
void commandReadBST(bst *mainBST, FILE *commandList);
void commandReadAVL(avl *mainAVL, FILE *commandList);

int main(int argc, char **argv) {
    struct avl *mainAVL;
    struct bst *mainBST;
    FILE *text;
    FILE *commandList;


    if(argc != 4){
        Fatal("There must be 4 arguments of form 'trees -b corpus commands' \n");
    }

    int argIndex = ProcessOptions(argc,argv);

    text = fopen(argv[2], "r");
    commandList = fopen(argv[3], "r");

    //Protect against a file that could not be opened.
    if (text == NULL){
        fprintf(stderr,"file %s could not be opened for reading\n", argv[2]);
        exit(1);
    }

    if (commandList == NULL){
        fprintf(stderr,"file %s could not be opened for reading\n", argv[3]);
        exit(1);
    }


    if (avlSwitch){
        mainAVL = newAVL();
        initialReadAVL(mainAVL, text);
        preOrder(mainAVL->root);
        fprintf(stderr,"\n");
        commandReadAVL(mainAVL, commandList);
        preOrder(mainAVL->root);
        fprintf(stderr,"\n");
    }
    else if (bstSwitch){
        mainBST = newBST();
        initialReadBST(mainBST, text);
        preOrder(mainBST->root);
        commandReadBST(mainBST, commandList);
        preOrder(mainBST->root);
    }


    return 0;
}


void commandReadAVL(avl *mainAVL, FILE *commandList){
    char *command;
    char *textSnip;
    while(!feof(commandList)){
        command = readHelper(commandList);
        textSnip = readHelper(commandList);
        textSnip = cleanUpString(textSnip);

        if(command != NULL){
            switch (command[0]) {
            case 'i':
                fprintf(stderr,"%s \n", textSnip);
                insertAVL(mainAVL, textSnip);
                break;
            case 'd':
                deleteAVL(mainAVL, textSnip);
                break;
            case 'f':
                break;
            case 's':
                break;
            case 'r':
                break;
            default:
                Fatal("option %s not understood\n",command);
            } 
        }

    }
}

void commandReadBST(bst *mainBST, FILE *commandList){
    char *command;
    char *textSnip;
    while(!feof(commandList)){
        command = readHelper(commandList);
        textSnip = readHelper(commandList);
        textSnip = cleanUpString(textSnip);
        if(command != NULL){
            switch (command[0]) {
                case 'i':
                    insertBST(mainBST, textSnip);
                    break;
                case 'd':
                    deleteBST(mainBST, textSnip);
                    break;
                case 'f':
                    break;
                case 's':
                    break;
                case 'r':
                    break;
                default:
                    Fatal("option %s not understood\n",command);
                } 
        }
    }
}


char *readHelper(FILE *in){
    char *token;
    if (stringPending(in)){
        token = readString(in);
    }
    else {
        token = readToken(in);
    }
    return token;
}

void initialReadBST(bst *mainBST, FILE *in){
    char *token;
    while(!feof(in)){

        token = readHelper(in);
        token = cleanUpString(token);
        if (token != NULL){
            insertBST(mainBST, token);
        }
    }
}

void initialReadAVL(avl *mainAVL, FILE *in){
    char *token;
    while(!feof(in)){

        token = readHelper(in);
        token = cleanUpString(token);
        if (token != NULL){
            insertAVL(mainAVL, token);
        }
    }
}

//Helper Function to clean up a string using all the prerequisites. 
char *cleanUpString(char *token){
    char *output = malloc(sizeof(*token)+ 1);
    if (token != NULL){
        output = removeSpaces(token);
         fprintf(stderr,"before : %s \n", output);
        output = removeInvalid(output);
         fprintf(stderr,"%s \n", output);
        output = turnToLowerCase(output);
        return output;
    }
    return NULL;

}

//Helper function to turn the given string into lower case letters
char *turnToLowerCase(char *token){
    char *output = malloc(sizeof(*token) + 1);
    for (int x = 0; x < strlen(token); x++){
            output[x] = tolower(token[x]);
        }
    return output;
}

//Helper function to remove redundant spaces in a string.
char *removeSpaces(char *token){
    char *output;
    int x = 0;
    int y = 0;

    while (x < strlen(token)){
        if (token[x]== ' ' && x < strlen(token)){
            while(token[x] == ' '){
                x++;
            }
            output[y] = ' ';
            y++;
            output[y] = token[x];
            y++;
            x++;
        }
        else {
            output[y] = token[x];
            y++;
            x++;
        }

    }
    return output;

}

char *removeInvalid(char *token){
    fprintf(stderr," Before: %s \n", token);
    char *newToken = malloc(sizeof(* token)+ 1);
    fprintf(stderr," After: %s \n", token);


    int x = 0;
    int y = 0;
    while (x < strlen(token)){
        if (!isalpha(token[x]) && token[x] != ' '){
            x++;
        }
        else {
            newToken[y] = token[x];
            y++;
            x++;
        }
    }
    return newToken;
}


//Processes a system ending error. 
void Fatal(char *fmt, ...) {
    va_list ap;

    fprintf(stderr,"An error occurred: ");
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);

    exit(-1);
    }


//Processes the options needed to be executed from the command line
int ProcessOptions(int argc, char **argv) {
    int argIndex;
    int argUsed;
    int separateArg;

    argIndex = 1;

    while (argIndex < argc && *argv[argIndex] == '-')
        {
        /* check if stdin, represented by "-" is an argument */
        /* if so, the end of options has been reached */
        if (argv[argIndex][1] == '\0') return argIndex;

        separateArg = 0;
        argUsed = 0;

        if (argv[argIndex][2] == '\0')
            {
            separateArg = 1;
            }

        switch (argv[argIndex][1])
            {
            case 'b':
                bstSwitch = 1;
                break;
            case 'a':
                avlSwitch = 1;
                break;
            default:
                Fatal("option %s not understood\n",argv[argIndex]);
            }

        if (separateArg && argUsed)
            ++argIndex;

        ++argIndex;
        }

    return argIndex;
}


void preOrder(struct node *root) {
    if(root != NULL)
    {
        fprintf(stderr,"%s ", root->key);
        preOrder(root->lChild);
        preOrder(root->rChild);
    }

}

readString():

char *
readString(FILE *fp)
    {
    int ch,index;
    char *buffer;
    int size = 512;

    /* advance to the double quote */

    skipWhiteSpace(fp);
    if (feof(fp)) return 0;

    ch = fgetc(fp);
    if (ch == EOF) return 0;

    /* allocate the buffer */

    buffer = allocateMsg(size,"readString");

    if (ch != '\"')
        {
        fprintf(stderr,"SCAN ERROR: attempt to read a string failed\n");
        fprintf(stderr,"first character was <%c>\n",ch);
        exit(4);
        }

    /* toss the double quote, skip to the next character */

    ch = fgetc(fp);

    /* initialize the buffer index */

    index = 0;

    /* collect characters until the closing double quote */

    while (ch != '\"')
        {
        if (ch == EOF)
            {
            fprintf(stderr,"SCAN ERROR: attempt to read a string failed\n");
            fprintf(stderr,"no closing double quote\n");
            exit(6);
            }
        if (index > size - 2) 
            {
            ++size;
            buffer = reallocateMsg(buffer,size,"readString");
            }

        if (ch == '\\')
            {
            ch = fgetc(fp);
            if (ch == EOF)
                {
                fprintf(stderr,"SCAN ERROR: attempt to read a string failed\n");
                fprintf(stderr,"escaped character missing\n");
                exit(6);
                }
            buffer[index] = convertEscapedChar(ch);
            }
        else
            buffer[index] = ch;
        ++index;
        ch = fgetc(fp);
        }

    buffer[index] = '\0';

    return buffer;
    }

INPUT: Commands.txt

i "Willy Wonka's Chochlate Factory"

INPUT: testFile.txt

a b c d e f g h i j k l m n o p q r s t u v w x y z

Thanks!

You almost certainly have a buffer overrun in some part of the code that you're not showing us. If I were to guess, I'd say you allocate too little storage for token to contain the full string you're writing into it in the first place.

Did you by any chance allocate token using the same erroneous code you have in removeInvalid():

malloc(sizeof(100) + 1);
       ^^^^^^^^^^^ this doesn't allocate 101 characters, it allocates sizeof(int)+1
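To see the difference concretely: the integer literal 100 has type int, so sizeof(100) is simply sizeof(int) (typically 4) and says nothing about the value 100. Below is a minimal sketch; the function names wrongRequest, rightRequest and allocTokenBuffer are hypothetical, not from the question's code.

```c
#include <stdlib.h>

/* sizeof(100) is the size of the *type* of the literal 100, i.e. int.
   It has nothing to do with the value 100. */
size_t wrongRequest(void) {
    return sizeof(100) + 1;   /* sizeof(int) + 1 bytes */
}

/* What was intended: 100 characters plus one byte for the '\0'. */
size_t rightRequest(void) {
    return 100 + 1;           /* 101 bytes */
}

/* Hypothetical helper: allocate a zero-filled buffer big enough for a
   string of up to maxLen characters. Returns NULL on failure. */
char *allocTokenBuffer(size_t maxLen) {
    return calloc(maxLen + 1, sizeof(char));
}
```

On a platform with a 4-byte int, wrongRequest() returns 5, which is why anything longer than four characters overruns such a buffer.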
char *readHelper(FILE *in){
    char * token = malloc(sizeof(char *) + 1);
    if (stringPending(in)){
        token = readString(in);
    }
    else {
        token = readToken(in);
    }
    return token;
}

It's hard to make sense of this without being able to see readString or readToken, but this can't possibly be right.

First, you allocate one more byte than is needed to hold a pointer to one or more characters. What use would such a thing be? If you're not storing a pointer to one or more characters, why use sizeof(char *)? If you are storing a pointer to one or more characters, why add one? It's hard to imagine the reasoning that led to that line of code.
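A quick way to see why sizeof(char *) + 1 is the wrong size for a string buffer: sizeof(char *) is the size of the pointer itself, fixed at compile time (commonly 8 bytes on 64-bit platforms), while a copy of a string needs strlen(s) + 1 bytes, whatever its length. A small sketch with illustrative function names:

```c
#include <string.h>

/* Bytes a buffer must have to hold a copy of s, including the '\0'. */
size_t bytesNeeded(const char *s) {
    return strlen(s) + 1;
}

/* What malloc(sizeof(char *) + 1) requests: the size of a pointer plus
   one, no matter how long the string actually is. */
size_t pointerSizedRequest(void) {
    return sizeof(char *) + 1;
}
```

For the question's input "Willy Wonka's Chochlate Factory", bytesNeeded returns 32, while pointerSizedRequest() is 9 on a platform with 8-byte pointers.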

Then, in the if, you immediately lose the value you got back from malloc, because you overwrite token by using it to store something else. If you weren't going to use the value you assigned to token, why did you assign it at all?
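readString, as shown in the question, already allocates its own buffer (via allocateMsg), and readToken presumably does the same, so readHelper has nothing to allocate; it only passes the returned pointer along. Here is a sketch of that ownership pattern, where copyOf is a hypothetical stand-in for the scanner functions (their real code isn't shown):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for readString/readToken: returns a freshly
   malloc'd, NUL-terminated copy of src, or NULL if malloc fails. */
char *copyOf(const char *src) {
    size_t n = strlen(src) + 1;          /* include the '\0' */
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, src, n);
    }
    return copy;
}

/* The callee has already allocated the buffer, so nothing is malloc'd
   here. Pre-allocating token and then overwriting it would only leak
   that first block. */
char *readHelperSketch(const char *src) {
    char *token = copyOf(src);
    return token;                        /* caller must free() it */
}
```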

Bluntly, a lot of this code simply doesn't make any sense. Without comments, it's hard to understand the reasoning well enough to point out what's wrong with it.

Either there was reasoning behind that line of code, in which case the reasoning is just completely wrong, or, worse, the line of code was added with no reasoning in the hope that it would somehow work. Neither approach will produce working code.

When you're trying to debug code, first remove anything you added experimentally or didn't understand. If you do understand malloc(sizeof(char *) + 1), then please explain what you think it does so that your understanding can be corrected.

Why did you think you needed a buffer one byte larger than the size of a pointer to one or more characters?

char *turnToLowerCase(char *token){
    char *output = malloc(sizeof(*token) + 1);
    for (int x = 0; x < strlen(token); x++){
            output[x] = tolower(token[x]);
        }
    return output;
}

This is probably your main issue. You allocate enough space for two characters and then proceed to store a lot more than that. You probably wanted:

    char *output = malloc(strlen(token) + 1);

Since token is a char*, *token is a char. So sizeof(*token) is sizeof(char), which is always 1; definitely not what you want.
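A version of turnToLowerCase with malloc(strlen(token) + 1) applied might look like this. It is a sketch rather than the poster's final code: besides the allocation size, it also writes the terminating '\0' (the original loop stops before it, so the result is not a valid C string), computes the length once instead of on every iteration, and casts to unsigned char before calling tolower, whose behavior is undefined for negative values other than EOF. Taking const char * is a small signature change of my own.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated lower-case copy of token, or NULL.
   Sized with strlen(token) + 1, not sizeof(*token) + 1 (which is
   sizeof(char) + 1 == 2 bytes). Caller must free() the result. */
char *turnToLowerCase(const char *token) {
    size_t len = strlen(token);
    char *output = malloc(len + 1);
    if (output == NULL) {
        return NULL;
    }
    for (size_t x = 0; x < len; x++) {
        /* cast so tolower never sees a negative char value */
        output[x] = (char)tolower((unsigned char)token[x]);
    }
    output[len] = '\0';   /* the original loop never wrote the terminator */
    return output;
}
```

For example, turnToLowerCase("Willy Wonka's Chochlate Factory") returns a new buffer holding "willy wonka's chochlate factory".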

With the help of David Schwartz and the other posters, I was able to find the bug in my program. When I was allocating memory for my token/output, I wasn't allocating enough space, because I was using the erroneous code

malloc(sizeof(100) + 1);

and

malloc(sizeof(*token) + 1);

both of which allocated only a couple of bytes. This caused a buffer overrun, which produced the random letters and numbers and the truncation. The first requests space equal to sizeof(int) + 1 and the second sizeof(char) + 1 (because sizeof(*token) is just the size of the single char that token points to, not the length of the string).

To fix this, I changed the allocation of my token variable to

malloc(strlen(token) + 1);

This allocates space equal to the string length of token plus one byte for the terminating '\0', which is enough for my problem because each cleaned-up result is never longer than token.
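Applied to the other two helpers, the same sizing rule might look like the sketch below (assumed signatures taking const char *; the caller frees each result). Note that the question's removeSpaces declares char *output but never allocates it, so it writes through an uninitialized pointer. That is undefined behavior on its own and may well be the memory that the later malloc in removeInvalid ends up overwriting. Both functions here also terminate their output, which the originals did not.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Collapses each run of spaces into a single space. The result is never
   longer than token, so strlen(token) + 1 bytes always suffice. */
char *removeSpaces(const char *token) {
    size_t len = strlen(token);
    char *output = malloc(len + 1);      /* actually allocate the output */
    if (output == NULL) {
        return NULL;
    }
    size_t y = 0;
    for (size_t x = 0; x < len; x++) {
        /* skip a space that directly follows another space */
        if (token[x] == ' ' && x > 0 && token[x - 1] == ' ') {
            continue;
        }
        output[y++] = token[x];
    }
    output[y] = '\0';                    /* terminate the shorter result */
    return output;
}

/* Keeps only letters and spaces; same sizing argument as above. */
char *removeInvalid(const char *token) {
    size_t len = strlen(token);
    char *newToken = malloc(len + 1);
    if (newToken == NULL) {
        return NULL;
    }
    size_t y = 0;
    for (size_t x = 0; x < len; x++) {
        if (isalpha((unsigned char)token[x]) || token[x] == ' ') {
            newToken[y++] = token[x];
        }
    }
    newToken[y] = '\0';
    return newToken;
}
```

With helpers like these, the initial malloc(sizeof(*token)+ 1) in cleanUpString becomes unnecessary, since every step returns its own buffer; freeing each intermediate result after the next step has copied it would also stop the pipeline from leaking memory.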
