
Segmentation fault with yacc/bison

I am trying to write a simple HTTP request parser for a school assignment, but I have a segmentation fault that I can't get rid of. I think that my production rules are OK. I have run the Bison-generated parser with tracing enabled, and it always segfaults at the part where it parses my header:

Reducing stack by rule 9 (line 59):
   $1 = token ID ()
   $2 = token COLON ()
   $3 = token STRING ()
[4]    36661 segmentation fault (core dumped)  ./problem1 < input.txt

Here is the content of my request.l file:

%option noyywrap
%{
    #include<stdio.h>
    #include "request.tab.h"
    char *strclone(char *str);
%}

num                                     [0-9]+(\.[0-9]{1,2})?
letter                                  [a-zA-Z]
letternum                               [a-zA-Z0-9\-]
id                                      {letter}{letternum}*
string                                  \"[^"]*\"
fieldvalue                              {string}|{num}

%%

(GET|HEAD|POST|PUT|DELETE|OPTIONS)      { yylval = strclone(yytext); return METHOD; }
HTTP\/{num}                             { yylval = strclone(yytext); return VERSION; }
{id}                                    { yylval = strclone(yytext); return ID; }
"/"                                     { return SLASH; }
"\n"                                    { return NEWLINE; }
{string}                                { yylval = strclone(yytext); return STRING; }
":"                                     { return COLON; }
[ \t\n]+                                       ;
. {
    printf("Unexpected: %c\nExiting...\n", *yytext);
    exit(0);
}

%%

char *strclone(char *str) {
    int len = strlen(str);
    char *clone = (char *)malloc(sizeof(char)*(len+1));
    strcpy(clone,str);
    return clone;
}

and my request.y file:

%{
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#define YYSTYPE char*

extern int yylex();
extern int yyparse();
extern FILE* yyin;

void yyerror(const char* s);
%}

%token METHOD
%token SLASH
%token VERSION
%token STRING
%token ID
%token COLON
%token NEWLINE

%%

REQUEST: METHOD URI VERSION NEWLINE HEADERS {
       printf("%s %s", $1, $2);
    }
;

URI: SLASH DIR {
        $$ = (char *)malloc(sizeof(char)*(1+strlen($2)+1));
        sprintf($$, "//%s", $2);
    }
;

DIR: ID SLASH {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+2));
        sprintf($$, "%s//", $1);
    }
    |ID {
        $$ = $1;
    }
    | {
        $$ = "";
    }
;

HEADERS: HEADER {
        $$ = $1;
    }
    |HEADER NEWLINE HEADERS {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($3)+1));
        sprintf($$, "%s\n%s", $1, $3);
    }
    |{
        $$ = "";
    }
;

HEADER: ID COLON STRING {
        $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
        sprintf($$, "%s:%s", $1, $3);
    }
;

%%

void yyerror (char const *s) {
   fprintf(stderr, "Poruka nije tacna\n");
}

int main() {
    yydebug = 1;
    yyin = stdin;

    do {
        yyparse();
    } while(!feof(yyin));

    return 0;
}

Also here is the content of my input.txt I am passing in as input:

GET / HTTP/1.1
Host: "developer.mozzila.org"
Accept-language: "fr"

HEADER: ID COLON STRING {
    $$ = (char *)malloc(sizeof(char)*(strlen($1)+1+strlen($2)+1));
    sprintf($$, "%s:%s", $1, $3);
};

Shouldn't you use strlen($3) in the expression where you calculate the length of the combined string? strlen($2), as written, measures the value attached to the COLON token, not the STRING value that sprintf actually copies. (The ":" rule in your scanner never assigns yylval, so $2 is not even guaranteed to be a meaningful string.) If the resulting buffer is too short, sprintf writes past its end.
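As a minimal sketch of the corrected size calculation, the buffer needs room for both strings, the colon, and the terminating NUL. The helper name join_header is hypothetical and not part of the original grammar; it only illustrates the arithmetic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "name:value" in a freshly allocated buffer.
 * The caller owns the result and must free() it. */
char *join_header(const char *name, const char *value) {
    /* name + ':' + value + '\0' */
    size_t len = strlen(name) + 1 + strlen(value) + 1;
    char *out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    /* snprintf never writes more than len bytes, so a wrong size
     * would truncate the text instead of overrunning the buffer. */
    snprintf(out, len, "%s:%s", name, value);
    return out;
}
```

In the HEADER action this would amount to something like `$$ = join_header($1, $3);`.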

In request.y, you include the directive

#define YYSTYPE char*

So in the parser code generated by Bison, yylval is of type char*. But that line is not inserted into request.l. So in the scanner code generated by Flex, yylval has its default type, int.
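One way to make the two files agree, sketched here against the prologue of the request.l shown in the question, is to repeat the definition before the generated header is included. This assumes the Bison-generated request.tab.h only falls back to int when YYSTYPE has not already been defined. The two extra includes are an addition of this sketch, since strclone and the error rule call strlen, strcpy, malloc and exit.

```lex
%option noyywrap
%{
    #include <stdio.h>
    #include <stdlib.h>     /* malloc, exit */
    #include <string.h>     /* strlen, strcpy */
    #define YYSTYPE char*   /* same definition as in request.y; must come before the header */
    #include "request.tab.h"
    char *strclone(char *str);
%}
```

Keeping the definition in a single place that both files include would avoid having to keep two copies in sync.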

C, unfortunately, allows a pointer to be converted to an integer type even if the integer type is too narrow to hold the entire address, which is the case on a typical 64-bit platform with 8-byte pointers and 4-byte int. So when your scanner assigns an eight-byte pointer to what the compiler thinks is a four-byte int, the value is truncated. When the parser then tries to use it as an address, you'll get a segfault. If you're lucky.
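The loss of the upper half of the address can be simulated in plain C. This is only a sketch under the assumptions stated above (64-bit addresses, 32-bit int); through_int32 is a hypothetical helper, and it uses fixed-width unsigned types so that the result is well defined instead of depending on the mismatched yylval declarations themselves.

```c
#include <stdint.h>

/* Hypothetical helper: simulates storing a 64-bit address in a 32-bit
 * slot and reading it back as a 64-bit value. Only the low 32 bits
 * survive the round trip. */
uint64_t through_int32(uint64_t address) {
    uint32_t narrow = (uint32_t)address;  /* upper 32 bits are discarded */
    return (uint64_t)narrow;
}
```

An address that fits in 32 bits comes back unchanged, which is why a bug like this can go unnoticed on some systems and crash on others.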

Most C compilers will warn you about this truncation, but only if you tell the compiler that you want to see warnings (-Wall for clang and gcc). Compiling with -Wall is always important, even when compiling the output of a code generator.

You also need to fix the typo noted by @JakobStark (strlen($2) where strlen($3) was intended in the HEADER rule).
