
How to convert string with escape sequence to one char in C

Just to give you some background: we have a school project where we need to write our own compiler in C. My task is to write the lexical analyzer. So far so good, but I am having some difficulties with escape sequences.

When I find an escape sequence and it is valid, I have it saved in a string which, written as a C literal, looks like this: "\\xAF" (that is, the four characters \, x, A, F). Otherwise it is a lexical error.

My problem is how to convert a string containing only an escape sequence into a single char, so that I can add it to the "buffer" containing the rest of the string.

I had an idea about a massive table containing every escape sequence and then comparing against its entries one by one, but that does not seem elegant.

This solution works for numeric escape sequences of any length and type: octal, hexadecimal, and others.

What you do when you see a '\\' is check the next character. If it's an 'x' (or 'X'), read one more character; if that is a hexadecimal digit (isxdigit), read another. If this last character is not a hexadecimal digit, put it back into the stream (an "unget" operation) and use only the first digit you read.

Append each digit you read to a string, and then use e.g. strtol (with base 16) to convert that string into a number. Put that number directly into the token value.

For octal sequences, just read up to three octal digits instead and convert with base 8.
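The steps above can be sketched roughly as follows. This assumes the lexer reads from a stdio FILE * and has already consumed the backslash; the function name read_numeric_escape and its -1 error convention are made up for illustration, and rejecting values above 255 is a design choice rather than something the question requires.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: call it after the lexer has consumed a '\\'.
 * Reads \xH, \xHH, \o, \oo or \ooo from `in` and returns the value
 * (0-255), or -1 if no valid numeric escape follows. */
int read_numeric_escape(FILE *in)
{
    char digits[4] = {0};            /* up to 3 digits plus terminator */
    int n = 0, max = 3, base = 8;
    int c = getc(in);

    if (c == 'x' || c == 'X') {      /* hexadecimal: at most 2 digits */
        base = 16;
        max = 2;
        c = getc(in);
    }

    for (;;) {
        int ok = c != EOF &&
                 (base == 16 ? isxdigit(c) : (c >= '0' && c <= '7'));
        if (!ok) {
            if (c != EOF)
                ungetc(c, in);       /* put the non-digit back */
            break;
        }
        digits[n++] = (char)c;
        if (n == max)
            break;
        c = getc(in);
    }

    if (n == 0)
        return -1;                   /* no digits: lexical error */

    long value = strtol(digits, NULL, base);
    return value > 255 ? -1 : (int)value;
}
```

The returned value can be appended to the string buffer as a single char; a return of -1 is where the lexer would report its lexical error.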


For an example of a similar method, see this old lexer I made many years ago; search for the lex_getescape function. Note that it uses direct arithmetic instead of strtol to convert the escape code into a number, and it does not use the standard isxdigit etc. functions either.

You can use the following code: call xString2char with your string.

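/* Value (0-15) of one hexadecimal digit, or -1 if c is not a hex digit. */
int x2char(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    // if we got here it's an error - handle it as you like...
    return -1;
}

/* buf must hold a validated escape of the form \xHH:
   buf[0] is the backslash, buf[1] is 'x', buf[2] and buf[3] are hex digits. */
char xString2char(const char* buf)
{
    unsigned char ans;       // unsigned, so values above 0x7F shift cleanly
    ans = x2char(buf[2]);    // high nibble
    ans <<= 4;
    ans += x2char(buf[3]);   // low nibble
    return (char)ans;
}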

This should work; just add error checking and handling (in case you haven't already validated the input elsewhere in your code).
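One possible way to add that checking, as a sketch only: let strtol parse the digits after the \x prefix and use its end pointer to confirm that the whole string was consumed. The name escape_string_to_char and its 1/0 return convention are hypothetical, and the sketch assumes the escape has at most two hex digits.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: converts a string holding exactly one escape of
 * the form \xH or \xHH (backslash, 'x', one or two hex digits).
 * Stores the value in *out and returns 1, or returns 0 if malformed. */
int escape_string_to_char(const char *buf, unsigned char *out)
{
    char *end;
    long value;

    if (buf == NULL || buf[0] != '\\' || buf[1] != 'x')
        return 0;

    /* Require a hex digit first: on its own, strtol would also accept
     * leading whitespace or a sign. */
    if (!isxdigit((unsigned char)buf[2]))
        return 0;

    value = strtol(buf + 2, &end, 16);

    /* Reject trailing junk and anything longer than two characters
     * (the length check also rules out a "0x" prefix). */
    if (*end != '\0' || end - (buf + 2) > 2)
        return 0;

    *out = (unsigned char)value;
    return 1;
}
```

A caller would test the return value and report a lexical error on 0; on 1, *out holds the single character to append to the buffer.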

flex has start conditions, which enable context-dependent analysis. For instance, the flex manual has an example for handling C comments (the text between /* and */):

<INITIAL>"/*"   BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/"            BEGIN(INITIAL);
[^*\n]+         /* eat comment in chunks */
"*"             /* eat the lone star */
\n              yylineno++;
}

Start conditions also make string-literal analysis possible. The "Start Conditions" section of the flex manual has an example of matching C-style quoted strings with start conditions, and the manual's FAQ has an item titled "How do I expand backslash-escape sequences in C-style quoted strings?". These will probably answer your question.
