

How to convert a string with an escape sequence to one char in C

Just to give you some background: we have a school project where we need to write our own compiler in C. My task is to write the lexical analyzer. So far so good, but I am having some difficulties with escape sequences.

When I find an escape sequence and it is valid, I save it in a string that looks like \xAF (a backslash, an x, and two hex digits); otherwise it is a lexical error.

My problem is: how do I convert a string containing only an escape sequence into a single char, so that I can add it to the "buffer" holding the rest of the string?

I had the idea of a massive table containing all the escape sequences and comparing against it one by one, but that does not seem elegant.

This solution can be used for numeric escape sequences of any length and type: octal, hexadecimal and others.

When you see a '\\', check the next character. If it is an 'x' (or 'X', if your language allows it; standard C only defines the lowercase \x), read the next character, which must be a hexadecimal digit (check it with isxdigit), and then read one more. If that second character is not a hexadecimal digit, put it back into the stream (an "unget" operation) and use only the first digit you read.

Put each digit you read into a string, then use e.g. strtol (with base 16) to convert that string into a number. Put that number directly into the token value.
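A minimal sketch of the hex case described above, assuming the lexer reads from a FILE * with getc/ungetc and has already consumed the backslash and the 'x'. The function name read_hex_escape and its convention of returning -1 on error are hypothetical, not taken from the original answer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the one or two hex digits that follow "\x"
   from `in` and returns their value (0-255), or -1 if no hex digit follows.
   Assumes the caller has already consumed the backslash and the 'x'. */
int read_hex_escape(FILE *in)
{
    char digits[3] = {0};   /* up to two digits plus the terminating '\0' */
    int n = 0;
    int c;

    while (n < 2) {
        c = getc(in);
        if (c == EOF)
            break;
        if (!isxdigit(c)) {
            ungetc(c, in);  /* not part of the escape: push it back */
            break;
        }
        digits[n++] = (char)c;
    }

    if (n == 0)
        return -1;          /* "\x" with no digits is a lexical error */

    /* strtol with base 16 turns the collected digit string into a number */
    return (int)strtol(digits, NULL, 16);
}
```

For input AFz this returns 0xAF and leaves the z in the stream for the rest of the lexer. Note that standard C itself lets \x consume any number of hex digits; stopping after two, as the answer suggests, keeps the value within one 8-bit char.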

For octal sequences, do the same but read up to three octal digits instead (and convert with base 8).
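A sketch of the octal variant under the same assumptions (a FILE * read with getc/ungetc, backslash already consumed, 8-bit chars). The name read_octal_escape and the choice to reject values above 0xFF are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to three octal digits (0-7) following a
   backslash and returns their value, or -1 if no octal digit follows or
   the value does not fit in an 8-bit char. Assumes the caller has
   already consumed the backslash. */
int read_octal_escape(FILE *in)
{
    char digits[4] = {0};   /* up to three digits plus the terminating '\0' */
    int n = 0;
    int c;

    while (n < 3) {
        c = getc(in);
        if (c == EOF)
            break;
        if (c < '0' || c > '7') {
            ungetc(c, in);  /* not an octal digit: push it back */
            break;
        }
        digits[n++] = (char)c;
    }

    if (n == 0)
        return -1;

    long value = strtol(digits, NULL, 8);
    if (value > 0xFF)
        return -1;          /* e.g. \777 (511) does not fit in an 8-bit char */
    return (int)value;
}
```

For input 101 this returns 65 ('A' in ASCII); for 129 it returns 10 and leaves the 9 unread.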


For an example of a similar method, see this old lexer I made many years ago (search for the lex_getescape function). That code, however, uses direct arithmetic instead of strtoul to convert the escape code into a number, and it does not use the standard isxdigit etc. functions either.

You can use the following code; call xString2char with your string.

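char x2char(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;   /* 'a'..'f' are the values 10..15 */
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    // if we got here it's an error - handle it as you like...
    return -1;
}

char xString2char(const char* buf)
{
    /* buf is expected to look like \xAF: buf[0] is '\\', buf[1] is 'x' */
    unsigned char ans;   /* unsigned, so the shift below cannot overflow */
    ans = (unsigned char)x2char(buf[2]);
    ans <<= 4;
    ans += (unsigned char)x2char(buf[3]);
    return (char)ans;
}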

This should work; just add error checking and handling (in case you did not already validate the escape sequence in your lexer).
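Since the answer leaves error checking to the reader, here is one possible sketch of a validating version, assuming the buffer holds exactly a backslash, an x and two hex digits followed by the terminating '\0'. The name xstring_to_byte and its 0/-1 return convention are hypothetical; it validates with isxdigit and converts with direct arithmetic.

```c
#include <ctype.h>

/* Hypothetical error-checked variant: converts a buffer such as \xAF
   (backslash, 'x', exactly two hex digits, then end of string) into the
   byte it denotes. Returns 0 and stores the byte in *out on success,
   or -1 if the buffer is not in that form. */
int xstring_to_byte(const char *buf, unsigned char *out)
{
    /* Short-circuit evaluation ensures we never read past the '\0'. */
    if (buf[0] != '\\' || buf[1] != 'x')
        return -1;
    if (!isxdigit((unsigned char)buf[2]) || !isxdigit((unsigned char)buf[3]))
        return -1;
    if (buf[4] != '\0')
        return -1;

    unsigned value = 0;
    for (int i = 2; i < 4; i++) {
        unsigned char c = (unsigned char)buf[i];
        /* direct arithmetic: map '0'-'9', 'a'-'f', 'A'-'F' to 0-15 */
        unsigned digit = isdigit(c) ? (unsigned)(c - '0')
                                    : (unsigned)(tolower(c) - 'a' + 10);
        value = value * 16 + digit;
    }
    *out = (unsigned char)value;
    return 0;
}
```

Returning the byte through an out-parameter keeps the error signal separate from the data, so a valid sequence such as \xFF cannot be confused with the error value.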

flex has start conditions, which enable context-dependent scanning. For instance, the flex manual has an example of scanning C comments (between /* and */):

<INITIAL>"/*"   BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/"            BEGIN(INITIAL);
[^*\n]+         /* eat comment in chunks */
"*"             /* eat the lone star */
\n              yylineno++;
}

Start conditions also make string-literal analysis possible. The "Start Conditions" section of the flex manual contains an example of matching C-style quoted strings using start conditions, and the manual also has a FAQ entry titled "How do I expand backslash-escape sequences in C-style quoted strings?". That will probably answer your question.

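Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM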