How to convert a string with an escape sequence into one char in C

Question

Just to give you some background: we have a school project where we need to write our own compiler in C. My task is to write the lexical analyzer. So far so good, but I am having some difficulty with escape sequences.

When I find an escape sequence and it is valid, I save it in a string that looks like this: \\xAF. Otherwise it is a lexical error.

My problem is: how do I convert a string containing only an escape sequence into one char, so that I can add it to the "buffer" containing the rest of the string?

I had an idea of a massive table containing every escape sequence and comparing against it one entry at a time, but that does not seem elegant.

Answer 1

This solution can be used for numeric escape sequences of all lengths and types, including octal, hexadecimal and others.

When you see a '\\', check the next character. If it is an 'x' (or 'X'), read one character; if that is a hexadecimal digit (isxdigit), read another. If the second character is not a hexadecimal digit, put it back into the stream (an "unget" operation) and use only the first digit you read.

Put each digit you read into a string, then use e.g. strtol to convert that string into a number. Put that number directly into the token value.

For octal sequences, read up to three digits instead.
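
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name read_numeric_escape and its parameters are hypothetical (they are not from the answer), the input is assumed to be a FILE stream, and the lexer is assumed to have already consumed the backslash (and the 'x', for hexadecimal) before calling it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read up to max_digits digits of the given base
 * (8 or 16) from `in` and return their numeric value.
 * Returns -1 if no valid digit follows, which the lexer can treat as
 * a lexical error. */
int read_numeric_escape(FILE *in, int base, int max_digits)
{
    char digits[4] = {0};   /* at most three digits plus the terminator */
    int n = 0;

    assert(in != NULL);
    assert(base == 8 || base == 16);
    assert(max_digits >= 1 && max_digits <= 3);

    while (n < max_digits) {
        int c = getc(in);
        if (c == EOF)
            break;
        int ok = (base == 16) ? isxdigit(c) : (c >= '0' && c <= '7');
        if (!ok) {
            ungetc(c, in);  /* not part of the escape: put it back */
            break;
        }
        digits[n++] = (char)c;
    }

    if (n == 0)
        return -1;

    /* The collected digits are known to be valid, so strtol cannot fail. */
    return (int)strtol(digits, NULL, base);
}
```

With this shape, a hexadecimal escape would be read with read_numeric_escape(in, 16, 2) and an octal one with read_numeric_escape(in, 8, 3). Note that three octal digits can encode values up to 511, so the caller may still want to range-check the result before storing it in a char.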

For an example of a similar method, see this old lexer I made many years ago; search for the lex_getescape function. Note that it uses direct arithmetic instead of strtoul to convert the escape code into a number, and does not use the standard isxdigit etc. functions either.

Answer 2

You can use the following code: call xString2char with your string.

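// Convert one hexadecimal digit to its value (0-15).
// Returns -1 if c is not a hexadecimal digit.
int x2char(const char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    // if we got here it's an error - handle it as you like...
    return -1;
}

// buf holds an escape of the form \xHH:
// buf[0] is the backslash, buf[1] is 'x', buf[2] and buf[3] are the digits.
char xString2char(const char* buf)
{
    int ans;
    ans = x2char(buf[2]);
    ans <<= 4;              // first digit is the high nibble
    ans += x2char(buf[3]);  // second digit is the low nibble
    return (char)ans;
}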

This should work; just add error checking and handling (in case you haven't already validated the input in your code).
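
One possible shape for that error checking is sketched below. The names hex_digit_value and parse_hex_escape are hypothetical, and the sketch assumes the string holds exactly one escape of the form \xHH (backslash, 'x', two hexadecimal digits) in an ASCII-compatible character set. It returns an int so that -1 can signal an error without colliding with any valid char value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: returns 0..255 for a string of the exact form
 * \xHH, or -1 if the string has any other shape. */
int parse_hex_escape(const char *buf)
{
    if (buf == NULL || buf[0] != '\\' || buf[1] != 'x')
        return -1;

    /* Short-circuit evaluation stops at the terminator, so a string
     * that is too short is never read past its end. */
    if (!isxdigit((unsigned char)buf[2]) || !isxdigit((unsigned char)buf[3]))
        return -1;

    if (buf[4] != '\0')
        return -1;          /* trailing characters after the escape */

    int hi = hex_digit_value(buf[2]);
    int lo = hex_digit_value(buf[3]);
    assert(hi >= 0 && lo >= 0);  /* guaranteed by the isxdigit checks */

    return hi * 16 + lo;
}
```

The caller can then cast a non-negative result to char before appending it to the string buffer, and report a lexical error otherwise.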

Answer 3

flex has start conditions, which enable contextual analysis. For instance, the flex manual has an example of C comment analysis (between /* and */):

<INITIAL>"/*"   BEGIN(IN_COMMENT);
<IN_COMMENT>{
"*/"            BEGIN(INITIAL);
[^*\n]+         /* eat comment in chunks */
"*"             /* eat the lone star */
\n              yylineno++;
}

Start conditions also enable string literal analysis. The flex manual has an example of how to match C-style quoted strings using start conditions in the section Start Conditions, and there is also a FAQ item titled "How do I expand backslash-escape sequences in C-style quoted strings?". This will probably answer your question.

3 answers

Answer 1
3 2012-11-18 14:44:43

Answer 2
2 Accepted 2012-11-18 14:45:47

Answer 3
-1 2012-11-18 14:44:14