简体繁体 English

如何忽略 Lex 中的字符？

[英]How do I ignore characters in Lex?

原文 2020-06-26 18:06:23 8 1 c/ lex

Is it possible to completely ignore certain characters in Lex?是否可以完全忽略 Lex 中的某些字符？ Any regex excluding characters will break apart the tokens where those characters occur rather than completely ignoring them.任何不包括字符的正则表达式都会分解出现这些字符的标记，而不是完全忽略它们。 I am aware of the semicolon rule wherein the text is ignored;我知道忽略文本的分号规则； however, including a regex later on that accepts any characters still accepts characters previously declared to be ignored.但是，稍后包含一个接受任何字符的正则表达式仍然接受先前声明为被忽略的字符。 Having that regex ignore those characters causes it to break the token when it meets them instead of skipping past them.让该正则表达式忽略这些字符会导致它在遇到它们时破坏令牌，而不是跳过它们。

1 个解决方案

Is it possible to completely ignore certain characters in Lex?是否可以完全忽略 Lex 中的某些字符？

No, original AT&T lex utility does not have anything that would support this, nor does POSIX specify any such thing.不，原来的 AT&T lex实用程序没有任何支持这一点的东西，POSIX 也没有指定任何这样的东西。 Input is read from the specified stream, and matched directly against provided patterns.输入从指定的 stream 读取，并直接与提供的模式匹配。 Every character obtained from the input is subject to matching -- only before lex reads it in or after it tokenizes is there an opportunity to muck with character content.从输入中获得的每个字符都需要匹配——只有在lex读入它之前或在它标记化之后，才有机会弄脏字符内容。

It would be possible, but extremely messy to write a ruleset and corresponding actions that acted as if some specified character were completely ignored.编写一个规则集和相应的动作，就好像某些指定的字符被完全忽略一样，这是可能的，但非常混乱。 Instead, your best bet is to ensure that the characters in question are stripped out before lex ever sees them.相反，您最好的选择是确保在lex看到这些字符之前将其删除。

With traditional and POSIX lex , data are read from a stream designated to the lexer via global stream pointer yyin .使用传统和 POSIX lex ，通过全局 stream 指针yyin从指定给词法分析器的 stream 读取数据。 Standard C provides no mechanism for wrapping or internally filtering streams, but you could insert an external filter by having your program fork, with the child reading the original input data, stripping out the unwanted characters, and piping the rest to the parent process.标准 C 没有提供包装或内部过滤流的机制，但您可以通过让程序分叉插入外部过滤器，让子进程读取原始输入数据，去除不需要的字符，并将 rest 管道传输到父进程。 The parent, meanwhile, wraps the read end of the pipe in a stream ( fdopen() , for example), and assigns that to yyin .同时，父级将 pipe 的读取端包装在 stream （例如fdopen() ）中，并将其分配给yyin 。

On the other hand, if you use Flex instead of traditional lex then you have the alternative of redefining the YY_INPUT() macro to filter out the unwanted characters before they reach the scanner proper.另一方面，如果您使用 Flex 而不是传统的lex ，那么您可以选择重新定义YY_INPUT()宏以在不需要的字符到达扫描仪之前过滤掉它们。 This is lighter-weight than forking, and it can be expressed in the flex 's input file, rather than requiring the the program using the scanner to set up the filter.这比 fork 轻量级，并且可以在flex的输入文件中表示，而不需要使用扫描仪的程序来设置过滤器。

Either way, however, there is no built-in feature specifically for pretending that particular characters did not appear in the input at all.然而，无论哪种方式，都没有专门用于假装特定字符根本没有出现在输入中的内置功能。