JFlex maximum read length
Given a positional language like the old IBM RPG, we can have a line such as:
CCCCCDIDENTIFIER E S 10
Where characters
1-5: comment
6: specification type
7-21: identifier name
...And so on
Now, given that JFlex is based on RegExp, we would have a RegExp such as:
[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}
for the identifier name token.
This RegExp, however, can match tokens longer than the 15 characters possible for the identifier name, requiring yypushback calls.
Thus, is there a way to limit how many characters JFlex reads for a particular token?
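To make the over-match concrete, here is a minimal sketch that uses java.util.regex as a stand-in for JFlex (an assumption: the two engines differ in general, but both consume as many trailing spaces as this expression allows). The class name and the sample input are hypothetical. The identifier field is 15 columns wide, yet the match runs into the blank columns that follow it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OverMatchDemo {
    // The expression from the question, written for java.util.regex.
    static final Pattern IDENT = Pattern.compile("[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}");

    /** Length of the match at the start of the input, or -1 if there is none. */
    static int matchedLength(String input) {
        Matcher m = IDENT.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // Hypothetical input: a 15-column identifier field holding "AB",
        // followed by two blank columns and then "10".
        String input = String.format("%-15s", "AB") + "  10";
        System.out.println("matched " + matchedLength(input) + " characters");
    }
}
```

Here "AB" is followed by 15 spaces in total, and the expression takes 14 of them, so the match is one character longer than the field.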
Regular-expression-based lexical analysis is really not the right tool for parsing fixed-field input. You can just split the input into fields at the known character positions, which is much easier and a lot faster. And it doesn't require fussing with regular expressions.
Anyway, [a-zA-Z][a-zA-Z0-9]{0,14}[ ]{0,14} wouldn't be the right expression even if it did properly handle the token length, since the token is really just the word at the beginning, without the space characters.
In the case of fixed-length fields which contain something more complicated than a single identifier, you might want to feed the field into a lexer, using a StringReader or some such.
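A minimal sketch of that idea follows. It uses java.io.StreamTokenizer purely as a stand-in lexer so the example runs without generated code. With JFlex, the same StringReader would instead be passed to the constructor of the generated scanner class. The class name, token labels, and sample field content are assumptions.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class FieldLexing {
    /** Cuts columns from..to (1-based, inclusive) out of the line and lexes only that text. */
    static List<String> tokens(String line, int from, int to) {
        String field = line.substring(Math.min(from - 1, line.length()),
                                      Math.min(to, line.length()));
        // The reader holds just the field, so the lexer cannot read past its end.
        StreamTokenizer st = new StreamTokenizer(new StringReader(field));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD:" + st.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUM:" + (long) st.nval);
                        break;
                    default:
                        out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical line: columns 7-21 hold a small expression, column 22 onward holds other data.
        String line = "     X" + String.format("%-15s", "COUNT + 1") + "REST";
        System.out.println(tokens(line, 7, 21));
    }
}
```

The length limit is enforced by the substring rather than by the lexer's rules, so the rules can stay simple.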
Although I'm sure it's not useful, here's a regular expression which matches exactly 15 characters that start with a word and are padded out with spaces:
[a-zA-Z][ ]{14} |
[a-zA-Z][a-zA-Z0-9][ ]{13} |
[a-zA-Z][a-zA-Z0-9]{2}[ ]{12} |
[a-zA-Z][a-zA-Z0-9]{3}[ ]{11} |
[a-zA-Z][a-zA-Z0-9]{4}[ ]{10} |
[a-zA-Z][a-zA-Z0-9]{5}[ ]{9} |
[a-zA-Z][a-zA-Z0-9]{6}[ ]{8} |
[a-zA-Z][a-zA-Z0-9]{7}[ ]{7} |
[a-zA-Z][a-zA-Z0-9]{8}[ ]{6} |
[a-zA-Z][a-zA-Z0-9]{9}[ ]{5} |
[a-zA-Z][a-zA-Z0-9]{10}[ ]{4} |
[a-zA-Z][a-zA-Z0-9]{11}[ ]{3} |
[a-zA-Z][a-zA-Z0-9]{12}[ ]{2} |
[a-zA-Z][a-zA-Z0-9]{13}[ ] |
[a-zA-Z][a-zA-Z0-9]{14}
(That might have to be put on one very long line.)
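If the alternation is wanted for a different field width, it can be generated rather than typed out. The sketch below builds the same fifteen alternatives as a single line and checks them with java.util.regex, which is an assumption of convenience: only constructs common to both syntaxes are used, but this is not a JFlex specification. The class and method names are hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PaddedIdentifierRegex {
    /** Builds one alternative per identifier length, each padded with spaces to the given width. */
    static String build(int width) {
        StringJoiner alternatives = new StringJoiner("|");
        for (int letters = 1; letters <= width; letters++) {
            int rest = letters - 1;      // characters after the leading letter
            int pad = width - letters;   // spaces needed to fill the field
            StringBuilder alt = new StringBuilder("[a-zA-Z]");
            if (rest > 0) alt.append("[a-zA-Z0-9]{").append(rest).append('}');
            if (pad > 0) alt.append("[ ]{").append(pad).append('}');
            alternatives.add(alt);
        }
        return alternatives.toString();
    }

    /** Length of the match at the start of the input, or -1 if there is none. */
    static int matchedLength(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String regex = build(15);
        System.out.println(regex);
        System.out.println(matchedLength(regex, String.format("%-15s", "IDENTIFIER") + "E S 10"));
    }
}
```

Every alternative consumes exactly the field width, so a match can never be longer or shorter than 15 characters.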