JFlex maximum read length

Given a positional language like the old IBM RPG, we can have a line such as:

CCCCCDIDENTIFIER     E S             10

where the character positions are:

 1-5:  comment
   6:  specification type
7-21:  identifier name
...And so on

Now, given that JFlex is based on regular expressions, we would have a RegExp such as:

[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}

for the identifier name token.
This RegExp, however, can match tokens longer than the 15 characters allowed for the identifier name, which then requires calls to yypushback.
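To make the over-matching concrete, here is a small sketch that runs the same expression through java.util.regex as a stand-in for JFlex (an assumption: for this particular pattern both engines take the longest possible match, but their regex dialects differ in general). The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverMatchDemo {
    // Same expression as in the question; the space before {0,14} is literal.
    static final Pattern IDENT =
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]{0,14} {0,14}");

    // Length of the match anchored at the start of the input, or -1 if none.
    static int matchedLength(String input) {
        Matcher m = IDENT.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // "AB" padded to its 15-column field, followed by 5 blank columns.
        String shortName = String.format("%-20s", "AB");
        // A full 15-character name followed by 25 blank columns.
        String fullName = String.format("%-40s", "ABCDEFGHIJKLMNO");

        System.out.println(matchedLength(shortName)); // 2 letters + 14 spaces
        System.out.println(matchedLength(fullName));  // 15 letters + 14 spaces
    }
}
```

Whenever the columns after the identifier field are blank, the trailing `{0,14}` keeps consuming spaces that belong to the next field, so the match runs past column 21.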

Thus, is there a way to limit how many characters JFlex reads for a particular token?

Regular-expression-based lexical analysis is really not the right tool for parsing fixed-field input. You can simply split the input into fields at the known character positions, which is much easier and a lot faster, and it doesn't require fussing with regular expressions.
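As a rough sketch of that approach, the following splits a line at the column positions given in the question. Only the three fields the question describes are covered; the class name, method names, and map keys are invented for this example, and padding short lines with spaces is an assumption about how truncated input should be treated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedFieldSplitter {
    // Extracts columns from..to (1-based, inclusive, as in the layout above).
    static String field(String line, int from, int to) {
        // Pad short lines with spaces so substring never runs past the end.
        StringBuilder padded = new StringBuilder(line);
        while (padded.length() < to) {
            padded.append(' ');
        }
        return padded.substring(from - 1, to);
    }

    static Map<String, String> split(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("comment", field(line, 1, 5));
        fields.put("specType", field(line, 6, 6));
        // The token is the word itself, so trailing padding is trimmed.
        fields.put("identifier", field(line, 7, 21).trim());
        return fields;
    }

    public static void main(String[] args) {
        String line = "CCCCCDIDENTIFIER     E S             10";
        System.out.println(split(line));
    }
}
```

Because every field boundary is known in advance, no pushback or length limit is needed; each field can be validated separately after it has been cut out.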

Anyway, [a-zA-Z][a-zA-Z0-9]{0,14}[ ]{0,14} wouldn't be the right expression even if it did handle the token length properly, since the token is really just the word at the beginning, without the space characters.

In the case of fixed-length fields which contain something more complicated than a single identifier, you might want to feed the field into a lexer, using a StringReader or some such.
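A minimal sketch of that idea follows. To keep it runnable without generated code, java.io.StreamTokenizer stands in for the lexer; a JFlex-generated scanner is likewise constructed from a java.io.Reader, so the StringReader part carries over. The field content "COUNT + 10" and the names below are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FieldLexing {
    // Tokenizes one field that has already been cut out of the line.
    // StreamTokenizer is only a stand-in for a real generated scanner.
    static List<String> tokens(String field) throws IOException {
        Reader reader = new StringReader(field);
        StreamTokenizer st = new StreamTokenizer(reader);
        List<String> out = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                out.add(st.sval);
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                // Numbers arrive as doubles; this sketch keeps integers only.
                out.add(String.valueOf((int) st.nval));
            } else {
                // Any other single character, such as an operator.
                out.add(String.valueOf((char) st.ttype));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("COUNT + 10"));
    }
}
```

The lexer only ever sees the characters of one field, so it cannot read past the field boundary no matter what its rules match.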


Although I'm sure it's not useful, here's a regular expression that matches exactly 15 characters, starting with a word and padded out with spaces:

[a-zA-Z][ ]{14} |
[a-zA-Z][a-zA-Z0-9][ ]{13} |
[a-zA-Z][a-zA-Z0-9]{2}[ ]{12} |
[a-zA-Z][a-zA-Z0-9]{3}[ ]{11} |
[a-zA-Z][a-zA-Z0-9]{4}[ ]{10} |
[a-zA-Z][a-zA-Z0-9]{5}[ ]{9} |
[a-zA-Z][a-zA-Z0-9]{6}[ ]{8} |
[a-zA-Z][a-zA-Z0-9]{7}[ ]{7} |
[a-zA-Z][a-zA-Z0-9]{8}[ ]{6} |
[a-zA-Z][a-zA-Z0-9]{9}[ ]{5} |
[a-zA-Z][a-zA-Z0-9]{10}[ ]{4} |
[a-zA-Z][a-zA-Z0-9]{11}[ ]{3} |
[a-zA-Z][a-zA-Z0-9]{12}[ ]{2} |
[a-zA-Z][a-zA-Z0-9]{13}[ ] |
[a-zA-Z][a-zA-Z0-9]{14}

(That might have to be put on one very long line.)
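If the single long line is needed, it can be generated rather than typed. The sketch below builds the same 15 alternatives in a loop and checks them with java.util.regex, which is used here only as a convenient test bed (an assumption: this pattern uses syntax that is common to both engines, though JFlex's dialect differs in other respects). The class and method names are illustrative.

```java
import java.util.regex.Pattern;

class PaddedIdentifierPattern {
    static final int WIDTH = 15;

    // For each word length n in 1..15: one letter, n-1 alphanumerics,
    // then exactly 15-n spaces, all joined with '|' on a single line.
    static String build() {
        StringBuilder sb = new StringBuilder();
        for (int n = 1; n <= WIDTH; n++) {
            if (n > 1) {
                sb.append('|');
            }
            sb.append("[a-zA-Z][a-zA-Z0-9]{").append(n - 1)
              .append("}[ ]{").append(WIDTH - n).append('}');
        }
        return sb.toString();
    }

    // True only if the whole string is a word padded to exactly 15 columns.
    static boolean matchesField(String s) {
        return Pattern.matches(build(), s);
    }

    public static void main(String[] args) {
        System.out.println(build());
        System.out.println(matchesField(String.format("%-15s", "IDENTIFIER")));
        System.out.println(matchesField(String.format("%-16s", "IDENTIFIER")));
    }
}
```

The loop writes `{0}` and `{1}` where the hand-written version omits them, which is equivalent; every alternative has a total length of exactly 15, so the match can never spill into the next field.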
