
Using flex to identify variable name without repeating characters

I'm not fully sure how to word my question, so sorry for the rough title.

I am trying to create a pattern that can identify variable names with the following constraints:

  • Must begin with a letter
  • First letter may be followed by any combination of letters, numbers, and hyphens
  • First letter may be followed by nothing
  • The variable name must not consist entirely of X's ([xX]+ is a separate identifier in this grammar)

So for example, these would all be valid:

  • Avariable123
  • Bee-keeper
  • Y
  • E-3

But the following would not be valid:

  • XXXX
  • X
  • 3variable
  • 5

I am able to meet the first three requirements with my current identifier, but I am really struggling to change it so that it doesn't pick up variables that are entirely the letter X.

Here is what I have so far: [a-z][a-z0-9\-]* {return (NAME);}

Can anyone suggest a way of editing this to avoid variables that are made up of just the letter X?

The easiest way to handle that sort of requirement is to have one pattern which matches the exceptional string and another pattern, placed after it in the file, which matches all the strings:

[xX]+                    { /* matches all-x tokens */ }
[[:alpha:]][[:alnum:]-]* { /* handle identifiers */ }

This works because lex (and almost all lex derivatives) selects the first matching rule if two patterns match the same longest token.
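To make the rule-ordering behaviour concrete, here is a minimal sketch of a complete scanner built around the two patterns above. The printed labels (XRUN, NAME, OTHER), the whitespace rule, and the catch-all rule are assumptions added for illustration; they are not part of the original grammar.

```lex
%option noyywrap

%{
/* Illustrative only: the labels printed below are hypothetical. */
#include <stdio.h>
%}

%%
[xX]+                     { printf("XRUN  %s\n", yytext); /* all-x token; wins ties because it is listed first */ }
[[:alpha:]][[:alnum:]-]*  { printf("NAME  %s\n", yytext); /* any other identifier */ }
[[:space:]]+              { /* skip whitespace */ }
.                         { printf("OTHER %s\n", yytext); /* anything else, one character at a time */ }
%%

int main(void)
{
    yylex();    /* scan standard input until end of file */
    return 0;
}
```

Assuming the file is saved as names.l (a hypothetical name), it can be built with `flex names.l && cc lex.yy.c -o names`. An input such as XXXX matches both of the first two rules with the same length, so the earlier [xX]+ rule is chosen. An input such as Xx1 or E-3 is matched in full only by the identifier rule, and the longer match takes precedence over rule order.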

Of course, you need to know what you want to do with the exceptional symbol. If you just want to accept it as some token type, there's no problem; you just do that. If, on the other hand, the intention was to break it into subtokens, perhaps individual letters, then you'll have to use yyless(), and you might want to switch to a new lexing state in order to avoid repeatedly matching the same long sequence of Xs. But maybe that doesn't matter in your case.
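If splitting is what you need, one possible arrangement is sketched below. It assumes that each X should become its own token; the start condition name XRUN and the counter remaining are hypothetical names chosen for this example.

```lex
%option noyywrap
%x XRUN

%{
#include <stdio.h>
static int remaining;   /* hypothetical: number of x's still to hand out */
%}

%%
[xX]+                     {
                            remaining = yyleng;  /* remember the run length before giving it back */
                            yyless(0);           /* return the whole match to the input */
                            BEGIN(XRUN);         /* rescan it one letter at a time */
                          }
<XRUN>[xX]                {
                            printf("X\n");       /* one subtoken per letter */
                            if (--remaining == 0)
                                BEGIN(INITIAL);  /* run consumed; resume normal scanning */
                          }
[[:alpha:]][[:alnum:]-]*  { printf("NAME %s\n", yytext); }
[[:space:]]+              { /* skip whitespace */ }
.                         { printf("OTHER %s\n", yytext); }
%%

int main(void)
{
    yylex();
    return 0;
}
```

Because XRUN is declared as an exclusive start condition with %x, only the single-letter rule is active while the run is being split, so the long [xX]+ pattern is matched once per run rather than once per letter. The call to yyless(0) is safe here only because the action also changes the start condition; without the BEGIN the same rule would match the same text again indefinitely.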

See the flex manual for more details and examples.
