Using a Unicode range in a regex as a rule in Lex

Question

import re
import ply.lex as lex

#rest of the code

def t_WORD(t): #WORD is a token defined in the tokens tuple
    r'[\u0C80-\u0CFF]+'
    #rest of the actions

This snippet produces an "illegal character" error, even though all of the characters are within the Unicode range specified in the regex rule.
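As a quick sanity check outside PLY, the character class from the rule does match Kannada text when both the pattern and the input are Unicode strings. This is a minimal sketch that assumes Python 3.3 or later, where `re` itself understands `\uXXXX` escapes inside a raw pattern string. The helper name `kannada_words` and the sample sentence are illustrative only.

```python
import re
from typing import List

# The character class from the t_WORD rule: the Kannada block U+0C80..U+0CFF.
# In Python 3.3+, `re` expands \uXXXX escapes itself, even in a raw string.
KANNADA_WORD = re.compile(r'[\u0C80-\u0CFF]+')


def kannada_words(text: str) -> List[str]:
    """Return every maximal run of Kannada-block characters in `text`."""
    return KANNADA_WORD.findall(text)


# Hypothetical sample: two Kannada words separated by ASCII text.
words = kannada_words("ಕನ್ನಡ is spoken in ಕರ್ನಾಟಕ")
```

Here `words` holds the two Kannada runs and the ASCII words are skipped, so the pattern itself is not what rejects the input.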

What could the problem be? Thanks in advance.

Answer 1

The lexer should work properly with Unicode in both the input tokens and the pattern-matching rules. If you need to supply optional flags to the re.compile() function, use the reflags option to lex:

lex.lex(reflags=re.UNICODE)
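To illustrate where an "illegal character" error comes from, the sketch below imitates, in much simplified form, what a regex-driven lexer does: try the token patterns at the current position and stop when none of them matches. It is not PLY's actual code; the combined pattern `MASTER`, the `WS` skip rule and the `tokenize` helper are hypothetical. `re.UNICODE` is passed only to mirror the `reflags` call above; for `str` patterns in Python 3 it is already the default. The last part shows one way to provoke the error, offered as an assumption about a possible cause rather than a diagnosis from the answer: UTF-8 bytes misread one byte per character no longer fall inside U+0C80..U+0CFF.

```python
import re
from typing import List, Tuple

# Hypothetical combined pattern, loosely modeled on joining token rules into
# one regex of named groups: the question's WORD rule plus a whitespace skip.
MASTER = re.compile(r'(?P<WORD>[\u0C80-\u0CFF]+)|(?P<WS>\s+)', re.UNICODE)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split `text` into WORD tokens, raising on any unmatched character."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No rule matches here: this is the point where a lexer
            # reports an illegal character.
            raise ValueError(f"Illegal character {text[pos]!r} at index {pos}")
        if m.lastgroup == 'WORD':
            tokens.append(('WORD', m.group()))
        pos = m.end()  # whitespace (WS) is consumed but not emitted
    return tokens


# Properly decoded text: both words become WORD tokens.
good = tokenize("ಕನ್ನಡ ಭಾಷೆ")

# The same kind of text as UTF-8 bytes misread as Latin-1: the first
# character becomes U+00E0, outside the Kannada block, so tokenize raises.
mojibake = "ಕನ್ನಡ".encode("utf-8").decode("latin-1")
try:
    tokenize(mojibake)
except ValueError as err:
    error_message = str(err)
```

With correctly decoded input the rule matches every word, while the misdecoded string fails at index 0, which matches the symptom described in the question.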

Alternatively, see How to validate kannada words and Python Lex-Yacc.

Answer 1 (accepted), 2013-11-16 08:01:33
