Lexical analysis
I am learning about lexers in Python, using the PLY library for lexical analysis on some strings. I have implemented the following lexical analyzer for a subset of C++ syntax.

However, I am seeing strange behavior. When I define the COMMENT state functions after the other function definitions, the code works fine. If I define the COMMENT state functions before the other definitions, I get errors as soon as the // section starts in the input string.

What is the reason behind that?

import ply.lex as lex

tokens = (
    'DLANGLE',    # <<
    'DRANGLE',    # >>
    'EQUAL',      # =
    'STRING',     # "144"
    'WORD',       # 'Welcome' in "Welcome."
    'SEMICOLON',  # ;
)

t_ignore = ' \t\v\r'  # shortcut for whitespace

states = (
    ('cppcomment', 'exclusive'),  # entered at //, left at end of line
)

def t_cppcomment(t):  # definition here causes errors
    r'//'
    print('MyCOm:', t.value)
    t.lexer.begin('cppcomment')

def t_cppcomment_end(t):
    r'\n'
    t.lexer.begin('INITIAL')

def t_cppcomment_error(t):
    print("Error FOUND")
    t.lexer.skip(1)

def t_DLANGLE(t):
    r'<<'
    print('MyLAN:', t.value)
    return t

def t_DRANGLE(t):
    r'>>'
    return t

def t_SEMICOLON(t):
    r';'
    print('MySemi:', t.value)
    return t

def t_EQUAL(t):
    r'='
    return t

def t_STRING(t):
    r'"[^"]*"'
    t.value = t.value[1:-1]  # drop the surrounding quotes
    print('MyString:', t.value)
    return t

def t_WORD(t):
    r'[^ <>\n]+'
    print('MyWord:', t.value)
    return t

webpage = "cout<<\"Hello World\"; // this comment"
htmllexer = lex.lex()
htmllexer.input(webpage)
while True:
    tok = htmllexer.token()
    if not tok:
        break
    print(tok)

Regards

Just figured it out. Because I defined the comment state as exclusive, the lexer does not use the rules of the INITIAL (inclusive) state while it is in the comment state. That is what happens when the comment rules are defined at the top; when they are defined at the bottom, the INITIAL rules are still used for some reason. So you have to define all the rules you need again for the comment state. That is why PLY provides error() rules: they skip characters for which no specific rule is defined.
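
A possible explanation for the "for some reason" part: PLY tries function rules in the order they are defined, and the WORD pattern from the question also matches //. If t_WORD comes first, the comment state is probably never entered at all. The sketch below models only that first-match-wins behavior with the standard re module; first_match and the rule tuples are hypothetical names for illustration, not part of PLY.

```python
import re

# Patterns copied from the question; the helper below is hypothetical.
WORD = ('WORD', r'[^ <>\n]+')
COMMENT_START = ('cppcomment', r'//')

def first_match(rules, text, pos):
    """Return the name of the first rule whose pattern matches at pos."""
    for name, pattern in rules:
        if re.compile(pattern).match(text, pos):
            return name
    return None

text = 'cout<<"Hello World"; // this comment'
pos = text.index('//')

# Comment rule defined last: WORD claims '//' and the state never starts.
print(first_match([WORD, COMMENT_START], text, pos))   # WORD
# Comment rule defined first: '//' is seen and the state would be entered.
print(first_match([COMMENT_START, WORD], text, pos))   # cppcomment
```

Under this reading, the version that "works fine" simply lexes the comment text as ordinary WORD tokens.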

It's because you have no rules that accept 'this' or 'comment' in the comment state. Since you really don't care about what's in the comment, you can easily do something like
t_cppcomment_ANYTHING = '[^\r\n]'
just below your t_ignore rule.
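
To see why a catch-all rule silences the errors, here is a simplified model of an exclusive state written with the standard re module rather than PLY. It assumes that only the rules registered for the active state are tried and that any unmatched character goes to the error handler; scan_comment and its rule tables are hypothetical names. In PLY itself, the token name of a string rule (ANYTHING here) must also be listed in tokens, so a function rule that returns nothing may be the simpler way to discard comment text.

```python
import re

def scan_comment(text, rules):
    """Scan text inside a comment state; return (errors, left_state).

    rules maps a rule name to a regex. Only these rules are tried,
    mirroring an exclusive state that ignores the INITIAL rules.
    """
    pos, errors = 0, 0
    while pos < len(text):
        for name, pattern in rules.items():
            m = re.compile(pattern).match(text, pos)
            if m:
                if name == 'end':      # newline closes the comment
                    return errors, True
                pos = m.end()          # matched text is discarded
                break
        else:
            errors += 1                # plays the role of t_cppcomment_error
            pos += 1                   # plays the role of t.lexer.skip(1)
    return errors, False

# Comment text from the question, with a newline appended for illustration.
comment = ' this comment\n'
only_end = {'end': r'\n'}
with_catch_all = {'end': r'\n', 'ANYTHING': r'[^\r\n]'}

print(scan_comment(comment, only_end))        # (13, True)
print(scan_comment(comment, with_catch_all))  # (0, True)
```

Without the catch-all, each of the 13 characters before the newline falls through to the error handler, which matches the repeated "Error FOUND" output described in the question.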