
C Lexical analyzer in python

I'm creating a C lexical analyzer in Python as part of developing a parser. In my code I have written some methods for identifying keywords, numbers, operators, etc. No error is shown after compiling. When I run it, I can input a .c file, and the output should list all the keywords, identifiers, etc. in the input file. But it is not showing anything. Can anyone help me with this? The code is attached.

import sys
import string
delim=['\t','\n',',',';','(',')','{','}','[',']','#','<','>']
oper=['+','-','*','/','%','=','!']
key=["int","float","char","double","bool","void","extern","unsigned","goto","static","class","struct","for","if","else","return","register","long","while","do"]
predirect=["include","define"]
header=["stdio.h","conio.h","malloc.h","process.h","string.h","ctype.h"]
word_list1=""
i=0
j=0
f=0
numflag=0
token=[0]*50


def isdelim(c):
    for k in range(0,14):
        if c==delim[k]:
            return 1
        return 0

def isop(c):
    for k in range(0,7):
        if c==oper[k]:
            ch=word_list1[i+1]
            i+=1
            for j in range(0,6):
                if ch==oper[j]:
                    fop=1
                    sop=ch
                    return 1
                #ungetc(ch,fp);
                return 1
                j+=1
        return 0;
        k+=1

def check(t):
    print t
    if numflag==1:
        print "\n number "+str(t)
        return
    for k in range(0,2):#(i=0;i<2;i++)
        if strcmp(t,predirect[k])==0:
            print "\n preprocessor directive "+str(t)
            return
    for k in range(0,6): #=0;i<6;i++)
        if strcmp(t,header[k])==0:
            print "\n header file "+str(t)
            return
    for k in range(0,21): #=0;i<21;i++)
        if strcmp(key[k],t)==0:
            print "\n keyword "+str(key[k])
            return
        print "\n identifier \t%s"+str(t)

def skipcomment():
    ch=word_list[i+1]
    i+=1
    if ch=='/':
        while word_list1[i]!='\0':
            i+=1#ch=getc(fp))!='\0':
    elif ch=='*':
        while f==0:
            ch=word_list1[i]
            i+=1
        if c=='/':
            f=1
    f=0




a=raw_input("Enter the file name:")
s=open(a,"r")
str1=s.read()
word_list1=str1.split()




i=0
#print word_list1[i]
for word in word_list1 :
    print word_list1[i]
    if word_list1[i]=="/":
        print word_list1[i]
    elif word_list1[i]==" ":
        print word_list1[i]
    elif word_list1[i].isalpha():
        if numflag!=1:
            token[j]=word_list1[i]
            j+=1
        if numflag==1:
            token[j]='\0'
            check(token)
            numflag=0
            j=0
            f=0
        if f==0:
            f=1
    elif word_list1[i].isalnum():
        if numflag==0:
            numflag=1
            token[j]=word_list1[i]
            j+=1
        else:
            if isdelim(word_list1[i]):
                if numflag==1:
                    token[j]='\0'
                    check(token)
                    numflag=0
                if f==1:
                    token[j]='\0'
                    numflag=0
                    check(token)
                j=0
                f=0
                print "\n delimiters : "+word_list1[i]
    elif isop(word_list1[i]):
        if numflag==1:
            token[j]='\0'
            check(token)
            numflag=0
            j=0
            f=0
        if f==1:
            token[j]='\0'
            j=0 
            f=0
            numflag=0
            check(token)    
        if fop==1:
            fop=0
            print "\n operator \t"+str(word_list1[i])+str(sop)
        else:
            print "\n operator \t"+str(c)
    elif word_list1[i]=='.':
        token[j]=word_list1[i]
        j+=1
    i+=1

Your code is bad. Try splitting it up into smaller functions that you can test individually. Have you tried debugging the program? Once you find the place that causes the problem, you can come back here and ask a more specific question.
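As an illustration of that advice, here is a minimal sketch of one small, independently testable piece: a scanner that turns C source text into a list of raw lexemes. It is written for Python 3 (the question uses Python 2 syntax) and swaps in a regular-expression approach instead of the question's character-by-character loop. The name `scan_tokens` and the token categories are hypothetical choices, and the sketch ignores string and character literals.

```python
import re

# Order matters: comments are tried first, and two-character operators come
# before single characters so that "==" is not split into "=" "=".
TOKEN_PATTERN = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)         # line and block comments
  | (?P<number>\d+(?:\.\d+)?)               # integers and simple decimals
  | (?P<word>[A-Za-z_]\w*)                  # keywords and identifiers
  | (?P<operator>==|!=|<=|>=|\+\+|--|&&|\|\||[-+*/%=!<>])
  | (?P<delimiter>[,;(){}\[\]\#])           # punctuation
  | (?P<space>\s+)                          # whitespace, discarded
  | (?P<other>.)                            # anything unrecognised
""", re.VERBOSE | re.DOTALL)


def scan_tokens(source):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup  # name of the alternative that matched
        if kind in ("space", "comment"):
            continue
        tokens.append((kind, match.group()))
    return tokens


print(scan_tokens("int x = 10; // set x"))
```

Because the function takes a string and returns a list, it can be checked with a one-line assertion, without needing a .c file on disk.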

Some more hints. You can implement isdelim much more simply, like this:

def isdelim(c):
    return c in delim

To compare strings for equality, use string1 == string2; strcmp does not exist in Python. I do not know whether you are aware that Python is usually interpreted, not compiled. This means that you will get no compiler error if you call a function that does not exist. The program will only complain at run time, when it reaches the call.
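A small sketch of both points, in Python 3: strings are compared with `==` (and `in` applies that comparison to every element of a list), while a call to a function that was never defined, here `strcmp` as in the question's `check`, only raises `NameError` when that line actually runs. The helper names `is_keyword`, `broken_is_keyword` and `call_broken` are hypothetical.

```python
KEYWORDS = ["int", "float", "char", "return"]


def is_keyword(word):
    # Python compares strings with ==; "in" checks each element that way.
    return word in KEYWORDS


def broken_is_keyword(word):
    # strcmp is a C library function and does not exist in Python.
    # Defining this function succeeds; the error appears only when it is called.
    return strcmp(word, "int") == 0


def call_broken():
    """Call the broken function and return the error message it produces."""
    try:
        broken_is_keyword("int")
    except NameError as exc:
        return str(exc)
    return "no error"


print("int" == "int")      # True
print(is_keyword("main"))  # False
print(call_broken())       # reports that 'strcmp' is not defined
```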

In your function isop you have unreachable code. The lines j += 1 and k += 1 can never be reached, because they come right after a return statement.
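For comparison, here is a hypothetical rewrite of `isop` as a sketch (Python 3). Instead of modifying the global index `i` (which, without a `global` statement, would actually raise `UnboundLocalError` inside the original function once that branch runs), it takes the text and a position and returns the operator it found, preferring two-character operators. The name `match_operator` and the two-character operator list are assumptions; the single-character list is the question's `oper`.

```python
SINGLE_OPS = ["+", "-", "*", "/", "%", "=", "!"]               # the question's oper list
DOUBLE_OPS = ["++", "--", "==", "!=", "+=", "-=", "*=", "/="]  # assumed set


def match_operator(text, pos):
    """Return the operator starting at text[pos], or None if there is none.

    Two-character operators are tried first so that "==" is not read as "=".
    """
    two = text[pos:pos + 2]
    if two in DOUBLE_OPS:
        return two
    one = text[pos:pos + 1]
    if one in SINGLE_OPS:
        return one
    return None


source = "x==y+1"
pos = 0
found = []
while pos < len(source):
    op = match_operator(source, pos)
    if op:
        found.append(op)
        pos += len(op)  # advance past the whole operator
    else:
        pos += 1
print(found)  # ['==', '+']
```

Returning the operator (or `None`) instead of 1/0 plus global flags like `fop` and `sop` keeps all the state in the caller.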

In Python, iterating over a collection is done like this:

for item in collection:
    # do stuff with item, for example:
    print(item)
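Applied to the question's main loop, which keeps its own counter `i` next to `for word in word_list1`, the two usual forms look like this in a Python 3 sketch (`words` is a hypothetical stand-in for `word_list1`). `enumerate` is the standard way to get the position when you genuinely need it, for example to look ahead.

```python
words = ["int", "main", "(", ")"]

# Iterate over the items directly; no manual counter is needed.
for word in words:
    print(word)

# If the index is also needed, let enumerate supply it.
pairs = []
for index, word in enumerate(words):
    pairs.append((index, word))
print(pairs)  # [(0, 'int'), (1, 'main'), (2, '('), (3, ')')]
```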

These are just some hints. You should really read the Python Tutorial.

def isdelim(c):
    if c in delim:
        return 1
    return 0

You should learn more about Python basics. At the moment, your code contains too many ifs and fors.
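One way to cut down the `if`/`for` chains, offered as a Python 3 sketch rather than the only approach: keep the question's keyword, directive and header lists in sets and let a single function classify a word with membership tests, replacing `check` and its index loops. The function name `classify_word` is hypothetical; the word lists are copied from the question.

```python
KEYWORDS = {"int", "float", "char", "double", "bool", "void", "extern",
            "unsigned", "goto", "static", "class", "struct", "for", "if",
            "else", "return", "register", "long", "while", "do"}
DIRECTIVES = {"include", "define"}
HEADERS = {"stdio.h", "conio.h", "malloc.h", "process.h", "string.h", "ctype.h"}


def classify_word(word):
    """Return a category name for one word, mirroring the question's check()."""
    if word.isdigit():
        return "number"
    if word in DIRECTIVES:
        return "preprocessor directive"
    if word in HEADERS:
        return "header file"
    if word in KEYWORDS:
        return "keyword"
    return "identifier"


for w in ["int", "count", "42", "include", "stdio.h"]:
    print(w, "->", classify_word(w))
```

Each category is one line, so adding a new one does not require another nested loop with a hard-coded bound.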

Try learning it the hard way.

It seems to print out quite a bit of output for me, but the code is pretty hard to follow. I ran it against itself and it errored out like so:

Traceback (most recent call last):
  File "C:\dev\snippets\lexical.py", line 92, in <module>
    token[j]=word_list1[i]
IndexError: list assignment index out of range
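The error most likely comes from the fixed-size list: `token=[0]*50` has exactly 50 slots (indices 0 to 49), and in the alphabetic branch `j` keeps growing because it is only reset on some paths, so sooner or later the code assigns to `token[50]`. A minimal Python 3 reproduction of that pattern, with a made-up input of 51 words:

```python
token = [0] * 50  # exactly 50 slots: valid indices are 0..49
j = 0
try:
    for word in ["int"] * 51:  # 51 words, and j is never reset
        token[j] = word
        j += 1
except IndexError as exc:
    # j is 50 here: the 51st assignment is the one that fails
    print("failed at j =", j, "-", exc)
```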

Honestly, this is pretty bad code. You should give the functions better names, and don't use magic numbers like this:

for k in range(0,14)

I mean, you have already made a list; you can use it for the range.

for k in range(len(delim))

Makes slightly more sense.
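Hard-coded bounds are also easy to get wrong. Counting the question's own lists, `delim` has 13 entries and `key` has 20, so `range(0,14)` and `range(0,21)` would each run one index past the end if those loops ever got that far. A short Python 3 check, with the lists copied from the question:

```python
delim = ['\t', '\n', ',', ';', '(', ')', '{', '}', '[', ']', '#', '<', '>']
key = ["int", "float", "char", "double", "bool", "void", "extern", "unsigned",
       "goto", "static", "class", "struct", "for", "if", "else", "return",
       "register", "long", "while", "do"]

print(len(delim), len(key))  # 13 20

# range(0, 14) would ask for delim[13], which does not exist.
try:
    delim[13]
except IndexError:
    print("index 13 is out of range for delim")

# range(len(...)) always matches the list's real size.
indices = list(range(len(delim)))
print(indices[-1])  # 12, the last valid index

# Iterating over the list directly avoids indices altogether.
count = 0
for d in delim:
    if d in "(){}[]":
        count += 1
print(count)  # 6 bracket characters
```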

But you're just trying to determine whether c is in the list delim, so just say:

if c in delim

Why are you returning 1 and 0? What do they mean? Why not use True and False?

There are probably several other blatantly obvious problems, like the whole "main" section of the code.

This is not very Pythonic:

token=[0]*50

Did you really just mean this?

token = []

Now it's just an empty list.

Instead of trying to use a counter like this:

token[j]=word_list1[i]

You want to append, like this:

token.append(word_list1[i])
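Putting the empty list and `append` together: a Python 3 sketch that collects the characters of a word with `append` and turns them back into a string with `str.join`, instead of writing a `'\0'` terminator into a fixed-size list as in C. The function name `read_word` is hypothetical.

```python
def read_word(text, pos):
    """Collect letters, digits and underscores starting at text[pos].

    Returns the word and the position just after it.
    """
    token = []  # grows as needed, no fixed size
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        token.append(text[pos])
        pos += 1
    return "".join(token), pos  # no '\0' terminator needed in Python


word, end = read_word("count=0;", 0)
print(word, end)  # count 5
```

Returning the new position lets the caller continue scanning from there without any global counter.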

I honestly think you've started with too hard a problem.

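Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.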

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM