
Parsing a Python file with re

I have a Python file:

test.py 

import os
class test():

    def __init__(self):
        pass

    def add(num1, num2):
        return num1+num2

I am reading this file into a string:

with open('test.py', 'r') as myfile:
    data=myfile.read()

print data

Now data contains the whole file as a single string, newlines included. I need to find the lines that start with class or def.

For example, I need the output to be printed as:

class test():
def __init__(self):
def add(num1, num2):

How can I do this using regular expressions?

If you only need to find the def and class lines, it is much easier to avoid regex altogether.

You read the whole content of the file here:

with open('test.py', 'r') as myfile:
    data=myfile.read()

print data

Why not find the answer right there, while reading the file line by line?

with open('test.py', 'r') as myfile:
    for line in myfile:
        stripped = line.strip()  # remove leading/trailing whitespace, including the newline
        # the trailing space keeps words such as 'define' or 'classes' from matching
        if stripped.startswith(('def ', 'class ')):
            print(stripped)

To work with the whole string, as you requested:

import re
with open('test.py', 'r') as myfile:
    data = myfile.read()

print(data)

print(re.findall("class.+\n|def.+\n",data))

As you can see from the comments, this will also match 'defined as bla bla'. So it is better to use

print(re.findall("class .+\n|def .+\n",data))
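As a minimal sketch of the difference between the two patterns, the snippet below runs both on a made-up sample (the `sample` string and the names `loose` and `strict` are illustrative assumptions, not part of the question). Note that both patterns are still unanchored and both require a trailing newline, so a final line without `\n` would be missed.

```python
import re

# Hypothetical sample text, not taken from the original question.
sample = (
    "class Foo():\n"
    "    def bar(self):\n"
    "        # defined as bla bla\n"
    "        pass\n"
)

# Without the space, 'def' also matches the start of 'defined'.
loose = re.findall("class.+\n|def.+\n", sample)

# Requiring a space after the keyword rules out 'defined'.
strict = re.findall("class .+\n|def .+\n", sample)

print(loose)
print(strict)
```

The stricter pattern can still match `class ` or `def ` in the middle of a line (for example inside a comment), which is what anchoring at the start of the line addresses.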

If you want to follow a regex approach, use

re.findall(r'(?m)^[ \t]*((?:class|def)[ \t].*)', data)

or

re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)

See the regex demo.

The point is that you should use ^ as the start-of-line anchor (hence either the (?m) inline flag at the start of the pattern or the re.M flag is necessary), then match horizontal whitespace (with [ \t]*), then either class or def (with (?:class|def)), then a space or tab, and finally zero or more characters other than a newline (.*).
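The same pattern can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch of the explanation above; the names `pattern`, `source`, and `matches` are illustrative, and the sample text is the file from the question.

```python
import re

pattern = re.compile(
    r"""
    ^                 # start of a line (needs re.MULTILINE)
    [ \t]*            # optional indentation: spaces or tabs
    (                 # group 1: the text that findall returns
      (?:class|def)   # the keyword, in a non-capturing group
      [ \t]           # one space or tab after the keyword
      .*              # the rest of the line, excluding the newline
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

source = (
    "import os\n"
    "class test():\n"
    "\n"
    "    def __init__(self):\n"
    "        pass\n"
    "\n"
    "    def add(num1, num2):\n"
    "        return num1+num2\n"
)

matches = pattern.findall(source)
print("\n".join(matches))
```

In verbose mode, whitespace inside a character class such as `[ \t]` is still significant, so the pattern behaves like the one-line version.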

If you also plan to handle Unicode whitespace, replace [ \t] with [^\S\r\n\f\v] (and, in Python 2, use the re.UNICODE flag; in Python 3, str patterns are Unicode-aware by default).
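A small sketch of the difference, assuming Python 3 and a contrived input line indented with a no-break space (U+00A0). The names `ascii_ws`, `unicode_ws`, and `text` are illustrative assumptions.

```python
import re

# Horizontal whitespace limited to space and tab.
ascii_ws = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.M)

# Any whitespace except line breaks, form feed, and vertical tab.
unicode_ws = re.compile(
    r'^[^\S\r\n\f\v]*((?:class|def)[^\S\r\n\f\v].*)', re.M
)

# Hypothetical input: the indentation is a no-break space, not a plain space.
text = "\u00a0def f():\n"

ascii_result = ascii_ws.findall(text)
unicode_result = unicode_ws.findall(text)

print(ascii_result)
print(unicode_result)
```

The double-negative class `[^\S...]` reads as "whitespace, minus the characters listed", which is why it keeps matching on a single line.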

Python demo:

import re
p = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)
s = "test.py \n\nimport os\nclass test():\n\n    def __init__(self):\n        pass\n\n    def add(num1, num2):\n        return num1+num2"
print(p.findall(s))
# => ['class test():', 'def __init__(self):', 'def add(num1, num2):']
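
Another option is to split the data into lines and test each one:

import re
with open('test.py', 'r') as myfile:
    data = myfile.read().split('\n')
    for line in data:
        # optional indentation, then the keyword followed by whitespace;
        # anchoring with ^ also covers top-level def lines
        if re.search(r"^\s*(?:class|def)\s", line):
            print(line)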
