Python multi-line pattern matching

I am trying to match a multi-line pattern by running a shell command from Python.

The match works when I run the command in the shell, but I cannot pass the same command through Python's subprocess.call or os.system.

My file looks something like this:

(CELL
  (CELLTYPE "NAND_2X1")
  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )
)

Now, I am trying to extract this:

  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )

using this regex:

pcregrep -M -n 'INSTANCE U2((?!^\)).*\n)+' sdf/c1_syn_buf2.sdf

where U2 is the search string and sdf/c1_syn_buf2.sdf is the file name.

In Python, I have defined a function that takes the search string and the file name, because I need to run this operation many times.

I cannot get this to run as a shell command with something like:

>>>b = subprocess.call(['pcregrep','-M','-n','INSTANCE '+arg, '\)((?!^\).*\n)+ '+file ])
pcregrep: Failed to open \)((?!^\).*
)+ /home/sanjay/thesis/code/sdf/c7552_syn_buf0.sdf: No such file or directory

When I hard-code the argument (U2 in this case) and the file name, I get the desired output.

EDIT: If pcregrep is not friendly enough, here is the awk command:

awk '/INSTANCE U2/,/^)\n?/' sdf/c1_syn_buf2.sdf

It returns the same result.

Can someone please help me with this?

Looking at your original command line and formatting the call with one argument per line, shouldn't it be this?

b = subprocess.call(
    ['pcregrep',
     '-M',
     '-n',
     # raw string: the backslashes reach pcregrep unchanged
     r'INSTANCE {}((?!^\)).*\n)+'.format(arg),
     file])

I am not so sure about the parentheses and the backslashes. Those are always a bit tricky in regexes. You might have to fiddle with them a bit to get exactly what you want (look up raw string literals, r'', in the Python documentation).
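To make the backslash point concrete, here is a small sketch (not part of the original answer) that compares a plain and a raw literal for the same pattern. The value U2 is simply the instance name from the question.

```python
arg = 'U2'

# Plain literal: Python turns \n into a real newline before pcregrep sees it.
# The backslash before ')' has to be doubled to survive as a literal backslash.
plain = 'INSTANCE {}((?!^\\)).*\n)+'.format(arg)

# Raw literal: every backslash is passed through unchanged.
raw = r'INSTANCE {}((?!^\)).*\n)+'.format(arg)

print(repr(plain))
print(repr(raw))
```

The two strings differ only in whether the regex engine receives a literal newline character or the two-character escape \n. Both can work, but the raw form matches what was typed at the shell prompt.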

It looks like I need to use the %s format specifier.

It works when I use:

b = subprocess.check_output("pcregrep -M -n 'INSTANCE '%s'((?!^\)).*\n)+' {} ".format(file) %arg,shell=True)

With this, I get the exact match in the variable b.

I pass the argument with %s and the file name with {} and the .format method.
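If the command has to go through the shell, one way to avoid hand-placing the quotes is to let shlex.quote do it. This is only a sketch: build_shell_command is a hypothetical helper name, and it assumes Python 3.3 or later, where shlex.quote is available.

```python
import shlex

def build_shell_command(arg, path):
    """Return one shell command string for pcregrep (hypothetical helper)."""
    pattern = r'INSTANCE {}((?!^\)).*\n)+'.format(arg)
    # shlex.quote wraps each piece so the shell passes it on as a single word
    return 'pcregrep -M -n {} {}'.format(shlex.quote(pattern), shlex.quote(path))

cmd = build_shell_command('U2', 'sdf/c1_syn_buf2.sdf')
print(cmd)
# b = subprocess.check_output(cmd, shell=True)  # needs pcregrep on PATH
```

A single formatting style is used for both values here, so the pattern and the file name are each quoted once and cannot run into each other.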

To run the shell command:

$ pcregrep -M -n 'INSTANCE U2((?!^\)).*\n)+' sdf/c1_syn_buf2.sdf

in Python:

from subprocess import check_output as qx

output = qx(['pcregrep', '-M', '-n', r'INSTANCE {}((?!^\)).*\n)+'.format(arg),
             path_to_sdf])
  • use an r'' literal or double all backslashes
  • pass each shell argument as a separate list item
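Since the question mentions wrapping this in a function that is called many times, the two points above could be packaged roughly as follows. The names pcregrep_argv and extract_instance are hypothetical, and the second function assumes pcregrep is installed.

```python
import subprocess

def pcregrep_argv(arg, path_to_sdf):
    """Build the argument list: one list item per shell word."""
    return ['pcregrep', '-M', '-n',
            r'INSTANCE {}((?!^\)).*\n)+'.format(arg),
            path_to_sdf]

def extract_instance(arg, path_to_sdf):
    """Run pcregrep and return its output as text."""
    return subprocess.check_output(pcregrep_argv(arg, path_to_sdf)).decode()

print(pcregrep_argv('U2', 'sdf/c1_syn_buf2.sdf'))
```

Note that check_output raises subprocess.CalledProcessError when the command exits with a non-zero status, which pcregrep does when nothing matches, so callers may want to catch that case.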

Also, you don't need pcregrep; you could search the file in Python:

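import re
from mmap import ACCESS_READ, mmap

with open(path_to_sdf, 'rb') as f, mmap(f.fileno(), 0, access=ACCESS_READ) as s:
    # arg = re.escape(arg) # call it if you want to match arg verbatim
    # (?:...) makes findall() return whole matches; re.DOTALL is left out so
    # that .* stops at each newline and the lookahead is checked on every line
    pattern = r'INSTANCE {}(?:(?!^\)).*\n)+'.format(arg).encode()
    output = re.findall(pattern, s, flags=re.MULTILINE)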

mmap is used to accommodate files that do not fit in memory. It also might run faster on Windows.
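For a quick check without a file on disk, the same idea can be tried on the sample from the question held in a string. This is a sketch under a few assumptions: find_instance is a hypothetical name, only re.MULTILINE is set so that .* stays within one line, and group(0) is used to get the whole match.

```python
import re

SAMPLE = """\
(CELL
  (CELLTYPE "NAND_2X1")
  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )
)
"""

def find_instance(text, arg):
    """Return the text from 'INSTANCE <arg>' up to the line starting with ')'."""
    pattern = r'INSTANCE {}(?:(?!^\)).*\n)+'.format(re.escape(arg))
    m = re.search(pattern, text, flags=re.MULTILINE)
    return m.group(0) if m else None

print(find_instance(SAMPLE, 'U2'))
```

The match stops just before the unindented closing parenthesis, because the negative lookahead rejects any line that begins with ")". As written, the pattern would also match an instance named U20; appending \) after the name is one way to anchor it.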
