
Python multi-line pattern matching

I am trying to match a multi-line pattern by running a shell command from Python.

The match works when I run the command directly in the shell, but I cannot get the same command to run through Python's subprocess.call or os.system.

My file looks something like this:

(CELL
  (CELLTYPE "NAND_2X1")
  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )
)

Now, I am trying to extract this:

  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )

using this regex:

pcregrep -M -n 'INSTANCE U2((?!^\)).*\n)+' sdf/c1_syn_buf2.sdf

where U2 is the search string and sdf/c1_syn_buf2.sdf is the file name.

In Python, I have defined a function that takes the search string and the file name, because I have to do this operation many times.

I am unable to successfully execute this as a shell command using something like:

>>>b = subprocess.call(['pcregrep','-M','-n','INSTANCE '+arg, '\)((?!^\).*\n)+ '+file ])
pcregrep: Failed to open \)((?!^\).*
)+ /home/sanjay/thesis/code/sdf/c7552_syn_buf0.sdf: No such file or directory

When I hard-code the argument (U2 in this case) and the file name, I get the desired output.

EDIT: If pcregrep is not friendly enough, here is the equivalent awk command:

awk '/INSTANCE U2/,/^)\n?/' sdf/c1_syn_buf2.sdf

It returns the same output.

Can someone please help me with this?

Looking at your original command line and formatting the call with one argument per line, shouldn't it be this?

import subprocess

b = subprocess.call(
    ['pcregrep',
     '-M',
     '-n',
     r'INSTANCE {}((?!^\)).*\n)+'.format(arg),  # the whole pattern is one argument
     file])                                     # the file name is its own argument

I am not so sure about the parentheses and the backslashes; those are always a bit tricky in regexes. You might have to fiddle with them to get exactly what you want (see the r'' raw string literal in the Python documentation).
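As a small illustration of the raw-string point, the sketch below compares a plain literal with an r'' literal. The variable names are made up for the example, and it relies only on standard Python string and re behavior.

```python
import re

plain = 'x\ny'    # Python turns \n into a single newline character
raw = r'x\ny'     # backslash and 'n' stay as two separate characters
lengths = (len(plain), len(raw))

# The regex engine interprets the two-character sequence \n itself,
# so both spellings match a real newline in the subject text.
both_match = all(re.search(p, 'x\ny') is not None for p in (plain, raw))

# Doubling the backslash in a plain literal gives the same string
# as the raw literal.
doubled = '\\)'
raw_paren = r'\)'
same = (doubled == raw_paren)

print(lengths, both_match, same)
```

Using r'' for every regex keeps the pattern in the source identical to what the regex engine receives.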

It looks like I need to use the %s format specifier.

It works when I use:

b = subprocess.check_output("pcregrep -M -n 'INSTANCE '%s'((?!^\)).*\n)+' {} ".format(file) %arg,shell=True)

With this, I get the exact match in the variable b.

I pass the search string using %s and the file name using the {} placeholder of str.format.
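If the command has to go through the shell (shell=True), one way to avoid balancing the quotes by hand is to let shlex.quote from the standard library quote each piece. This is only a sketch: build_command is a hypothetical helper, not something from the thread, and it only builds the command string without running pcregrep.

```python
import shlex

def build_command(arg, path):
    """Return a shell command string for the pcregrep call (hypothetical helper)."""
    pattern = r'INSTANCE {}((?!^\)).*\n)+'.format(arg)
    parts = ['pcregrep', '-M', '-n', pattern, path]
    # shlex.quote wraps any piece containing shell metacharacters in quotes
    return ' '.join(shlex.quote(p) for p in parts)

cmd = build_command('U2', 'sdf/c1_syn_buf2.sdf')
print(cmd)
```

The resulting string could then be passed to subprocess.check_output(cmd, shell=True).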

To run the shell command:

$ pcregrep -M -n 'INSTANCE U2((?!^\)).*\n)+' sdf/c1_syn_buf2.sdf

in Python:

from subprocess import check_output as qx

output = qx(['pcregrep', '-M', '-n', r'INSTANCE {}((?!^\)).*\n)+'.format(arg),
             path_to_sdf])
  • use an r'' literal or double all backslashes
  • pass each shell argument as a separate list item
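The second point explains the error in the question: the tail of the pattern and the file name were concatenated into one list item, so pcregrep treated that whole string as a file name. The sketch below makes this visible without needing pcregrep. It assumes a child Python process as a stand-in command that just reports the arguments it received; child_argv and ECHO_ARGV are hypothetical names for this example.

```python
import json
import subprocess
import sys

# Child program: print the arguments it received as a JSON list.
ECHO_ARGV = 'import json, sys; print(json.dumps(sys.argv[1:]))'

def child_argv(args):
    """Run a child Python process and return the argument list it saw."""
    out = subprocess.check_output([sys.executable, '-c', ECHO_ARGV] + list(args))
    return json.loads(out.decode())

pattern = r'INSTANCE U2((?!^\)).*\n)+'
path = 'sdf/c1_syn_buf2.sdf'

separate = child_argv([pattern, path])       # two list items, two arguments
joined = child_argv([pattern + ' ' + path])  # one list item, one argument

print(len(separate), len(joined))
```

With a list and no shell=True, no shell splits the string on spaces, so each list item reaches the program exactly as written.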

Also, you don't need pcregrep; you could search the file in Python:

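import re
from mmap import ACCESS_READ, mmap

# open in binary mode; the mmap is searched as bytes
with open(path_to_sdf, 'rb') as f, mmap(f.fileno(), 0, access=ACCESS_READ) as s:
    # arg = re.escape(arg) # call it if you want to match arg verbatim
    # (?:...) instead of (...) so that findall returns the whole match;
    # no re.DOTALL, so .* stops at the end of each line
    pattern = r'INSTANCE {}(?:(?!^\)).*\n)+'.format(arg).encode()
    output = re.findall(pattern, s, flags=re.MULTILINE)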

mmap is used to accommodate files that do not fit in memory. It also might run faster on Windows.
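To see the pure-Python approach on the sample cell from the question, here is a minimal sketch that works on an in-memory string instead of an mmap. SAMPLE and extract_instance are hypothetical names for this example. It assumes a non-capturing group so that the whole match is returned, and only re.MULTILINE, because with re.DOTALL the .* would run past the end of each line.

```python
import re

SAMPLE = '''(CELL
  (CELLTYPE "NAND_2X1")
  (INSTANCE U2)
  (DELAY
    (ABSOLUTE
    (IOPATH A1 ZN (0.02700::0.02700) (0.01012::0.01012))
    (IOPATH A2 ZN (0.02944::0.02944) (0.00930::0.00930))
    )
  )
)
'''

def extract_instance(text, name):
    """Return the text from 'INSTANCE <name>' up to (not including)
    the next line that starts with ')', or None if nothing matches."""
    # re.escape makes the instance name match verbatim
    pattern = r'INSTANCE {}(?:(?!^\)).*\n)+'.format(re.escape(name))
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(0) if match else None

block = extract_instance(SAMPLE, 'U2')
print(block)
```

For a file on disk, the same pattern can be encoded to bytes and applied to an mmap of the file.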
