
How to search for a string in a line and extract the data between two characters in Python?

file contents:

module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first, 
    red_first, clk, rst, waiting_main, waiting_first
);

I need to search for the string 'module' and extract the contents between the brackets, i.e. between ( and );.

Here is the code I tried, but I am not able to get the result:

fp = open(file_name)
contents = fp.read()
unique_word_a = '('
unique_word_b = ');'
s = contents

for line in contents:
    if 'module' in line:
        your_string=s[s.find(unique_word_a)+len(unique_word_a):s.find(unique_word_b)].strip()
        print(your_string)

The problem with your code is here:

for line in contents:
    if 'module' in line:

Here, contents is a single string holding the entire content of the file, not a list of strings (lines) or a file handle that can be looped over line by line. Thus, your line is in fact not a line but a single character of that string, which can never contain the substring "module".
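A quick way to see this for yourself: iterating over a str yields one-character strings, so the membership test is evaluated against single characters. The sample text below is a shortened stand-in for the real file contents (an assumption; any str behaves the same way).

```python
# Shortened stand-in for the file contents read with fp.read().
contents = "module traffic(\n    clk, rst\n);\n"

# Iterating over a str yields one-character strings, not lines.
items = [line for line in contents]
print(items[:8])  # ['m', 'o', 'd', 'u', 'l', 'e', ' ', 't']

# No single character can contain the 6-character substring "module".
print(any('module' in line for line in contents))  # False

# Iterating over contents.splitlines() gives real lines instead.
print(any('module' in line for line in contents.splitlines()))  # True
```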

Since you never actually use line within the loop, you can simply remove both the loop and the condition, and your code will work just fine. (If you instead changed your code to actually loop over lines and search within each line, it would not work, because the ( and the ); are not on the same line.)
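Applying that fix, the question's code without the loop could look roughly like this. It keeps the question's find()-based slicing; a literal copy of the file text replaces the file read so the snippet runs on its own (in practice you would read it with `with open(file_name) as fp: s = fp.read()`, which also closes the file for you).

```python
unique_word_a = '('
unique_word_b = ');'

# Copy of the question's file; normally s = fp.read().
s = """module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first,
    red_first, clk, rst, waiting_main, waiting_first
);
"""

# No loop needed: find() already searches the whole string at once.
your_string = s[s.find(unique_word_a) + len(unique_word_a):s.find(unique_word_b)].strip()
print(your_string)
```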


Alternatively, you can use a regular expression:

>>> content = """module traffic(green_main, yellow_main, red_main, green_first, yellow_first, 
...                red_first, clk, rst, waiting_main, waiting_first);"""
>>> import re
>>> re.search(r"module \w+\((.*?)\);", content, re.DOTALL).group(1)
'green_main, yellow_main, red_main, green_first, yellow_first, \n               red_first, clk, rst, waiting_main, waiting_first'

Here, module \w+\((.*?)\); means:

  • the word module, followed by a space and one or more word characters (\w+), i.e. the module name
  • an escaped, literal opening \(
  • a capturing group (...) matching anything (.), including line breaks (thanks to re.DOTALL), non-greedily (*?), so it stops at the first ");"
  • an escaped, literal closing \) followed by ;

and group(1) gets you what was matched between the (non-escaped) pair of parentheses (...).
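One thing the one-liner does not handle: re.search returns None when the pattern does not match, so calling .group(1) on it raises AttributeError. A small sketch that guards against that, using the same pattern as a raw string; the function name module_ports_text is hypothetical.

```python
import re

# "module", a name, "(", then everything (newlines included) up to the first ");".
MODULE_RE = re.compile(r"module \w+\((.*?)\);", re.DOTALL)


def module_ports_text(content):
    """Return the text between "module <name>(" and ");", or None if there is no match."""
    match = MODULE_RE.search(content)
    if match is None:
        return None
    return match.group(1)


print(repr(module_ports_text("module traffic(\n    clk, rst\n);")))  # '\n    clk, rst\n'
print(module_ports_text("assign a = b;"))  # None
```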

And if you want those as a list:

>>> list(map(str.strip, _.split(",")))
['green_main', 'yellow_main', 'red_main', 'green_first', 'yellow_first', 'red_first', 'clk', 'rst', 'waiting_main', 'waiting_first']
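Note that _ only exists in the interactive interpreter, where it holds the last result shown at the >>> prompt. In a script you would keep the match in a variable instead. A minimal sketch using a copy of the question's file text, with a list comprehension in place of map (equivalent here), plus a filter that skips empty entries such as one left by a trailing comma:

```python
import re

content = """module traffic(
    green_main, yellow_main, red_main, green_first, yellow_first,
    red_first, clk, rst, waiting_main, waiting_first
);"""

ports_text = re.search(r"module \w+\((.*?)\);", content, re.DOTALL).group(1)

# Split on commas, trim surrounding spaces/newlines, and drop empty pieces.
ports = [name.strip() for name in ports_text.split(",") if name.strip()]
print(ports)
```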

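If you want to extract the content between "(" and ")" line by line, you can do the following (but first check how your content is laid out, because this only works when both brackets are on the same line as 'module'):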

for line in content.split('\n'):
    if 'module' in line:
        # Slice between the first "(" and the first ")" on this line.
        line_content = line[line.find('(') + 1: line.find(')')]
        print(line_content)
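The catch with find() is that it returns -1 instead of raising when the character is missing, so on the question's multi-line file the slice silently comes back empty. A sketch with explicit checks; the helper name between_parens is hypothetical, and it also searches for ")" only after the "(" it found.

```python
def between_parens(line):
    """Return the text between the first "(" and the first ")" after it, or None."""
    start = line.find('(')
    if start == -1:
        return None
    stop = line.find(')', start + 1)
    if stop == -1:
        return None  # the closing bracket is not on this line
    return line[start + 1:stop]


# The unchecked slice on the question's first line: find(')') is -1, so the result is empty.
naive_line = "module traffic("
print(repr(naive_line[naive_line.find('(') + 1: naive_line.find(')')]))  # ''

one_line = "module traffic(green_main, clk, rst);"
multi_line = "module traffic(\n    green_main, clk, rst\n);"

for content in (one_line, multi_line):
    for line in content.split('\n'):
        if 'module' in line:
            print(repr(between_parens(line)))  # 'green_main, clk, rst', then None
```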

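If your content spans more than one line:

import math


def find_all(your_string, search_string, max_index=math.inf, offset=0):
    # Yield each index of search_string in your_string, starting at offset
    # and stopping before max_index.
    index = your_string.find(search_string, offset)

    while index != -1 and index < max_index:
        yield index
        index = your_string.find(search_string, index + 1)


# content holds the whole file (e.g. read with open(file_name).read());
# removing the newlines lets "(" and ")" on different lines be paired.
s = content.replace('\n', '')

for offset in find_all(s, 'module'):
    # This module ends where the next 'module' starts (or at the end of s).
    # str.find() takes its start position positionally, not as a keyword.
    max_index = s.find('module', offset + len('module'))
    if max_index == -1:
        max_index = math.inf
    # Pair the n-th "(" with the n-th ")" in this module; print what lies between.
    print([s[start + 1: stop] for start, stop in zip(find_all(s, '(', max_index, offset),
                                                     find_all(s, ')', max_index, offset))])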
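To get a feel for what find_all yields, here it is on a short two-module string. The generator is copied so the snippet runs on its own; note that max_index is an exclusive upper bound on the start index of each match.

```python
import math


def find_all(your_string, search_string, max_index=math.inf, offset=0):
    # Yield each index of search_string from offset up to (not including) max_index.
    index = your_string.find(search_string, offset)
    while index != -1 and index < max_index:
        yield index
        index = your_string.find(search_string, index + 1)


s = "module a(x, y); module b(z);"
print(list(find_all(s, 'module')))           # [0, 16]
print(list(find_all(s, '(')))                # [8, 24]
print(list(find_all(s, '(', max_index=16)))  # [8]

# Pairing the n-th "(" with the n-th ")" recovers the bracket contents.
print([s[a + 1:b] for a, b in zip(find_all(s, '('), find_all(s, ')'))])  # ['x, y', 'z']
```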
