
Regex to find substring starting with [ ]

Below is a sample substring from a much larger string (detaildesc_final). I need a regex search that retrieves every line in the [Data] section that begins with "[]" (two square brackets). Lines should be collected from the [Data] section until the [Logs] line is encountered.

[Data]

[] some text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[Logs]

I'm using Python and tried the following call (which is clearly incorrect):

re.findall(r'\b\\[\\]\w*', detaildesc_final)

I need the result to be in the following format:

some text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

I have already searched a lot online, but I could only figure out how to find lines starting with a single character rather than two ([] in this case). Any help would be greatly appreciated. Thank you.

Don't over-complicate things.

for line in detaildesc_final.splitlines():
    if line.startswith('[]'):
        # Drop the '[]' marker and the whitespace around the text.
        print(line[2:].strip())

import re

text = """
[Data]

[] some text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[Logs]
"""


# Remove every bracketed token ([Data], [], [Logs]) and one trailing space.
print(re.sub(r"\[[a-zA-Z ]*\] ?", '', text))

output:

some text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text

some_other_text
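The substitution above strips every bracketed token, so the [Data] and [Logs] headers disappear as well, and any other section of the larger string would be affected too. As a rough sketch of one way to keep only the [] lines that sit between [Data] and [Logs], a plain line scan with a flag may be enough. The function name data_lines and the sample string are made up for illustration:

```python
def data_lines(text, start='[Data]', end='[Logs]'):
    """Return the text of '[]' lines found between the start and end headers."""
    results = []
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == start:
            in_section = True
        elif stripped == end:
            in_section = False
        elif in_section and stripped.startswith('[]'):
            # Drop the '[]' marker and any whitespace after it.
            results.append(stripped[2:].strip())
    return results


# Hypothetical sample: only the entries between [Data] and [Logs] should survive.
sample = """[Intro]

[] ignored

[Data]

[] some text

[] some_other_text

[Logs]

[] also ignored
"""

print(data_lines(sample))
```

Entries outside the [Data] section are skipped because the flag is only set between the two headers.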

You need a positive lookbehind:

import re

pattern=r'(?<=\[\])(.\w.+)'

string_1="""[Data]

[] some text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[] some_other_text

[Logs]"""


match=re.finditer(pattern,string_1,re.M)
for item in match:
    print(item.group(1))

output:

 some text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
 some_other_text
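The leading space in the output above comes from the . at the start of the capture group. A possible variant, offered as a sketch rather than as the answer's own pattern, moves the space into the lookbehind and anchors it to the start of a line with ^ under re.M. The names pattern_2 and string_2 are illustrative:

```python
import re

# Python's re module requires a fixed-width lookbehind; '^' is zero-width,
# so the lookbehind below still has a fixed width of three characters.
pattern_2 = r'(?<=^\[\] ).+'

string_2 = """[Data]

[] some text

[] some_other_text

not a match [] because the brackets are not at the line start

[Logs]"""

# re.M makes '^' match at the start of every line, not only the string start.
matches = re.findall(pattern_2, string_2, re.M)
print(matches)
```

Because the space is consumed by the lookbehind, the matched text starts directly at the first character after it.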

Regex explanation:

Positive Lookbehind (?<=\[\])

It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there.

  • \[ matches the character [ literally
  • \] matches the character ] literally
  • . matches any character (except for line terminators)
  • \w matches any word character (equal to [a-zA-Z0-9_])
  • + is a quantifier that matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

import re
re.findall(r'\[\] (.*)\n\n', detaildesc_final)

Output:

['some text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text',
 'some_other_text']
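The pattern above relies on every entry being followed by a blank line, and it searches the whole string rather than only the [Data] section. A sketch of a variant that first isolates the text between [Data] and [Logs] and then matches each [] line on its own could look like this. The helper name extract_data_entries and the sample string are hypothetical:

```python
import re


def extract_data_entries(text):
    """Return the text after '[]' on each such line between [Data] and [Logs]."""
    # re.S lets '.' span newlines; the lazy '.*?' stops at the first [Logs] line.
    section = re.search(r'^\[Data\][ \t]*$(.*?)^\[Logs\]', text, re.S | re.M)
    if section is None:
        return []
    # re.M makes '^' and '$' match at each line boundary inside the section.
    return re.findall(r'^\[\][ \t]*(.*)$', section.group(1), re.M)


# Hypothetical sample with '[]' lines outside the [Data] section.
detaildesc_sample = """[Summary]
[] outside the section
[Data]
[] some text
[] some_other_text
[Logs]
[] log entry
"""

print(extract_data_entries(detaildesc_sample))
```

Splitting the work into two steps keeps each pattern short, and the second pattern does not depend on blank lines between entries.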
