
How do I find multiple occurrences of this specific string and split them into a list?

I'm trying to extract specific substrings from a larger string.

Here's the string; the bold words are the ones that I want to extract using the re.findall function from Python's re library.

text|p1_1_SNtestfilefri01| ANTENNA SYSTEM |@|text|p1_2_SNtestfilefri01| ALCATEL-LUCENT |@|text|p1_3_SNtestfilefri01| MW ANTENNA |@|text|p1_4_SNtestfilefri01| DIA 0.6 M 13 GHZ SINGLE POLARIZED |@|text|p1_5_SNtestfilefri01| L1AF10018AAAA |@|text|p1_6_SNtestfilefri01| SNtestfilefri01

Here's my code :

open_file = open(filepath, mode='r')
doc = open_file.read()
datas = re.findall('\|(.*)\|\@\|', doc)
print(datas)

And here's the output :

['p1_1_SNtestfilefri01|ANTENNA SYSTEM|@|text|p1_2_SNtestfilefri01|ALCATEL-LUCENT|@|text|p1_3_SNtestfilefri01|MW ANTENNA|@|text|p1_4_SNtestfilefri01|DIA 0.6 M 13 GHZ SINGLE POLARIZED|@|text|p1_5_SNtestfilefri01|L1AF10018AAAA']

What's the correct pattern so that I can achieve something like this?

['ANTENNA SYSTEM','ALCATEL-LUCENT','MW ANTENNA','DIA 0.6 M 13 GHZ SINGLE POLARIZED','L1AF10018AAAA', 'SNtestfilefri01']

Also, the string I mentioned above doesn't contain any newlines (everything is on a single line).

re.findall(r'[^|]+(?=\|@\|)', doc)

Explanation:

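  • [^|]+ matches a run of one or more characters that are not the | separator, so a match can never span two fields
  • (?=...) is a "lookahead assertion": the text right after the match must be the |@| separator, but that text is not consumed and is not included in the result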
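
To make the difference concrete, here is a minimal sketch that compares the greedy pattern from the question with the lookahead pattern. The shortened sample string and the variable names are made up for illustration. Note that the lookahead pattern only returns values that are followed by |@|, so the last value in the question's string (which has no trailing separator) is skipped; the variant with |$ in the lookahead is an assumption about how that case could be covered, not part of the original answer.

```python
import re

# Shortened sample in the same layout as the question's string (illustrative).
sample = (
    "text|p1_1_SNtestfilefri01|ANTENNA SYSTEM|@|"
    "text|p1_2_SNtestfilefri01|ALCATEL-LUCENT|@|"
    "text|p1_6_SNtestfilefri01|SNtestfilefri01"
)

# Greedy .* runs to the last |@| in the string, so there is only one match.
greedy = re.findall(r'\|(.*)\|@\|', sample)

# [^|]+ cannot cross a pipe, so each match is a single field.
with_lookahead = re.findall(r'[^|]+(?=\|@\|)', sample)

# Assumed variant: also accept a field that sits at the end of the string.
with_end = re.findall(r'[^|]+(?=\|@\||$)', sample)

print(len(greedy))      # 1
print(with_lookahead)   # ['ANTENNA SYSTEM', 'ALCATEL-LUCENT']
print(with_end)         # ['ANTENNA SYSTEM', 'ALCATEL-LUCENT', 'SNtestfilefri01']
```

If the real data has spaces around the values, as in the string pasted in the question, each result can be cleaned up with str.strip().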

This is a quick-and-dirty solution off the top of my head, but it works:

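import re

s = "text|p1_1_SNtestfilefri01|ANTENNA SYSTEM|@|text|p1_2_SNtestfilefri01|ALCATEL-LUCENT|@|text|p1_3_SNtestfilefri01|MW ANTENNA|@|text|p1_4_SNtestfilefri01|DIA 0.6 M 13 GHZ SINGLE POLARIZED|@|text|p1_5_SNtestfilefri01|L1AF10018AAAA|@|"

# Split on '@' so that each chunk holds a single record.
chunks = s.split('@')
match_list = []

for data in chunks:
    # Re-append '@|' so that each chunk ends in the |@| separator again.
    data += "@|"
    # group(2) is the last field before the |@| separator.
    m = re.search(r'\|(.*)\|(.*)\|@\|', data)
    if m:
        match_list.append(m.group(2))

print(match_list)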
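
The same split-first idea can also be done without regular expressions. The sketch below assumes every record has the layout text|id|value and that records are separated by |@|; the function name extract_values is hypothetical. Because it does not rely on a trailing separator, it also picks up the last value in the question's string.

```python
def extract_values(doc):
    """Return the last field of every |@|-separated record (assumed layout: text|id|value)."""
    values = []
    for record in doc.split('|@|'):
        # A trailing separator leaves an empty record at the end; skip it.
        if not record.strip():
            continue
        # The value is whatever follows the last pipe; trim surrounding spaces.
        values.append(record.split('|')[-1].strip())
    return values


doc = (
    "text|p1_1_SNtestfilefri01| ANTENNA SYSTEM |@|"
    "text|p1_2_SNtestfilefri01| ALCATEL-LUCENT |@|"
    "text|p1_3_SNtestfilefri01| MW ANTENNA |@|"
    "text|p1_4_SNtestfilefri01| DIA 0.6 M 13 GHZ SINGLE POLARIZED |@|"
    "text|p1_5_SNtestfilefri01| L1AF10018AAAA |@|"
    "text|p1_6_SNtestfilefri01| SNtestfilefri01"
)

print(extract_values(doc))
# ['ANTENNA SYSTEM', 'ALCATEL-LUCENT', 'MW ANTENNA',
#  'DIA 0.6 M 13 GHZ SINGLE POLARIZED', 'L1AF10018AAAA', 'SNtestfilefri01']
```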
