I'm trying to find specific substrings inside a larger string.
Here's the string. The bold words are the ones that I want to extract using the re.findall function from Python's re module.
text|p1_1_SNtestfilefri01| ANTENNA SYSTEM |@|text|p1_2_SNtestfilefri01| ALCATEL-LUCENT |@|text|p1_3_SNtestfilefri01| MW ANTENNA |@|text|p1_4_SNtestfilefri01| DIA 0.6 M 13 GHZ SINGLE POLARIZED |@|text|p1_5_SNtestfilefri01| L1AF10018AAAA |@|text|p1_6_SNtestfilefri01| SNtestfilefri01
Here's my code:
import re

with open(filepath, mode='r') as open_file:
    doc = open_file.read()
datas = re.findall(r'\|(.*)\|\@\|', doc)
print(datas)
And here's the output:
['p1_1_SNtestfilefri01|ANTENNA SYSTEM|@|text|p1_2_SNtestfilefri01|ALCATEL-LUCENT|@|text|p1_3_SNtestfilefri01|MW ANTENNA|@|text|p1_4_SNtestfilefri01|DIA 0.6 M 13 GHZ SINGLE POLARIZED|@|text|p1_5_SNtestfilefri01|L1AF10018AAAA']
What's the correct pattern so that I can achieve something like this?
['ANTENNA SYSTEM','ALCATEL-LUCENT','MW ANTENNA','DIA 0.6 M 13 GHZ SINGLE POLARIZED','L1AF10018AAAA', 'SNtestfilefri01']
Also, the string I mentioned above doesn't contain any newlines (everything is on a single line).
re.findall(r'[^|]+(?=\|@\|)', doc)
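To try this pattern on the sample from the question, here is a minimal sketch. It makes two assumptions that are not in the original answer: the lookahead is extended with `|$` so that the last field, which is not followed by `|@|`, is matched too, and each match is stripped because the sample has spaces around the values.

```python
import re

# Sample input copied from the question (a single line, no newline).
doc = (
    "text|p1_1_SNtestfilefri01| ANTENNA SYSTEM |@|"
    "text|p1_2_SNtestfilefri01| ALCATEL-LUCENT |@|"
    "text|p1_3_SNtestfilefri01| MW ANTENNA |@|"
    "text|p1_4_SNtestfilefri01| DIA 0.6 M 13 GHZ SINGLE POLARIZED |@|"
    "text|p1_5_SNtestfilefri01| L1AF10018AAAA |@|"
    "text|p1_6_SNtestfilefri01| SNtestfilefri01"
)

# A run of non-pipe characters that is followed either by the "|@|"
# separator or by the end of the string (assumption: the "|$" part is
# added here to catch the final field).
pattern = r'[^|]+(?=\|@\||$)'

# Strip the surrounding spaces that the sample data contains.
values = [field.strip() for field in re.findall(pattern, doc)]
print(values)
```

Without the `|$` alternative the pattern returns only the first five values, since the last one is not followed by a separator.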
Explanation:
[^|]+ finds chunks of text not containing the separator.
(?=...) is a "lookahead assertion" (it must match the text, but the matched text is not included in the result).

This is a dirty solution, but it works and was off the top of my head:
import re

s = "text|p1_1_SNtestfilefri01|ANTENNA SYSTEM|@|text|p1_2_SNtestfilefri01|ALCATEL-LUCENT|@|text|p1_3_SNtestfilefri01|MW ANTENNA|@|text|p1_4_SNtestfilefri01|DIA 0.6 M 13 GHZ SINGLE POLARIZED|@|text|p1_5_SNtestfilefri01|L1AF10018AAAA|@|"
match_list = []
# Splitting on '@' leaves one record per piece, each ending in '|'
for data in s.split('@'):
    data += "@|"  # restore the separator so the pattern can anchor on it
    m = re.search(r'\|(.*)\|(.*)\|\@\|', data)
    if m:
        match_list.append(m.group(2))  # group 2 is the last field before '|@|'
print(match_list)
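If the input really is as regular as the sample, the same split-first idea can be sketched without a regular expression. This assumes every record has exactly three `|`-separated fields and that records are separated by `|@|`; the names `records` and `fields` are illustrative only.

```python
# Sample input copied from the question (a single line, no newline).
doc = (
    "text|p1_1_SNtestfilefri01| ANTENNA SYSTEM |@|"
    "text|p1_2_SNtestfilefri01| ALCATEL-LUCENT |@|"
    "text|p1_3_SNtestfilefri01| MW ANTENNA |@|"
    "text|p1_4_SNtestfilefri01| DIA 0.6 M 13 GHZ SINGLE POLARIZED |@|"
    "text|p1_5_SNtestfilefri01| L1AF10018AAAA |@|"
    "text|p1_6_SNtestfilefri01| SNtestfilefri01"
)

# One record per "|@|"-separated piece; skip empty pieces, which would
# appear if the input ended with a trailing separator.
records = [record for record in doc.split("|@|") if record.strip()]

# Assumption: the wanted value is always the third field of a record.
fields = [record.split("|")[2].strip() for record in records]
print(fields)
```

This version raises an IndexError on a record with fewer than three fields, so it only fits data that is known to be well formed.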