I have lines in a text file like the following:
0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}
and so on
I am trying to extract the values like this:
AAA is 0044xx aaa, bbb
BBB is 01/01/0017:53
CCC is 3.01
DDD is 00001
EEE is xxx yyy
FFF is (4.0-10.5)
HHH is 7.2
I am not able to extract the values CCC to HHH, which are enclosed in curly braces.
My script looks like this:
import sys
import re
import csv

def separateCodes(code):
    values = re.compile('.*?\{(.*?)\}.*?')
    field=values.findall(code)
    for i in range(len(field)):
        print field[i]
    print"-------------------------"

def handleError(self, record):
    raise

with open('/xxx.TXT') as ABCfp:
    PP=ABCfp.read()

PPwithNOrn=PP.replace('*\r','').replace('\n', '')
PPByMsg=PPwithNOrn.split('<~>')
print len(PPByMsg)
for i in range(len(PPByMsg)):
    AAA=""
    BBB=""
    CCC=""
    DDD=""
    EEE=""
    FFF=""
    GGG=""
    HHH=""
    print i,"=>",PPByMsg[i]
    if PPByMsg[i].find("<L>")!=-1:
        print "-----------------------"
        # AAA found
        AAA=PPByMsg[i].split('<L> <+>')[0]
        # BBB found
        BBB=PPByMsg[i].split('<L> <+>')[1].split('<&>')[0]
        # REST Found
        rest=separateCodes(PPByMsg[i].split('<L> <+>')[1].split('<&>')[1])
As I am new to Python, I could not get any further. Please suggest a way to extract these values.
How about this instead:
a,b,c = re.split('<[+&]>', i)
bits = re.split('{(.*?)}', c)[1:-1]
bits will then hold the tokens of the last part of your string (with empty strings between them):
>>> bits
[' 3.01', '', '00001 ', '', 'xxx yyy DIFF', '', '(4.0-10.5)', '', '7.2']
>>> a
'0044xx aaa, bbb '
>>> b
' 01/01/0017:53 '
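The split-based idea above can be tidied up a little. The following is only a sketch, assuming every record contains exactly one `<+>` and one `<&>` marker; the function name `split_record` is made up for illustration. It uses `re.findall` instead of `re.split` for the braces, so no empty strings appear between the tokens, and it strips the surrounding whitespace:

```python
import re

def split_record(line):
    """Split one record into (head, timestamp, brace_fields).

    Assumes the line contains exactly one '<+>' and one '<&>' marker;
    otherwise the three-way unpacking below raises ValueError.
    """
    head, stamp, tail = re.split(r'<[+&]>', line)
    # findall returns only the text captured inside each {...} pair,
    # so there are no empty separator strings to filter out.
    fields = [f.strip() for f in re.findall(r'\{(.*?)\}', tail)]
    return head.strip(), stamp.strip(), fields

sample = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
          '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
head, stamp, fields = split_record(sample)
```

With the sample line from the question, `fields` still contains `'xxx yyy DIFF'`; removing the trailing `DIFF` would be a separate step.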
You can do the entire operation with a single regular expression:
>>> t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
>>> re.search(r'(.*?)\s<\+>\s(.*?)\s<&>\s{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}', t).groups()
('0044xx aaa, bbb', '01/01/0017:53', ' 3.01', '00001 ', 'xxx yyy', '(4.0-10.5)', '7.2')
You can either extend the regex using (?P<name>.*?) instead of (.*?) to give named results:
>>> re.search(r'(?P<a>.*?)\s<\+>\s(?P<b>.*?)\s<&>\s{(?P<c>.*?)\}\{(?P<d>.*?)\}\{(?P<e>.*?) DIFF\}\{(?P<f>.*?)\}\{(?P<g>.*?)\}', t).groupdict()
{'a': '0044xx aaa, bbb', 'c': ' 3.01', 'b': '01/01/0017:53', 'e': 'xxx yyy', 'd': '00001 ', 'g': '7.2', 'f': '(4.0-10.5)'}
Or use zip or tuple assignment, e.g.:
>>> results = re.search(...).groups()
>>> resultdict = dict(zip('abcdefg', results))
>>> a, b, c, d, e, f, g = results
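Putting the named-group variant into a reusable function might look like the sketch below. The group names AAA to HHH are taken from the question; the names `RECORD` and `parse_record` are hypothetical, and the pattern assumes every record has exactly five brace groups with the third one ending in ` DIFF`, as in the sample line:

```python
import re

# Compiled once; adjacent raw strings are concatenated by Python.
RECORD = re.compile(
    r'(?P<AAA>.*?)\s*<\+>\s*(?P<BBB>.*?)\s*<&>\s*'
    r'\{(?P<CCC>.*?)\}\{(?P<DDD>.*?)\}\{(?P<EEE>.*?) DIFF\}'
    r'\{(?P<FFF>.*?)\}\{(?P<HHH>.*?)\}'
)

def parse_record(line):
    """Return a dict of stripped field values, or None if the line does not match."""
    match = RECORD.search(line)
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}

sample = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
          '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
record = parse_record(sample)
```

Returning None for a non-matching line avoids the AttributeError that calling `.groups()` directly on a failed `re.search` would raise.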
I solved it as follows:
rest=separateCodes(PPByMsg[i].split('<L> <+>')[1].split('<&>')[1])
CCC=PPByMsg[i].split('{')[1].split('}')[0]
DDD=PPByMsg[i].split('}{')[1]
EEE=PPByMsg[i].split('}{')[2]
FFF=PPByMsg[i].split('}{')[3]
GGG=PPByMsg[i].split('}{')[4]
HHH=PPByMsg[i].split('}{')[5]
KKK=PPByMsg[i].split('}{')[6].split('}')[0]
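The same plain-string approach can be written so that the record is split only once instead of once per field. This is a sketch under the assumption that the brace groups are always directly adjacent (`}{` with nothing in between); the helper name `brace_fields` is made up for illustration:

```python
def brace_fields(record):
    """Return the contents of adjacent {...}{...} groups as a list of strings."""
    start = record.find('{')
    end = record.rfind('}')
    # No complete brace section in this record.
    if start == -1 or end < start:
        return []
    # Drop the outermost braces, then split on the '}{' boundaries.
    return record[start + 1:end].split('}{')

sample = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
          '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
values = brace_fields(sample)
```

The individual fields can then be read by index (for example `values[0]` for CCC), with a length check first if the number of groups varies between records.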