
Extract values enclosed in curly braces from a text file using Python

I have lines in a text file like the following:

0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}

and so on

I am trying to extract the values like this:

AAA is 0044xx aaa, bbb 

BBB is 01/01/0017:53

CCC is 3.01

DDD is 00001

EEE is xxx yyy

FFF is (4.0-10.5)

HHH is 7.2

I am not able to extract the values CCC through HHH, which are enclosed in curly braces.

My script is like:

import sys

import re

import csv

def separateCodes(code):
    values = re.compile('.*?\{(.*?)\}.*?')
    field=values.findall(code)    

    for i in range(len(field)):
        print field[i]
    print"-------------------------"        

def handleError(self, record):
    raise    
with open('/xxx.TXT') as ABCfp:    
    PP=ABCfp.read()

PPwithNOrn=PP.replace('*\r','').replace('\n', '')
PPByMsg=PPwithNOrn.split('<~>')
print len(PPByMsg)

for i in range(len(PPByMsg)):

    AAA=""
    BBB=""
    CCC=""
    DDD=""
    EEE=""
    FFF=""
    GGG=""
    HHH=""

    print i,"=>",PPByMsg[i]
    if PPByMsg[i].find("<L>")!=-1:
        print "-----------------------"
        # AAA found
        AAA=PPByMsg[i].split('<L>  <+>')[0]
        # BBB found
        BBB=PPByMsg[i].split('<L>  <+>')[1].split('<&>')[0]
        # REST Found
        rest=separateCodes(PPByMsg[i].split('<L>  <+>')[1].split('<&>')[1])

As I am new to Python, I could not get any further. Please suggest a way to extract these values.

How about this instead (here i is one line of the file as a string, not a loop index):

a,b,c = re.split('<[+&]>', i)
bits = re.split('{(.*?)}', c)[1:-1]

bits will hold the tokens from the last part of your string (the empty strings are what re.split finds between adjacent brace groups):

>>> bits
[' 3.01', '', '00001 ', '', 'xxx yyy DIFF', '', '(4.0-10.5)', '', '7.2']
>>> a
'0044xx aaa, bbb '
>>> b
' 01/01/0017:53 '
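
If the empty strings and the padding inside the braces are unwanted, re.findall may be a better fit than re.split. The following is a minimal sketch, assuming Python 3 and the sample line from the question; the variable names (line, head, stamp, braces, fields) are only illustrative:

```python
import re

line = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'

# Split the line into three parts on the <+> and <&> markers.
head, stamp, braces = re.split(r'<[+&]>', line)

# findall returns only the captured text, so no empty strings show up
# between adjacent {...} groups; strip() trims the padding inside the braces.
fields = [field.strip() for field in re.findall(r'\{(.*?)\}', braces)]

print(head.strip())
print(stamp.strip())
print(fields)
```

Note that this keeps the trailing DIFF in the third field; it would have to be removed separately if only xxx yyy is wanted.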

You can do the entire operation with a single regular expression:

>>> t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
>>> re.search(r'(.*?)\s<\+>\s(.*?)\s<&>\s{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}', t).groups()
('0044xx aaa, bbb', '01/01/0017:53', ' 3.01', '00001 ', 'xxx yyy', '(4.0-10.5)', '7.2')

You can either extend the regex using (?P<name>.*?) instead of (.*?) to give named results:

>>> re.search(r'(?P<a>.*?)\s<\+>\s(?P<b>.*?)\s<&>\s{(?P<c>.*?)\}\{(?P<d>.*?)\}\{(?P<e>.*?) DIFF\}\{(?P<f>.*?)\}\{(?P<g>.*?)\}', t).groupdict()
{'a': '0044xx aaa, bbb', 'c': ' 3.01', 'b': '01/01/0017:53', 'e': 'xxx yyy', 'd': '00001 ', 'g': '7.2', 'f': '(4.0-10.5)'}
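
The named-group pattern above can be wrapped in a small helper that also copes with lines that do not match. This is only a sketch, assuming Python 3; the group names AAA to HHH are borrowed from the question, parse_record and RECORD are hypothetical names, and the extra \s* pieces are an addition that trims the padding inside the braces:

```python
import re

# Verbose mode ignores whitespace in the pattern and allows comments.
RECORD = re.compile(r'''
    (?P<AAA>.*?)\s*<\+>\s*          # text before the <+> marker
    (?P<BBB>.*?)\s*<&>\s*           # timestamp before the <&> marker
    \{\s*(?P<CCC>.*?)\s*\}
    \{\s*(?P<DDD>.*?)\s*\}
    \{\s*(?P<EEE>.*?)\s+DIFF\s*\}   # drop the trailing DIFF
    \{\s*(?P<FFF>.*?)\s*\}
    \{\s*(?P<HHH>.*?)\s*\}
''', re.VERBOSE)

def parse_record(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = RECORD.search(line)
    return match.groupdict() if match else None

t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
record = parse_record(t)
print(record['AAA'], '|', record['CCC'], '|', record['EEE'])
```

Returning None for a non-matching line avoids the AttributeError that calling .groupdict() directly on a failed re.search would raise.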

Or use zip or tuple assignment, e.g.:

>>> results = re.search(...).groups()
>>> resultdict = dict(zip('abcdefg', results))
>>> a, b, c, d, e, f, g = results
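
As a rough illustration of the zip idea with the labels from the question instead of single letters, assuming Python 3 and the same sample line (the names list and the record variable are made up for this sketch):

```python
import re

t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
pattern = r'(.*?)\s<\+>\s(.*?)\s<&>\s\{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}'

# Labels from the question, in the order the groups appear in the pattern.
names = ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF', 'HHH']

results = re.search(pattern, t).groups()

# Pair each label with its stripped value and build a dict from the pairs.
record = dict(zip(names, (value.strip() for value in results)))

print(record['CCC'])
print(record['FFF'])
```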

I have accomplished what I needed as follows:

rest=separateCodes(PPByMsg[i].split('<L>  <+>')[1].split('<&>')[1])

CCC=PPByMsg[i].split('{')[1].split('}')[0]
DDD=PPByMsg[i].split('}{')[1]
EEE=PPByMsg[i].split('}{')[2]
FFF=PPByMsg[i].split('}{')[3]
GGG=PPByMsg[i].split('}{')[4]
HHH=PPByMsg[i].split('}{')[5]
KKK=PPByMsg[i].split('}{')[6].split('}')[0]
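
One caveat with indexing into split('}{') is that it raises an IndexError as soon as a record has fewer brace groups than expected. A possible alternative, sketched here under the assumption of Python 3, collects however many groups are present; extract_brace_fields is a hypothetical helper name and the second record in text is made up for the example:

```python
import re

def extract_brace_fields(message):
    """Return the stripped contents of every {...} group in message."""
    return [field.strip() for field in re.findall(r'\{(.*?)\}', message)]

# Two records joined by the <~> separator used in the question's script.
# The second record is invented and deliberately has fewer brace groups.
text = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}'
        '<~>'
        '0045xx ccc, ddd <+> 02/01/0009:10 <&> { 2.50}{00002 }')

for message in text.split('<~>'):
    fields = extract_brace_fields(message)
    # The list is as long as the number of groups found, so no index can be missing.
    print(len(fields), fields)
```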
