I have the file below in txt format:
<START>
<SUBSTART
COLA=123;
COLB=123;
COLc=ABC;
COLc=BCD;
COLD=DEF;
<SUBEND
<SUBSTART
COLA=456;
COLB=456;
COLc=def;
COLc=def;
COLD=xyz;
<SUBEND
<SUBSTART
COLA=789;
COLB=789;
COLc=ghi;
COLc=ghi;
COLD=xyz;
<SUBEND>
<END>
Expected output:
COLA,COLB,COLc,COLc,COLD
123,123,ABC,BCD,DEF
456,456,def,def,xyz
789,789,ghi,ghi,xyz
How could I implement this in Python?
I've tried using a dictionary, but the file has repeated keys, so that is not working.
You need to write a small parser for your custom format.
Here is a very naive, simple example:
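import pandas as pd

out = []
add = False  # True while we are inside a block of KEY=value; lines
for line in text.split('\n'):  # here you could read from a file instead
    if '=' in line:  # a data line such as "COLA=123;"
        if not add:
            out.append({})  # first data line of a block: start a new row
            add = True
        k, v = line.strip(' ;').split('=')
        if k in out[-1]:
            k += '_'  # repeated key: rename it so it gets its own column
        out[-1][k] = v
    else:
        add = False  # any marker line (<SUBSTART, <SUBEND, ...) ends the row
df = pd.DataFrame(out)
df.to_csv('/tmp/output.csv', index=False)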
csv output:
COLA,COLB,COLc,COLc_,COLD
123,123,ABC,BCD,DEF
456,456,def,def,xyz
789,789,ghi,ghi,xyz
input:
text = '''<SUBSTART
COLA=123;
COLB=123;
COLc=ABC;
COLc=BCD;
COLD=DEF;
<SUBEND
<SUBSTART
COLA=456;
COLB=456;
COLc=def;
COLc=def;
COLD=xyz;
<SUBSTART
COLA=789;
COLB=789;
COLc=ghi;
COLc=ghi;
COLD=xyz;
<SUBEND>
<END>'''
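If the header has to repeat COLc exactly as in the expected output (rather than renaming the second one to COLc_), one option is to skip the dictionary and keep each block as an ordered list of (key, value) pairs. The following is only a sketch under assumptions: every block is assumed to list the same keys in the same order, the header is taken from the first block, and the function names parse_blocks and to_csv are made up for illustration.

```python
import csv
import io


def parse_blocks(text):
    """Return (header, rows) parsed from <SUBSTART ... <SUBEND blocks."""
    rows, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('<SUBSTART'):
            if current:  # tolerate a missing <SUBEND before a new block
                rows.append(current)
            current = []
        elif line.startswith('<SUBEND'):
            if current:
                rows.append(current)
            current = None
        elif '=' in line and current is not None:
            # "COLA=123;" -> ("COLA", "123"); repeated keys are kept as-is
            key, value = line.rstrip(';').split('=', 1)
            current.append((key, value))
    if current:  # last block had no closing <SUBEND
        rows.append(current)
    # Assumption: the first block defines the header for all blocks.
    header = [k for k, _ in rows[0]] if rows else []
    return header, [[v for _, v in row] for row in rows]


def to_csv(text):
    """Render the parsed blocks as CSV text with duplicate headers kept."""
    header, rows = parse_blocks(text)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Calling print(to_csv(text)) on the input above should give a header with COLc twice. Keep in mind that many CSV readers, pandas included, rename duplicate column names when loading such a file.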
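import json


def main():
    datas = []
    with open('example.txt', 'r') as f:
        sub_datas = []
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if not line.startswith('<'):
                # data line such as "COLA=123;": drop the trailing ';' and split
                items = line[:-1].split("=")
                # one single-key dict per line, so repeated keys are preserved
                sub_datas.append({
                    items[0]: items[1]
                })
            elif line.startswith('<SUBEND'):
                # end of a block: store it and start collecting the next one
                datas.append(sub_datas)
                sub_datas = []
    print(json.dumps(datas, indent=4))


if __name__ == '__main__':
    main()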
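The script above prints JSON rather than the CSV asked for in the question. As a rough sketch, the nested datas structure (a list of blocks, each a list of single-key dicts) could be flattened to CSV as below. The helper name blocks_to_csv is hypothetical, and the header is assumed to come from the first block.

```python
import csv
import io


def blocks_to_csv(datas):
    """Turn a list of blocks (lists of single-key dicts) into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    for i, block in enumerate(datas):
        # each dict holds exactly one key/value pair
        pairs = [next(iter(d.items())) for d in block]
        if i == 0:
            # Assumption: the first block supplies the column names.
            writer.writerow([k for k, _ in pairs])
        writer.writerow([v for _, v in pairs])
    return buf.getvalue()
```

Replacing the print(json.dumps(...)) call with print(blocks_to_csv(datas)) would then emit one CSV row per block.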