Python: Convert a nested key-value file to a CSV file

I have the following file in .txt format:

<START>
<SUBSTART
    COLA=123;
    COLB=123;
    COLc=ABC;
    COLc=BCD;
    COLD=DEF;
<SUBEND
<SUBSTART
    COLA=456;
    COLB=456;
    COLc=def;
    COLc=def;
    COLD=xyz;
<SUBEND
<SUBSTART
    COLA=789;
    COLB=789;
    COLc=ghi;
    COLc=ghi;
    COLD=xyz;
<SUBEND>
<END>

Expected output:

COLA,COLB,COLc,COLc,COLD
123,123,ABC,BCD,DEF
456,456,def,def,xyz
789,789,ghi,ghi,xyz

How can I implement this in Python?

I tried using a dictionary, but that does not work because the file has repeated keys (COLc appears twice in every block).
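To illustrate why a plain dictionary fails here: assigning to a key that already exists replaces its value, so only the last COLc survives. A list of (key, value) pairs keeps every entry in file order. The variable names below are only for illustration.

```python
# A dict keeps only one value per key: the second COLc replaces the first.
pairs = [("COLc", "ABC"), ("COLc", "BCD")]
as_dict = dict(pairs)
print(as_dict)  # {'COLc': 'BCD'}

# A list of (key, value) pairs keeps every entry, in file order.
print(pairs)  # [('COLc', 'ABC'), ('COLc', 'BCD')]
```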

You need to write a small parser for your custom format.

Here is a very naive and simple example:

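import pandas as pd

out = []      # one dict per block
add = False   # True while inside a run of indented key=value lines
for line in text.split('\n'):  # here you could read from a file instead
    if line.startswith('  '):
        if not add:
            out.append({})  # first indented line: start a new record
            add = True
        k, v = line.strip(' ;').split('=')
        if k in out[-1]:
            k += '_'  # rename a repeated key (COLc -> COLc_)
        out[-1][k] = v
    else:
        add = False  # a "<..." marker line ends the current block

df = pd.DataFrame(out)
df.to_csv('/tmp/output.csv', index=False)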

CSV output:

COLA,COLB,COLc,COLc_,COLD
123,123,ABC,BCD,DEF
456,456,def,def,xyz
789,789,ghi,ghi,xyz
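Note that k += '_' only handles a key that appears twice: a third COLc would again become COLc_ and overwrite the second value. A hypothetical helper such as unique_key below numbers the repeats instead (COLc, COLc_2, COLc_3, ...); in the loop above it would replace the two lines that append '_', as out[-1][unique_key(out[-1], k)] = v. The numbering scheme is an assumption; pick whatever suffix suits your data.

```python
def unique_key(record, key):
    """Return key unchanged if it is new in record, else key with a numeric suffix.

    Repeats are numbered COLc_2, COLc_3, ... so later values never overwrite
    earlier ones.
    """
    if key not in record:
        return key
    n = 2
    while f"{key}_{n}" in record:
        n += 1
    return f"{key}_{n}"


if __name__ == "__main__":
    record = {}
    for k, v in [("COLc", "a"), ("COLc", "b"), ("COLc", "c")]:
        record[unique_key(record, k)] = v
    print(record)  # {'COLc': 'a', 'COLc_2': 'b', 'COLc_3': 'c'}
```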

The text variable used above contains:

text = '''<SUBSTART
    COLA=123;
    COLB=123;
    COLc=ABC;
    COLc=BCD;
    COLD=DEF;
<SUBEND
<SUBSTART
    COLA=456;
    COLB=456;
    COLc=def;
    COLc=def;
    COLD=xyz;
<SUBSTART
    COLA=789;
    COLB=789;
    COLc=ghi;
    COLc=ghi;
    COLD=xyz;
<SUBEND>
<END>'''
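The answer above writes COLc_ for the second column, whereas the question's expected output repeats COLc. A minimal sketch that reproduces the expected output exactly keeps each block as a list of (key, value) pairs and writes it with the standard csv module. parse_blocks, write_csv and SAMPLE are hypothetical names; unlike the answer above, this version detects blocks by their <SUBSTART / <SUBEND markers rather than by indentation, and it assumes every block lists the same keys in the same order (the header is taken from the first block).

```python
import csv
import io

# Sample input copied from the question (normally you would read the file).
SAMPLE = """<START>
<SUBSTART
    COLA=123;
    COLB=123;
    COLc=ABC;
    COLc=BCD;
    COLD=DEF;
<SUBEND
<SUBSTART
    COLA=456;
    COLB=456;
    COLc=def;
    COLc=def;
    COLD=xyz;
<SUBEND
<SUBSTART
    COLA=789;
    COLB=789;
    COLc=ghi;
    COLc=ghi;
    COLD=xyz;
<SUBEND>
<END>"""


def parse_blocks(lines):
    """Collect each <SUBSTART ... <SUBEND block as a list of (key, value) pairs."""
    records = []
    current = None  # None while outside a block
    for raw in lines:
        line = raw.strip()
        if line.startswith("<SUBSTART"):
            current = []  # start a new record
        elif line.startswith("<SUBEND"):  # matches "<SUBEND" and "<SUBEND>"
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and "=" in line:
            # "COLA=123;" -> ("COLA", "123")
            key, value = line.rstrip(";").split("=", 1)
            current.append((key.strip(), value.strip()))
    return records


def write_csv(records, out):
    """Write records to a file-like object; the header is the first record's keys."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([key for key, _ in records[0]])
    for record in records:
        writer.writerow([value for _, value in record])


if __name__ == "__main__":
    buf = io.StringIO()
    write_csv(parse_blocks(SAMPLE.splitlines()), buf)
    print(buf.getvalue(), end="")
```

With a real file, parse_blocks can take the open file object directly, and write_csv can write to a file opened with newline=''; for example with open('input.txt') as f, open('output.csv', 'w', newline='') as out: write_csv(parse_blocks(f), out), where both file names are placeholders.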
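
Another approach is to read the file line by line:
  • if the line does not start with "<", save its key/value pair
  • if it starts with "<SUBEND", store the finished block and start a new one
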
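import json


def main():
    datas = []  # one list of {key: value} dicts per <SUBSTART ... <SUBEND block
    with open('example.txt', 'r') as f:
        sub_datas = []
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if not line.startswith('<'):
                # "COLA=123;" -> key "COLA", value "123"
                key, value = line.rstrip(';').split('=', 1)
                # a one-entry dict per pair keeps repeated keys such as COLc
                sub_datas.append({key: value})
            elif line.startswith('<SUBEND'):
                datas.append(sub_datas)  # block finished, start a new one
                sub_datas = []

    print(json.dumps(datas, indent=4))


if __name__ == '__main__':
    main()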
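This answer stops at printing the blocks as JSON. A minimal sketch of the remaining step, turning that list of lists of single-entry dicts into CSV with the standard csv module, could look like the following. datas_to_csv is a hypothetical name, and the header is taken from the first block, assuming all blocks share the same keys in the same order.

```python
import csv
import io


def datas_to_csv(datas, out):
    """Write a list of blocks (each a list of one-entry dicts) as CSV.

    Because every pair lives in its own dict, repeated keys such as COLc
    become separate columns with the same header name.
    """
    writer = csv.writer(out, lineterminator="\n")
    # Header: keys of the first block, in file order (duplicates kept).
    writer.writerow([next(iter(d)) for d in datas[0]])
    for block in datas:
        writer.writerow([next(iter(d.values())) for d in block])


if __name__ == "__main__":
    example = [
        [{"COLA": "123"}, {"COLB": "123"}, {"COLc": "ABC"}, {"COLc": "BCD"}, {"COLD": "DEF"}],
        [{"COLA": "456"}, {"COLB": "456"}, {"COLc": "def"}, {"COLc": "def"}, {"COLD": "xyz"}],
    ]
    buf = io.StringIO()
    datas_to_csv(example, buf)
    print(buf.getvalue(), end="")
```

Inside main(), the print(json.dumps(...)) call could be replaced by datas_to_csv(datas, out) with out opened as open('output.csv', 'w', newline=''), where the file name is a placeholder.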
