
Output multiple files based on column value in Python pandas

I have a sample pandas DataFrame:

import pandas as pd

df = {'ID': [73, 68,1,94,42,22, 28,70,47, 46,17, 19, 56, 33 ],
  'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ],
  'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311,  311, 311, 311]}
df = pd.DataFrame(df)

It looks like this:

df
Out[7]: 
    CloneID  ID VGene
0         1  73   64D
1         1  68   64D
2         1   1   64D
3         1  94    61
4         1  42    61
5         2  22    61
6         2  28   311
7         3  70   311
8         3  47   311
9         3  46   311
10        4  17   311
11        4  19   311
12        4  56   311
13        4  33   311

I want to write a simple script that outputs the rows for each CloneID to a separate file, so in this case there would be 4 different files. The first file would be named 'CloneID1.txt' and would look like this:

CloneID  ID   VGene
     1   73   64D
     1   68   64D
     1   1    64D
     1   94   61
     1   42   61

The second file would be named 'CloneID2.txt':

CloneID  ID  VGene
     2   22   61
     2   28   311

The third file would be named 'CloneID3.txt':

CloneID  ID  VGene
     3   70   311
     3   47   311
     3   46   311

And the last file would be named 'CloneID4.txt':

CloneID  ID VGene 
    4    17   311
    4    19   311
    4    56   311
    4    33   311

The code I found online was:

import pandas as pd
data = pd.read_excel('data.xlsx')

for group_name, data in data.groupby('CloneID'):
    with open('results.csv', 'a') as f:
        data.to_csv(f)

But it outputs everything to one file instead of multiple files.
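
For reference, that snippet ends up with a single file because the path passed to open() is always 'results.csv' and mode 'a' appends every group to it; group_name is never used. Below is a minimal sketch of the smallest change that gives one file per group. It assumes the sample df from above in place of data.xlsx, and it writes into a temporary directory (out_dir and written are hypothetical names) so that it can be run as is:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})

out_dir = tempfile.mkdtemp()  # assumption: any writable directory would do
written = []

# The loop variable is called `group` so it does not shadow the full frame.
for group_name, group in df.groupby('CloneID'):
    # Put the group key into the file name so each group gets its own file.
    path = os.path.join(out_dir, 'CloneID' + str(group_name) + '.txt')
    # 'w' instead of 'a': start a fresh file for every group.
    with open(path, 'w', newline='') as f:
        group.to_csv(f)
    written.append(os.path.basename(path))

print(written)
```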

You can do something like the following:

In [19]:
gp = df.groupby('CloneID')
for g in gp.groups:
    print('CloneID' + str(g) + '.txt')
    print(gp.get_group(g).to_csv())

CloneID1.txt
,CloneID,ID,VGene
0,1,73,64D
1,1,68,64D
2,1,1,64D
3,1,94,61
4,1,42,61

CloneID2.txt
,CloneID,ID,VGene
5,2,22,61
6,2,28,311

CloneID3.txt
,CloneID,ID,VGene
7,3,70,311
8,3,47,311
9,3,46,311

CloneID4.txt
,CloneID,ID,VGene
10,4,17,311
11,4,19,311
12,4,56,311
13,4,33,311

Here the loop iterates over the group keys in gp.groups, uses each key to build the output file name, and calls to_csv on the matching group, so the following should work for you:

gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)

Alternatively, apply can do the same in one line; inside the lambda, x.name is the key of the current group:

gp = df.groupby('CloneID')
gp.apply(lambda x: x.to_csv('CloneID' + str(x.name) + '.txt'))
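
Note that the files written this way also contain the row index as an unnamed first column and are comma-separated, whereas the files described in the question have neither. The following is a minimal sketch of one way to get closer to that layout, assuming tab-separated output is acceptable. The helper name write_groups and the temporary output directory are hypothetical and only keep the example self-contained:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})


def write_groups(frame, column, out_dir):
    """Write one tab-separated file per distinct value of `column`."""
    paths = []
    # Iterating over a groupby yields (key, sub-frame) pairs, sorted by key.
    for key, group in frame.groupby(column):
        path = os.path.join(out_dir, f'{column}{key}.txt')
        # index=False drops the row index; sep='\t' replaces the commas.
        group.to_csv(path, sep='\t', index=False)
        paths.append(path)
    return paths


out_dir = tempfile.mkdtemp()  # assumption: any writable directory would do
# Select the columns explicitly so they appear in the order shown in the question.
paths = write_groups(df[['CloneID', 'ID', 'VGene']], 'CloneID', out_dir)

with open(paths[1]) as f:
    print(f.read())
```

Passing a different column name, for example 'VGene', would split the same frame by that column instead.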

