
Output files with names taken from groupby results (pandas, Python)

I have a sample dataset:

import pandas as pd


df = {'READID': [1, 1, 1, 1, 1, 5, 5, 5, 5],
      'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-F*01', 'LV5-F*01', 'LV5-F*01',
             'LV5-A*01', 'LV5-A*01', 'LV5-A*01', 'LV5-A*01'],
      'Pro': [1, 1, 1, 1, 1, 2, 2, 2, 2]}

df = pd.DataFrame(df)

It looks like:

df
Out[23]: 
     Pro  READID     VG
0    1       1   LV5-F*01
1    1       1   LV5-F*01
2    1       1   LV5-F*01
3    1       1   LV5-F*01
4    1       1   LV5-F*01
5    2       5   LV5-A*01
6    2       5   LV5-A*01
7    2       5   LV5-A*01
8    2       5   LV5-A*01

This is a sample dataset; the actual dataset contains many more columns and many more rows, with different combinations of the groupby keys. I want to group by the 3 columns and write each group to a separate file that has its VG value as part of the file name:

Desired output:

'LV5-F*01.txt':

     Pro  READID     VG
0    1       1   LV5-F*01
1    1       1   LV5-F*01
2    1       1   LV5-F*01
3    1       1   LV5-F*01
4    1       1   LV5-F*01

'LV5-A*01.txt':

    Pro  READID     VG
5    2       5   LV5-A*01
6    2       5   LV5-A*01
7    2       5   LV5-A*01
8    2       5   LV5-A*01

My attempt:

(df.groupby(['READID','VG','Pro'])
.apply(lambda gp: gp.to_csv('{}.txt'.format(gp.VG.name), sep='\t', index=False))
 )

However, the

  '{}.txt'.format(gp.VG.name)

part only produced a single file named 'VG.txt' containing just one line, which is not what I want.
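The likely reason is that gp.VG.name is the label of the VG column (the string 'VG'), not a value from it, so every group is written to the same 'VG.txt' and each write replaces the previous one. A small check, assuming a cut-down copy of the sample data above:

```python
import pandas as pd

# A cut-down copy of the sample data from the question.
df = pd.DataFrame({
    'READID': [1, 1, 5],
    'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-A*01'],
    'Pro': [1, 1, 2],
})
grouped = df.groupby(['READID', 'VG', 'Pro'])

# gp.VG.name is the column label, identical for every group
labels = {gp.VG.name for _, gp in grouped}

# the value actually stored in the column differs per group
values = [gp.VG.iloc[0] for _, gp in grouped]

print(labels)  # {'VG'}
print(values)  # ['LV5-F*01', 'LV5-A*01']
```

Groups come out sorted by the keys (READID first), which is why 'LV5-F*01' is listed before 'LV5-A*01'.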

You don't need groupby; you can just select the rows you need and write them to a text file.

import pandas as pd

df = {'READID': [1, 1, 1, 1, 1, 5, 5, 5, 5],
      'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-F*01', 'LV5-F*01', 'LV5-F*01',
             'LV5-A*01', 'LV5-A*01', 'LV5-A*01', 'LV5-A*01'],
      'Pro': [1, 1, 1, 1, 1, 2, 2, 2, 2]}
df = pd.DataFrame(df)

# select the rows for one VG value and write them out as plain text
with open('LV5-F*01.txt', 'w') as fil:
    fil.write(df[df['VG'] == 'LV5-F*01'].to_string())

with open('LV5-A*01.txt', 'w') as fil:
    fil.write(df[df['VG'] == 'LV5-A*01'].to_string())
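With many distinct VG values, the same row selection can be done in a loop instead of one hard-coded block per value. A minimal sketch, assuming the files go into a directory held in out_dir (a hypothetical name; a temporary directory is used here):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'READID': [1, 1, 1, 1, 1, 5, 5, 5, 5],
    'VG': ['LV5-F*01'] * 5 + ['LV5-A*01'] * 4,
    'Pro': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Hypothetical output directory; a temporary one is created for this sketch.
out_dir = tempfile.mkdtemp()

# One file per distinct VG value, each holding only that value's rows.
for vg in df['VG'].unique():
    path = os.path.join(out_dir, '{}.txt'.format(vg))
    with open(path, 'w') as fil:
        fil.write(df[df['VG'] == vg].to_string())
```

This still splits on VG alone, so rows with the same VG but different READID or Pro end up in the same file.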
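
Alternatively, loop over the groupby object directly; each item is a (keys, group) pair:

g = df.groupby(['READID', 'VG', 'Pro'])
for (readid, vg, pro), group in g:
    # unpack the key tuple so the VG value can be used as the file name
    group.to_csv('{}.txt'.format(vg), sep='\t', index=False)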

You might want to strip the * character from the file names if it causes problems (for example, * is not allowed in Windows file names).
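One way to do that is to replace every character that Windows reserves in file names before building the path. A sketch using a hypothetical helper, safe_filename, that covers only the reserved printable characters (not control characters or reserved names such as CON):

```python
import re

# Printable characters that Windows does not allow in file names.
_INVALID = r'[<>:"/\\|?*]'


def safe_filename(name, replacement='_'):
    """Hypothetical helper: replace Windows-reserved characters in a name."""
    return re.sub(_INVALID, replacement, name)


print(safe_filename('LV5-F*01') + '.txt')  # LV5-F_01.txt
```

Passing replacement='' drops the characters instead, e.g. 'LV5-A*01' becomes 'LV5-A01'.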

Also note that you group on three keys but use only one of them in the file name, so groups that share the same VG value but differ in READID or Pro will overwrite each other's files.
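To avoid that, all three keys can go into the file name. A sketch assuming a modified dataset in which two groups share the VG value 'LV5-F*01', with out_dir again a hypothetical output directory; the name format READID_VG_Pro.txt is an arbitrary choice:

```python
import os
import tempfile

import pandas as pd

# Two groups share VG 'LV5-F*01' but differ in READID and Pro.
df = pd.DataFrame({
    'READID': [1, 1, 2, 5],
    'VG': ['LV5-F*01', 'LV5-F*01', 'LV5-F*01', 'LV5-A*01'],
    'Pro': [1, 1, 3, 2],
})

out_dir = tempfile.mkdtemp()  # hypothetical output location

for (readid, vg, pro), group in df.groupby(['READID', 'VG', 'Pro']):
    # Use all three keys so no two groups map to the same file;
    # '*' is replaced because Windows does not allow it in file names.
    fname = '{}_{}_{}.txt'.format(readid, vg.replace('*', '_'), pro)
    group.to_csv(os.path.join(out_dir, fname), sep='\t', index=False)
```

This writes 1_LV5-F_01_1.txt, 2_LV5-F_01_3.txt and 5_LV5-A_01_2.txt, so the two 'LV5-F*01' groups no longer collide.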
