Output multiple files based on column value in Python pandas

I have a sample pandas DataFrame:

import pandas as pd

df = {'ID': [73, 68,1,94,42,22, 28,70,47, 46,17, 19, 56, 33 ],
  'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ],
  'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311,  311, 311, 311]}
df = pd.DataFrame(df)

It looks like this:

df
Out[7]: 
    CloneID  ID VGene
0         1  73   64D
1         1  68   64D
2         1   1   64D
3         1  94    61
4         1  42    61
5         2  22    61
6         2  28   311
7         3  70   311
8         3  47   311
9         3  46   311
10        4  17   311
11        4  19   311
12        4  56   311
13        4  33   311

I want to write a simple script that outputs each CloneID to a different output file, so in this case there would be 4 different files. The first file would be named 'CloneID1.txt' and would look like this:

CloneID  ID   VGene
     1   73   64D
     1   68   64D
     1   1    64D
     1   94   61
     1   42   61

The second file would be named 'CloneID2.txt':

CloneID  ID  VGene
     2   22   61
     2   28   311

The third file would be named 'CloneID3.txt':

CloneID  ID  VGene
     3   70   311
     3   47   311
     3   46   311

And the last file would be 'CloneID4.txt':

CloneID  ID VGene 
    4    17   311
    4    19   311
    4    56   311
    4    33   311

The code I found online was:

import pandas as pd
data = pd.read_excel('data.xlsx')

for group_name, data in data.groupby('CloneID'):
    with open('results.csv', 'a') as f:
        data.to_csv(f)

But it outputs everything to one file instead of multiple files.

You can do something like the following:

In [19]:
gp = df.groupby('CloneID')
for g in gp.groups:
    print('CloneID' + str(g) + '.txt')
    print(gp.get_group(g).to_csv())

CloneID1.txt
,CloneID,ID,VGene
0,1,73,64D
1,1,68,64D
2,1,1,64D
3,1,94,61
4,1,42,61

CloneID2.txt
,CloneID,ID,VGene
5,2,22,61
6,2,28,311

CloneID3.txt
,CloneID,ID,VGene
7,3,70,311
8,3,47,311
9,3,46,311

CloneID4.txt
,CloneID,ID,VGene
10,4,17,311
11,4,19,311
12,4,56,311
13,4,33,311

So here we iterate over the group keys with for g in gp.groups:, use each key to build the output file name, and call to_csv on the matching group. The following should work for you:

gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)
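
The output above still carries the row index as an unnamed first column, which the files requested in the question do not have. A minimal sketch of a variant, assuming tab-separated output is acceptable: iterating over the groupby object directly yields (key, sub-frame) pairs, so get_group is not needed, and index=False drops the index column. The out_dir and written names are hypothetical and only there to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()

written = []
# Iterating over a groupby object yields (group key, sub-DataFrame) pairs.
for clone_id, group in df.groupby('CloneID'):
    path = os.path.join(out_dir, 'CloneID' + str(clone_id) + '.txt')
    # Reorder the columns to match the question's layout.
    # index=False omits the row index; sep='\t' is an assumed delimiter.
    group[['CloneID', 'ID', 'VGene']].to_csv(path, sep='\t', index=False)
    written.append(path)
```

Replace out_dir with whatever directory you actually want the files in.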

Actually, the following would be even simpler:

gp = df.groupby('CloneID')
gp.apply(lambda x: x.to_csv('CloneID' + str(x.name) + '.txt'))
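
One caveat, as far as I know: apply is used here only for its side effect, and how groupby.apply treats the grouping column has changed between pandas releases, so on newer versions the CloneID column may be missing from files written this way. If that matters, a rough sketch that does not depend on apply at all is to filter the frame once per distinct CloneID with a boolean mask. The out_dir and files names are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'ID': [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311],
})

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_dir = tempfile.mkdtemp()

# One pass per distinct CloneID; the boolean mask keeps every column,
# including CloneID itself.
for clone_id in df['CloneID'].unique():
    subset = df[df['CloneID'] == clone_id]
    subset.to_csv(os.path.join(out_dir, 'CloneID' + str(clone_id) + '.txt'))

files = sorted(os.listdir(out_dir))
```

This scans the frame once per group, so for many groups the groupby loop is likely the better choice.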
