[英]output multiple files based on column value python pandas
i have a sample pandas data frame: 我有一个示例pandas数据框:
import pandas as pd
df = {'ID': [73, 68,1,94,42,22, 28,70,47, 46,17, 19, 56, 33 ],
'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ],
'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311, 311, 311, 311]}
df = pd.DataFrame(df)
it looks like this: 它看起来像这样:
df
Out[7]:
CloneID ID VGene
0 1 73 64D
1 1 68 64D
2 1 1 64D
3 1 94 61
4 1 42 61
5 2 22 61
6 2 28 311
7 3 70 311
8 3 47 311
9 3 46 311
10 4 17 311
11 4 19 311
12 4 56 311
13 4 33 311
i want to write a simple script to output each cloneID to a different output file. 我想写一个简单的脚本将每个cloneID输出到不同的输出文件。 so in this case there would be 4 different files.
所以在这种情况下会有4个不同的文件。 the first file would be named 'CloneID1.txt' and it would look like this:
第一个文件名为'CloneID1.txt',它看起来像这样:
CloneID ID VGene
1 73 64D
1 68 64D
1 1 64D
1 94 61
1 42 61
second file would be named 'CloneID2.txt': 第二个文件名为'CloneID2.txt':
CloneID ID VGene
2 22 61
2 28 311
third file would be named 'CloneID3.txt': 第三个文件名为'CloneID3.txt':
CloneID ID VGene
3 70 311
3 47 311
3 46 311
and last file would be 'CloneID4.txt': 最后一个文件是'CloneID4.txt':
CloneID ID VGene
4 17 311
4 19 311
4 56 311
4 33 311
the code i found online was: 我在网上找到的代码是:
import pandas as pd
data = pd.read_excel('data.xlsx')
for group_name, data in data.groupby('CloneID'):
with open('results.csv', 'a') as f:
data.to_csv(f)
but it outputs everything to one file instead of multiple files. 但它将所有内容输出到一个文件而不是多个文件。
You can do something like the following: 您可以执行以下操作:
In [19]:
gp = df.groupby('CloneID')
for g in gp.groups:
print('CloneID' + str(g) + '.txt')
print(gp.get_group(g).to_csv())
CloneID1.txt
,CloneID,ID,VGene
0,1,73,64D
1,1,68,64D
2,1,1,64D
3,1,94,61
4,1,42,61
CloneID2.txt
,CloneID,ID,VGene
5,2,22,61
6,2,28,311
CloneID3.txt
,CloneID,ID,VGene
7,3,70,311
8,3,47,311
9,3,46,311
CloneID4.txt
,CloneID,ID,VGene
10,4,17,311
11,4,19,311
12,4,56,311
13,4,33,311
So here we iterate over the groups in for g in gp.groups:
and we use this to create the result file path name and call to_csv
on the group so the following should work for you: 所以这里我们
for g in gp.groups:
迭代for g in gp.groups:
我们使用它来创建结果文件路径名并在组上调用to_csv
,以便以下内容适合您:
gp = df.groupby('CloneID')
for g in gp.groups:
path = 'CloneID' + str(g) + '.txt'
gp.get_group(g).to_csv(path)
Actually the following would be even simpler: 实际上以下更简单:
gp = df.groupby('CloneID')
gp.apply(lambda x: x.to_csv('CloneID' + str(x.name) + '.txt'))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.