Output groupby to csv file pandas
I have a sample dataset:
import pandas as pd
df = {'ID': ['H1','H2','H3','H4','H5','H6'],
'AA1': ['C','B','B','X','G','G'],
'AA2': ['W','K','K','A','B','B'],
'name':['n1','n2','n3','n4','n5','n6']
}
df = pd.DataFrame(df)
It looks like:
df
Out[32]:
AA1 AA2 ID name
0 C W H1 n1
1 B K H2 n2
2 B K H3 n3
3 X A H4 n4
4 G B H5 n5
5 G B H6 n6
I want to group by AA1 and AA2 (keeping each unique AA1/AA2 pair once); it doesn't matter which ID and name values come along with each unique pair. I then want to output that to a .csv file, so the output in the .csv file would look like:
AA1 AA2 ID name
C W H1 n1
B K H2 n2
X A H4 n4
G B H5 n5
I tried the code:
df.groupby('AA1','AA2').apply(to_csv('merged.txt', sep = '\t', index=False))
but to_csv was not recognized. What can I put in .apply() to just output the groupby results to a csv file?
The problem is that you are trying to apply a function to_csv, which doesn't exist as a standalone function. In any case, groupby doesn't have a to_csv method either; pd.Series and pd.DataFrame do.
What you should really use here is drop_duplicates, and then export the resulting dataframe to csv:
df.drop_duplicates(['AA1','AA2']).to_csv('merged.txt')
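As a minimal, self-contained sketch of the line above: the one-liner writes a comma-separated file that includes the index, whereas the question's attempt asked for tab separation and no index. The version below passes sep and index as in the question, and uses columns to reproduce the column order shown in the desired output. The variable name unique_pairs is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# Keep the first row seen for each unique (AA1, AA2) pair.
unique_pairs = df.drop_duplicates(subset=['AA1', 'AA2'])

# Write tab-separated, without the index, in the column order
# shown in the question's desired output.
unique_pairs.to_csv(
    'merged.txt',
    sep='\t',
    index=False,
    columns=['AA1', 'AA2', 'ID', 'name'],
)
```

drop_duplicates keeps the first occurrence of each pair by default, which is why H2 and H5 survive rather than H3 and H6.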
PS: If you really want a groupby solution, here is one, which happens to be 12 times slower than drop_duplicates:
df.groupby(['AA1','AA2']).agg(lambda x:x.value_counts().index[0]).to_csv('merged.txt')
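Note that the agg line above picks the most frequent ID and name within each group rather than the first row. Since the question says any ID and name will do, a simpler groupby variant is sketched below. It is not from the original answer, and it assumes the first non-null value per column is acceptable; the output file name merged_groupby.txt is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# as_index=False keeps AA1 and AA2 as ordinary columns;
# sort=False keeps groups in order of first appearance.
# first() takes the first non-null value of each remaining column.
result = df.groupby(['AA1', 'AA2'], as_index=False, sort=False).first()

# The result is a plain DataFrame, so to_csv is available on it.
result.to_csv('merged_groupby.txt', sep='\t', index=False)
```

With missing values in ID or name, first() can combine values from different rows of a group, so drop_duplicates is still the safer choice when whole rows must be preserved.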