Output groupby to csv file in pandas

Question

I have a sample dataset:

import pandas as pd
df = {'ID': ['H1','H2','H3','H4','H5','H6'],
      'AA1': ['C','B','B','X','G','G'],
      'AA2': ['W','K','K','A','B','B'],
      'name':['n1','n2','n3','n4','n5','n6']
}

df = pd.DataFrame(df)

It looks like:

df
Out[32]: 
   AA1 AA2  ID name
0   C   W  H1   n1
1   B   K  H2   n2
2   B   K  H3   n3
3   X   A  H4   n4
4   G   B  H5   n5
5   G   B  H6   n6

I want to group by AA1 and AA2, keeping one row per unique AA1/AA2 pair. It doesn't matter which ID and name values are kept for each pair. I then want to write the result to a .csv file, so that the file looks like:

 AA1 AA2  ID name
  C   W  H1   n1
  B   K  H2   n2
  X   A  H4   n4
  G   B  H5   n5

I tried this code:

df.groupby('AA1','AA2').apply(to_csv('merged.txt', sep = '\t', index=False))

but to_csv was not recognized. What can I put in .apply() to write the groupby results to a csv file?

Answer 1

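The problem is that you are calling to_csv as if it were a standalone function, and no such function exists. It is a method of pd.Series and pd.DataFrame, and a groupby object doesn't have a to_csv method either. Note also that the grouping keys need to be passed as a list, ['AA1','AA2']; in groupby('AA1','AA2') the second positional argument is not interpreted as a grouping key.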
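To make the distinction concrete, here is a minimal sketch using the question's data. The variable names error_message and csv_text are only illustrative. It shows that the bare name to_csv is undefined, while the DataFrame method of the same name works and returns the text when no path is given.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# A bare `to_csv` is just an undefined name, so Python raises NameError
# before groupby or apply ever get involved.
try:
    to_csv
except NameError as exc:
    error_message = str(exc)
print(error_message)

# As a DataFrame method it works; with no path it returns the CSV text.
csv_text = df.to_csv(sep='\t', index=False)
print(csv_text)
```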

What you should really use here is drop_duplicates, and then export the resulting dataframe to csv:

df.drop_duplicates(['AA1','AA2']).to_csv('merged.txt')
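If the file should match the layout in the question (tab-separated, no index column), the sep and index arguments from the original attempt can be passed to to_csv. The following is a sketch under that assumption; writing to a temporary directory and reading the file back are only there to make the example safe to run and easy to check.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# Keep the first row seen for each unique (AA1, AA2) pair.
unique_pairs = df.drop_duplicates(subset=['AA1', 'AA2'])

# Hypothetical output location: a temporary directory, so nothing is overwritten.
out_path = os.path.join(tempfile.mkdtemp(), 'merged.txt')
unique_pairs.to_csv(out_path, sep='\t', index=False)

# Read the file back to check what was written.
round_trip = pd.read_csv(out_path, sep='\t')
print(round_trip)
```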

PS: If you really want a groupby solution, here is one, though it happens to be 12 times slower than drop_duplicates:

df.groupby(['AA1','AA2']).agg(lambda x:x.value_counts().index[0]).to_csv('merged.txt')
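Note that the agg version picks the most frequent value in each column separately. As an alternative that is not part of this answer, groupby with first() takes the first non-null value per column in each group. A sketch, with sort=False and as_index=False chosen here so that the pairs stay in their original order and AA1/AA2 remain ordinary columns:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# sort=False keeps groups in order of first appearance;
# as_index=False keeps AA1 and AA2 as regular columns.
first_per_pair = df.groupby(['AA1', 'AA2'], sort=False, as_index=False).first()

# The aggregated result is a DataFrame, so to_csv is available on it.
tsv_text = first_per_pair.to_csv(sep='\t', index=False)
print(tsv_text)
```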

Answer 2

You can use groupby with head:

df.groupby(['AA1', 'AA2']).head(1)
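The line above only selects the rows; it does not write anything. Since head(1) returns a regular DataFrame, the export step can be chained on. A short sketch, assuming the tab-separated, index-free layout from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6'],
    'AA1': ['C', 'B', 'B', 'X', 'G', 'G'],
    'AA2': ['W', 'K', 'K', 'A', 'B', 'B'],
    'name': ['n1', 'n2', 'n3', 'n4', 'n5', 'n6'],
})

# head(1) keeps the first row of every (AA1, AA2) group,
# preserving the original row order and index labels.
first_rows = df.groupby(['AA1', 'AA2']).head(1)

# With no path, to_csv returns the text; pass a filename to write to disk.
tsv_text = first_rows.to_csv(sep='\t', index=False)
print(tsv_text)
```

On this data the selected rows are the same ones drop_duplicates keeps.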

2 answers

Answer 1: 2, accepted, 2016-11-30 22:06:38

Answer 2: 2, 2016-12-01 07:46:47
