
pandas groupby and aggregate into new column

I did some searching, but nothing yields the desired result, which is grouping the data by date and counting the frequency. I am able to do this with aggregate, but I'm not sure how to create a new column with the results. Thanks.

Data in file (domains.tsv, tab-separated):

Domain          Dates
twitter.com     2016-08-08
google.com      2016-08-09
apple.com       2016-08-09
linkedin.com    2016-08-09
microsoft.com   2016-08-09
slack.com       2016-08-12
instagram.com   2016-08-12
ibm.com         2016-08-12

Code:

import pandas as pd

# Read the tab-separated file
df = pd.read_csv('domains.tsv', sep='\t')
# Group rows by calendar date and count the rows in each group
df = df.groupby([pd.to_datetime(df.Dates).dt.date]).agg({'Dates':'size'})
print(df)

yields:

            Dates
Dates
2016-08-08      1
2016-08-09      4
2016-08-12      3

Ideally, I would like the count column to be named 'count', and then I will save the result as a new CSV file.
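A minimal sketch of getting the 'count' name straight out of the agg call, using pandas' named aggregation (available since pandas 0.25). Assumption: the sample data is inlined with io.StringIO so the snippet runs without domains.tsv, and to_csv() is called without a path so it returns the CSV text instead of writing a file.

```python
import io

import pandas as pd

# Assumption: sample data inlined instead of reading domains.tsv from disk.
DATA = """Domain\tDates
twitter.com\t2016-08-08
google.com\t2016-08-09
apple.com\t2016-08-09
linkedin.com\t2016-08-09
microsoft.com\t2016-08-09
slack.com\t2016-08-12
instagram.com\t2016-08-12
ibm.com\t2016-08-12
"""

df = pd.read_csv(io.StringIO(DATA), sep='\t')

# Named aggregation: the keyword (count) becomes the output column name,
# the tuple says which column to aggregate and with which function.
daily = df.groupby(pd.to_datetime(df.Dates).dt.date).agg(count=('Dates', 'size'))

# With no path argument, to_csv returns the CSV text as a string;
# pass 'count.csv' instead to write the file.
csv_text = daily.to_csv()
print(csv_text)
```

The group keys keep the name 'Dates', so they become the index label and the first CSV column.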

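import pandas as pd

df = pd.read_csv('domains.tsv', sep='\t')
# count() tallies non-null values per column within each date group;
# the only remaining column, 'Domain', is renamed to 'count'
counter = df.groupby('Dates').count().rename(columns={'Domain': 'count'})
# The 'Dates' group keys are the index and are written as the first CSV column
counter.to_csv('count.csv')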

You will get count.csv in your current directory with the following contents:

Dates,count
2016-08-08,1
2016-08-09,4
2016-08-12,3
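A variant of the same idea, sketched under the same assumption (data inlined via io.StringIO, CSV returned as a string rather than written to disk). It uses size() instead of count(): count() reports non-null values per column, so a file with more columns would yield one count column per column, while size() counts rows per group and returns a single Series that reset_index(name='count') turns into a two-column DataFrame.

```python
import io

import pandas as pd

# Assumption: sample data inlined instead of reading domains.tsv from disk.
DATA = """Domain\tDates
twitter.com\t2016-08-08
google.com\t2016-08-09
apple.com\t2016-08-09
linkedin.com\t2016-08-09
microsoft.com\t2016-08-09
slack.com\t2016-08-12
instagram.com\t2016-08-12
ibm.com\t2016-08-12
"""

df = pd.read_csv(io.StringIO(DATA), sep='\t')

# size() counts rows per date (regardless of NaNs) and returns a Series;
# reset_index(name='count') makes 'Dates' a regular column next to 'count'.
counts = df.groupby('Dates').size().reset_index(name='count')

# index=False keeps the 0..n-1 row labels out of the CSV.
csv_text = counts.to_csv(index=False)
print(csv_text)
```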
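
Alternatively, to add the count as a new column on the original DataFrame (one value per row) instead of collapsing it to one row per date:

df['count'] = df.groupby('Dates')['Domain'].transform('count')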
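A short sketch of what transform does here, again assuming the sample data is inlined with io.StringIO. Unlike an aggregation, transform returns a result aligned with the original rows, so every domain gets the count of rows sharing its date and the DataFrame keeps all 8 rows.

```python
import io

import pandas as pd

# Assumption: sample data inlined instead of reading domains.tsv from disk.
DATA = """Domain\tDates
twitter.com\t2016-08-08
google.com\t2016-08-09
apple.com\t2016-08-09
linkedin.com\t2016-08-09
microsoft.com\t2016-08-09
slack.com\t2016-08-12
instagram.com\t2016-08-12
ibm.com\t2016-08-12
"""

df = pd.read_csv(io.StringIO(DATA), sep='\t')

# transform broadcasts each group's count back onto that group's rows;
# count() skips NaN values in 'Domain', which this sample does not have.
df['count'] = df.groupby('Dates')['Domain'].transform('count')
print(df)
```

If a one-row-per-date file is wanted from this, df[['Dates', 'count']].drop_duplicates() gives the same three rows as the aggregation.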

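Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this post, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.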

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM