How to crosstab or count dataframe rows by date in pandas
I am fairly new to working with pandas. I have a dataframe with individual entries like this:
dfImport:
id | date_created | date_closed |
---|---|---|
0 | 01-07-2020 | |
1 | 02-09-2020 | 10-09-2020 |
2 | 07-03-2019 | 02-09-2020 |
I would like to filter it so that I get the total number of created and closed objects (a count of ids) grouped by year, quarter and month, like this:
dfInOut:
Year | Qrt | month | number_created | number_closed |
---|---|---|---|---|
2019 | 1 | March | 1 | 0 |
2020 | 3 | July | 1 | 0 |
 | | September | 1 | 2 |
I guess I'd have to use some combination of crosstab or groupby. I have tried a lot of ideas and researched the problem, but I can't figure out a way. I guess it's an issue of understanding. Thanks in advance!
Use DataFrame.melt with crosstab:
import pandas as pd

df['date_created'] = pd.to_datetime(df['date_created'], dayfirst=True)  # dates are DD-MM-YYYY
df['date_closed'] = pd.to_datetime(df['date_closed'], dayfirst=True)
df1 = df.melt(value_vars=['date_created','date_closed']).dropna()  # long format, missing dates dropped
df = (pd.crosstab([df1['value'].dt.year.rename('Year'),
                   df1['value'].dt.quarter.rename('Qrt'),
                   df1['value'].dt.month.rename('Month')], df1['variable'])
        [['date_created','date_closed']])  # fix the column order
print(df)
variable        date_created  date_closed
Year Qrt Month
2019 1   3                 1            0
2020 3   7                 1            0
         9                 1            2
df = df.rename_axis(None, axis=1).reset_index()  # drop the 'variable' label and flatten the index
print(df)
   Year  Qrt  Month  date_created  date_closed
0  2019    1      3             1            0
1  2020    3      7             1            0
2  2020    3      9             1            2
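The question asks for month names and for columns called number_created and number_closed, while the output above has month numbers and the original column names. A minimal, self-contained sketch of one way to get there is below. It rebuilds the sample data from the question and keeps the same melt plus crosstab approach. The names df_import, df_in_out and count_by_period are hypothetical, and the explicit format="%d-%m-%Y" is an assumption about the input that stands in for dayfirst=True. Month names are mapped with the standard-library calendar module, which returns English names under the default locale.

```python
import calendar

import pandas as pd

# Sample data copied from the question; dates are day-first (DD-MM-YYYY).
df_import = pd.DataFrame({
    "id": [0, 1, 2],
    "date_created": ["01-07-2020", "02-09-2020", "07-03-2019"],
    "date_closed": [None, "10-09-2020", "02-09-2020"],
})


def count_by_period(df):
    """Count created/closed rows per year, quarter and month (hypothetical helper)."""
    # Parse both date columns; a missing close date becomes NaT.
    dates = df[["date_created", "date_closed"]].apply(
        pd.to_datetime, format="%d-%m-%Y"
    )
    # Long format: one row per (event type, date); NaT rows are dropped.
    long = dates.melt(var_name="event", value_name="date").dropna()
    counts = pd.crosstab(
        [
            long["date"].dt.year.rename("Year"),
            long["date"].dt.quarter.rename("Qrt"),
            long["date"].dt.month.rename("Month"),
        ],
        long["event"],
    )
    # reindex keeps both columns even if one event type never occurs.
    counts = counts.reindex(columns=["date_created", "date_closed"], fill_value=0)
    counts.columns = ["number_created", "number_closed"]
    out = counts.reset_index()
    # Map month numbers to names only after counting, so rows stay chronological.
    out["Month"] = out["Month"].map(lambda m: calendar.month_name[int(m)])
    return out


df_in_out = count_by_period(df_import)
print(df_in_out)
```

Mapping the names after the crosstab is deliberate: grouping directly on month names would sort them alphabetically within each quarter rather than by calendar order.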