Average of Groupby into a dataframe timedelta64[ns]
df1
|Project |Days
|A |20 days
|B |10 days
|A |10 days
|C |5 days
|C |7 days
|B |8 days
R = df1['Days'].groupby(df1['Project'])
R
|Project |Days |Sum |Mean
|A |20 days, 10 days |30 days |15 days
|B |10 days, 8 days |18 days |9 days
|C |5 days, 7 days |12 days |6 days
(each group: Name: Days, dtype: timedelta64[ns])
df2
|Project |Date |**New Date**
|A |1/10/16 |1/25/16
|A |1/8/16 |1/23/16
|C |1/2/16 |1/8/16
|B |1/9/16 |1/18/16
What I'm trying to do is create df2['New Date'] by taking the average of Days per Project in df1 and adding that average to df2['Date']. Any thoughts?
Also wanted to add: the 'Days' column is generated by taking the difference between two dates loaded from an Excel spreadsheet.
****EDIT****
df1.head().to_dict('list')
{'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
[Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project 1161 non-null object
Days 1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None
EDIT #2: This is the error I'm getting: OverflowError: int too big to convert. It is raised on this line:
df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')
Days
20.569231
15.795455
20.569231
import pandas as pd
df1 = pd.DataFrame(
{'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df2 = pd.DataFrame(
{'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
'Project': ['A', 'A', 'C', 'B']})
df1['Days'] = pd.to_timedelta(df1['Days'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)
yields
        Date Project    Days   New Date
0 2016-01-10       A 15 days 2016-01-25
1 2016-01-08       A 15 days 2016-01-23
2 2016-01-02       C  6 days 2016-01-08
3 2016-01-09       B  9 days 2016-01-18
Compute the groupby/mean:
result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
# sum count Days
# Project
# A 30 days 2 15 days
# B 18 days 2 9 days
# C 12 days 2 6 days
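On recent pandas versions, mean() may work directly on a grouped timedelta64[ns] column, which would make the separate sum/count division unnecessary. The following is only a minimal sketch under that assumption; the sum/count approach above does not depend on it. The names mean_days and manual are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df1['Days'] = pd.to_timedelta(df1['Days'])

# Assumption: this pandas version can average timedelta64[ns] values per group.
mean_days = df1.groupby('Project')['Days'].mean()

# Cross-check against the sum/count division used in the answer.
agg = df1.groupby('Project')['Days'].agg(['sum', 'count'])
manual = agg['sum'] / agg['count']

print(mean_days)
print((mean_days == manual).all())
```

If the direct mean() call raises an error on your installation, the sum/count version remains the safer choice.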
Next, merge this result with df2 (joining on Project):
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
# Date Project Days
# 0 2016-01-10 A 15 days
# 1 2016-01-08 A 15 days
# 2 2016-01-02 C 6 days
# 3 2016-01-09 B 9 days
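One thing to keep in mind with the merge above: pd.merge defaults to an inner join, so any row of df2 whose Project has no entries in df1 is dropped. A rough sketch of the difference, using a hypothetical project 'Z' that is not in the original data; passing how='left' keeps such rows with a missing (NaT) value in Days:

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df1['Days'] = pd.to_timedelta(df1['Days'])

# Hypothetical df2: project 'Z' has no rows in df1.
df2 = pd.DataFrame(
    {'Date': pd.to_datetime(['2016-01-10', '2016-01-02', '2016-01-05']),
     'Project': ['A', 'C', 'Z']})

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum'] / result['count']

# Default (inner) join: the 'Z' row disappears.
inner = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)

# Left join: every row of df2 is kept; unmatched projects get a missing Days.
left = pd.merge(df2, result[['Days']], left_on='Project', right_index=True,
                how='left')
left['New Date'] = left['Date'] + left['Days']

print(len(inner), len(left))
print(left)
```

Which behaviour is preferable depends on whether rows without an average should be silently dropped or kept for inspection.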
Then, add the Days to the Date:
df2['New Date'] = df2['Date'] + df2['Days']
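As an alternative to merging, the per-project average can be looked up with Series.map, which avoids adding an intermediate Days column to df2. This is a sketch of a different technique than the merge shown above, not the answer's method; the name mean_days is illustrative, and the explicit date format is an assumption that the dates are month/day/two-digit-year.

```python
import pandas as pd

df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']})
df2 = pd.DataFrame(
    {'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
     'Project': ['A', 'A', 'C', 'B']})

df1['Days'] = pd.to_timedelta(df1['Days'])
# Assumption: dates are written as month/day/two-digit-year.
df2['Date'] = pd.to_datetime(df2['Date'], format='%m/%d/%y')

# Average days per project, computed as sum / count like the answer above.
grouped = df1.groupby('Project')['Days']
mean_days = grouped.sum() / grouped.count()

# Look up each row's project average and add it to the date.
df2['New Date'] = df2['Date'] + df2['Project'].map(mean_days)
print(df2)
```

Projects in df2 that are missing from df1 would map to a missing value here, so their New Date would be NaT rather than the row being dropped.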