简体   繁体   English

Groupby进入数据帧的平均时间delta64 [ns]

[英]Average of Groupby into a dataframe timedelta64[ns]

df1 df1

|Project |Days
|A       |20 days
|B       |10 days
|A       |10 days
|C       |5 days
|C       |7 days
|B       |8 days

R = df1['Days'].groupby(df1['Project']) R = df1 ['Days']。groupby(df1 ['Project'])

R [R

|20 days
|10 days
|Name: Days, dtype: timedelta64[ns],('A', 30   15 days)
|10 days
|8 days
|Name: Days, dtype: timedelta64[ns],('B', 18   9 days)
|5 days
|7 days
|Name: Days, dtype: timedelta64[ns],('C', 12   6 days)

df2 df2

|Project  |Date     |**New Date**
|A        |1/10/16  |1/25/16 
|A        |1/8/16   |1/23/16
|C        |1/2/16   |1/8/16
|B        |1/9/16   |1/18/16    

What I'm trying to do is create df2['New Date'] by using the average of days by Project in df1 and add that average to df2['Date'] . 我想做的是通过使用df1中Project的平均天数来创建df2['New Date']并将该平均值添加到df2['Date'] Any thoughts? 有什么想法吗?

Also wanted to add. 也想补充。 The 'Days' column is generated from taking the difference between two dates loaded in from an excel spreadsheet. “天”列是通过获取从Excel电子表格加载的两个日期之间的差生成的。

****EDIT**** ****编辑****

df1.head().to_dict('list') df1.head()。to_dict('list')

 {'Project': ['210001', '210001', '210001', '210001', '210001'], 'Days':
 [Timedelta('8 days 00:00:00'), Timedelta('8 days 00:00:00'), Timedelta('12 days
 00:00:00'), Timedelta('12 days 00:00:00'), Timedelta('14 days 00:00:00')]}

df1.info() df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1161 entries, 0 to 1278
Data columns (total 2 columns):
Project    1161 non-null object
Days      1161 non-null timedelta64[ns]
dtypes: object(1), timedelta64[ns](1)
memory usage: 22.7+ KB
None

EDIT #2 INT This is the error I'm getting on: OverflowError: int too big to convert 编辑#2 INT这是我遇到的错误:OverflowError:int太大,无法转换

df2['New Date'] = df2['Date'] + pd.to_timedelta(df2['Days'], unit='D')

Days
20.569231
15.795455
20.569231
import pandas as pd
df1 = pd.DataFrame(
    {'Days': ['20 days', '10 days', '10 days', '5 days', '7 days', '8 days'],
     'Project': ['A', 'B', 'A', 'C', 'C', 'B']}) 

df2 = pd.DataFrame(
    {'Date': ['1/10/16', '1/8/16', '1/2/16', '1/9/16'],
     'Project': ['A', 'A', 'C', 'B']})

df1['Days'] = pd.to_timedelta(df1['Days']) 
df2['Date'] = pd.to_datetime(df2['Date'])

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
df2['New Date'] = df2['Date'] + df2['Days']
print(df2)

yields 产量

        Date Project  Days   New Date
0 2016-01-10       A  15.0 2016-01-25
1 2016-01-08       A  15.0 2016-01-23
2 2016-01-02       C   6.0 2016-01-08
3 2016-01-09       B   9.0 2016-01-18

Compute the groupby/mean : 计算groupby/mean

result = df1.groupby('Project')['Days'].agg(['sum', 'count'])
result['Days'] = result['sum']/result['count']
#             sum  count    Days
# Project                       
# A       30 days      2 15 days
# B       18 days      2  9 days
# C       12 days      2  6 days

and merge this result with df2 (joining on Project ): 并将此resultdf2合并(加入Project ):

df2 = pd.merge(df2, result[['Days']], left_on='Project', right_index=True)
#         Date Project    Days
# 0 2016-01-10       A 15 days
# 1 2016-01-08       A 15 days
# 2 2016-01-02       C  6 days
# 3 2016-01-09       B  9 days

Then, add the Days to the Date : 然后,将Days添加到Date

df2['New Date'] = df2['Date'] + df2['Days']

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM