Python: manipulating time series data in an aggregation

Question

I have a time series dataframe which contains the columns shown below:

    perf_date  pull_date  clicks  conv      rev 
    2019-01-21 2019-01-28   56     9        44.12
    2019-01-22 2019-01-28   56     10       44.70
               2019-01-29   56     10       44.70
    2019-01-23 2019-01-28   59     13       89.31
               2019-01-29   59     13       89.31
               2019-01-30   59     14       95.31

What I want to do is: 1) Keep all values of the first row for each perf_date. 2) Take the rev value from the row with the largest pull_date for each perf_date. So after the manipulation the above dataframe should look like this:

    perf_date  pull_date  clicks  conv      rev 
    2019-01-21 2019-01-28   56     9        44.12
    2019-01-22 2019-01-28   56     10       44.70
    2019-01-23 2019-01-28   59     13       95.31

Answer 1

Use GroupBy.agg with a dictionary that maps columns to aggregate functions. You can write the dictionary by hand or build it dynamically: every column except perf_date and rev is aggregated with first, and rev is aggregated with last:

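# if necessary: fill the blank perf_date cells, then sort so that
# 'first'/'last' within each group follow pull_date order
df['perf_date'] = df['perf_date'].ffill()
df = df.sort_values(['perf_date','pull_date'])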

d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
d['rev'] = 'last'
print (d)
{'clicks': 'first', 'conv': 'first', 'pull_date': 'first', 'rev': 'last'}

df = df.groupby('perf_date', as_index=False).agg(d).reindex(df.columns, axis=1)
print (df)
    perf_date   pull_date  clicks  conv    rev
0  2019-01-21  2019-01-28      56     9  44.12
1  2019-01-22  2019-01-28      56    10  44.70
2  2019-01-23  2019-01-28      59    13  95.31
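
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. The frame is rebuilt by hand from the question's sample, assuming the blank perf_date cells are missing values and the dates are plain strings; the names agg_map and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; blank perf_date cells are assumed to be missing.
df = pd.DataFrame({
    "perf_date": ["2019-01-21", "2019-01-22", None, "2019-01-23", None, None],
    "pull_date": ["2019-01-28", "2019-01-28", "2019-01-29",
                  "2019-01-28", "2019-01-29", "2019-01-30"],
    "clicks": [56, 56, 56, 59, 59, 59],
    "conv": [9, 10, 10, 13, 13, 14],
    "rev": [44.12, 44.70, 44.70, 89.31, 89.31, 95.31],
})

# Fill the group key downwards and sort so 'last' means the largest pull_date.
df["perf_date"] = df["perf_date"].ffill()
df = df.sort_values(["perf_date", "pull_date"])

# Every column except the key and rev takes its first value; rev takes its last.
agg_map = {col: "first" for col in df.columns if col not in ("perf_date", "rev")}
agg_map["rev"] = "last"

# Aggregate per perf_date and restore the original column order.
result = df.groupby("perf_date", as_index=False).agg(agg_map)[list(df.columns)]
print(result)
```

Building the dictionary with a comprehension keeps the original column order, so the final column selection is only a safeguard.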

EDIT: To take rev from the third row of each group instead (groups with fewer than three rows get NaN):

d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
df1 = df.groupby('perf_date', as_index=False).agg(d)
s = df.groupby('perf_date')['rev'].nth(2)
df = df1.join(s, on='perf_date')
print (df)
    perf_date  clicks  conv   pull_date    rev
0  2019-01-21      56     9  2019-01-28    NaN
1  2019-01-22      56    10  2019-01-28    NaN
2  2019-01-23      59    13  2019-01-28  95.31
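
One caveat: in recent pandas releases (2.0 and later, as far as I know) GroupBy.nth keeps the original row index instead of the group keys, so the join on perf_date above may not align any more. A sketch that should not depend on that behaviour uses cumcount to pick the third row of each group. The sample frame is rebuilt by hand with perf_date already filled in, and the names pos, third_rev and first_rows are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question, with perf_date already forward-filled.
df = pd.DataFrame({
    "perf_date": ["2019-01-21", "2019-01-22", "2019-01-22",
                  "2019-01-23", "2019-01-23", "2019-01-23"],
    "pull_date": ["2019-01-28", "2019-01-28", "2019-01-29",
                  "2019-01-28", "2019-01-29", "2019-01-30"],
    "clicks": [56, 56, 56, 59, 59, 59],
    "conv": [9, 10, 10, 13, 13, 14],
    "rev": [44.12, 44.70, 44.70, 89.31, 89.31, 95.31],
}).sort_values(["perf_date", "pull_date"])

# Zero-based position of each row inside its perf_date group.
pos = df.groupby("perf_date").cumcount()

# rev of the third row (position 2) per group, indexed by the group key.
third_rev = df.loc[pos == 2].set_index("perf_date")["rev"]

# First row of every other column per group.
first_rows = df.drop(columns="rev").groupby("perf_date", as_index=False).first()

# Groups with fewer than three rows have no match and get NaN.
result = first_rows.join(third_rev, on="perf_date")
print(result)
```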

Answer 1: accepted 2019-04-27 09:57:31