Python manipulate timeseries data in aggregation
I have a timeseries dataframe which contains the columns shown below:
perf_date   pull_date   clicks  conv  rev
2019-01-21  2019-01-28  56      9     44.12
2019-01-22  2019-01-28  56      10    44.70
            2019-01-29  56      10    44.70
2019-01-23  2019-01-28  59      13    89.31
            2019-01-29  59      13    89.31
            2019-01-30  59      14    95.31
What I want to do is:
1) Keep all values of the first row for each perf_date.
2) Take the rev value from the row with the largest pull_date for each perf_date.
So after the manipulation the above dataframe should look like this:
perf_date pull_date clicks conv rev
2019-01-21 2019-01-28 56 9 44.12
2019-01-22 2019-01-28 56 10 44.70
2019-01-23 2019-01-28 59 13 95.31
Use GroupBy.agg with a dictionary that maps columns to aggregate functions. You can write the dictionary by hand or build it dynamically: every column except perf_date and rev is aggregated by first, and rev is aggregated by last:
#if necessary
df['perf_date'] = df['perf_date'].ffill()
df = df.sort_values(['perf_date','pull_date'])
d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
d['rev'] = 'last'
print (d)
{'clicks': 'first', 'conv': 'first', 'pull_date': 'first', 'rev': 'last'}
df = df.groupby('perf_date', as_index=False).agg(d).reindex(df.columns, axis=1)
print (df)
perf_date pull_date clicks conv rev
0 2019-01-21 2019-01-28 56 9 44.12
1 2019-01-22 2019-01-28 56 10 44.70
2 2019-01-23 2019-01-28 59 13 95.31
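For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. It assumes the blank perf_date cells in the question are missing values (entered here as None) and that all dates are ISO-formatted strings, so sorting them as text matches chronological order. The names agg_map and result are only illustrative.

```python
import pandas as pd

# Sample data from the question; blank perf_date cells are assumed to be missing.
df = pd.DataFrame({
    "perf_date": ["2019-01-21", "2019-01-22", None, "2019-01-23", None, None],
    "pull_date": ["2019-01-28", "2019-01-28", "2019-01-29",
                  "2019-01-28", "2019-01-29", "2019-01-30"],
    "clicks": [56, 56, 56, 59, 59, 59],
    "conv": [9, 10, 10, 13, 13, 14],
    "rev": [44.12, 44.70, 44.70, 89.31, 89.31, 95.31],
})

# Fill the blank perf_date cells, then order rows so that "first" and "last"
# correspond to the smallest and largest pull_date within each perf_date.
df["perf_date"] = df["perf_date"].ffill()
df = df.sort_values(["perf_date", "pull_date"])

# Every column except the group key and rev takes its first value; rev takes its last.
agg_map = {col: "first" for col in df.columns if col not in ("perf_date", "rev")}
agg_map["rev"] = "last"

# Select the columns in their original order after aggregating.
result = df.groupby("perf_date", as_index=False).agg(agg_map)[list(df.columns)]
print(result)
```

Building the dictionary with a comprehension over df.columns keeps the original column order, so the final column selection is only a safeguard here.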
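EDIT: To take rev from the third row of each perf_date group instead (groups with fewer than three rows get NaN), use GroupBy.nth: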
d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
df1 = df.groupby('perf_date', as_index=False).agg(d)
s = df.groupby('perf_date')['rev'].nth(2)
df = df1.join(s, on='perf_date')
print (df)
perf_date clicks conv pull_date rev
0 2019-01-21 56 9 2019-01-28 NaN
1 2019-01-22 56 10 2019-01-28 NaN
2 2019-01-23 59 13 2019-01-28 95.31