Python: manipulate timeseries data in aggregation

I have a timeseries dataframe with the columns shown below:

    perf_date  pull_date  clicks  conv      rev 
    2019-01-21 2019-01-28   56     9        44.12
    2019-01-22 2019-01-28   56     10       44.70
               2019-01-29   56     10       44.70
    2019-01-23 2019-01-28   59     13       89.31
               2019-01-29   59     13       89.31
               2019-01-30   59     14       95.31

What I want to do is:

1) Keep all values from the first row for each perf_date.
2) Take rev from the row with the largest pull_date for each perf_date.

So after the manipulation, the above dataframe should look like this:

    perf_date  pull_date  clicks  conv      rev 
    2019-01-21 2019-01-28   56     9        44.12
    2019-01-22 2019-01-28   56     10       44.70
    2019-01-23 2019-01-28   59     13       95.31

Use GroupBy.agg with a dictionary that maps columns to aggregate functions. You can write the dictionary by hand or build it dynamically: every column except perf_date and rev is aggregated by first, and rev is aggregated by last:

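# if necessary: fill missing perf_date values from the row above,
# then sort so that 'first' and 'last' follow pull_date order within each group
df['perf_date'] = df['perf_date'].ffill()
df = df.sort_values(['perf_date','pull_date'])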

d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
d['rev'] = 'last'
print (d)
{'clicks': 'first', 'conv': 'first', 'pull_date': 'first', 'rev': 'last'}

df = df.groupby('perf_date', as_index=False).agg(d).reindex(df.columns, axis=1)
print (df)
    perf_date   pull_date  clicks  conv    rev
0  2019-01-21  2019-01-28      56     9  44.12
1  2019-01-22  2019-01-28      56    10  44.70
2  2019-01-23  2019-01-28      59    13  95.31
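
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's table, assuming perf_date is already filled in on every row; the names agg_map and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: perf_date is present on every row).
df = pd.DataFrame({
    "perf_date": ["2019-01-21", "2019-01-22", "2019-01-22",
                  "2019-01-23", "2019-01-23", "2019-01-23"],
    "pull_date": ["2019-01-28", "2019-01-28", "2019-01-29",
                  "2019-01-28", "2019-01-29", "2019-01-30"],
    "clicks": [56, 56, 56, 59, 59, 59],
    "conv": [9, 10, 10, 13, 13, 14],
    "rev": [44.12, 44.70, 44.70, 89.31, 89.31, 95.31],
})
df["perf_date"] = pd.to_datetime(df["perf_date"])
df["pull_date"] = pd.to_datetime(df["pull_date"])

# Sort so that 'first' is the earliest pull_date and 'last' is the latest.
df = df.sort_values(["perf_date", "pull_date"])

# Every column except the group key and rev takes its first value; rev takes its last.
agg_map = dict.fromkeys(df.columns.difference(["perf_date", "rev"]), "first")
agg_map["rev"] = "last"

result = (
    df.groupby("perf_date", as_index=False)
      .agg(agg_map)
      .reindex(df.columns, axis=1)  # restore the original column order
)
print(result)
```

The sort matters here: first and last are positional, so without it rev would come from whichever row happens to be last in each group rather than from the largest pull_date.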

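EDIT: To take rev from the third row of each group instead (NaN where a group has fewer than three rows), use GroupBy.nth: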

d = dict.fromkeys(df.columns.difference(['perf_date','rev']), 'first')
df1 = df.groupby('perf_date', as_index=False).agg(d)
s = df.groupby('perf_date')['rev'].nth(2)
df = df1.join(s, on='perf_date')
print (df)
    perf_date  clicks  conv   pull_date    rev
0  2019-01-21      56     9  2019-01-28    NaN
1  2019-01-22      56    10  2019-01-28    NaN
2  2019-01-23      59    13  2019-01-28  95.31
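
One caveat with the nth approach: in newer pandas releases (2.0 and later, as far as I know) GroupBy.nth behaves as a filter and keeps the original row index instead of indexing by perf_date, so the join on perf_date may no longer line up. A sketch of an alternative that should not depend on that behaviour uses GroupBy.cumcount to pick the row at a given position. The names third_rev and out are only illustrative, and the sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: perf_date is present on every row).
df = pd.DataFrame({
    "perf_date": ["2019-01-21", "2019-01-22", "2019-01-22",
                  "2019-01-23", "2019-01-23", "2019-01-23"],
    "pull_date": ["2019-01-28", "2019-01-28", "2019-01-29",
                  "2019-01-28", "2019-01-29", "2019-01-30"],
    "clicks": [56, 56, 56, 59, 59, 59],
    "conv": [9, 10, 10, 13, 13, 14],
    "rev": [44.12, 44.70, 44.70, 89.31, 89.31, 95.31],
})
df = df.sort_values(["perf_date", "pull_date"])

# First value of every column except the group key and rev.
first_cols = df.columns.difference(["perf_date", "rev"])
df1 = df.groupby("perf_date", as_index=False).agg(dict.fromkeys(first_cols, "first"))

# cumcount() numbers the rows inside each group from 0, so == 2 selects the third pull.
is_third = df.groupby("perf_date").cumcount() == 2
third_rev = df.loc[is_third].set_index("perf_date")["rev"]

# Groups with fewer than three rows have no match and end up with NaN.
out = df1.join(third_rev, on="perf_date")
print(out)
```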
