Pandas Dataframe resample using groupby

I have a dataframe in pandas of the following form:

                       timestap  price    bid    ask  volume
0       2014-06-04 12:11:03.058  21.11  41.12   0.00       0
1       2014-06-04 12:11:03.386  21.17  41.18   0.00       0
2       2014-06-04 12:11:03.435  21.20  41.21   0.00       0
3       2014-06-04 12:11:04.125  21.17  41.19   0.00       0
4       2014-06-04 12:11:04.245  21.16  41.17   0.00       0

What I should do:

  1. Set timestap as the index
  2. Resample by timestap using groupby (rows should be grouped by second)
  3. Show the minimum and maximum value of each column within each second

The final dataframe should look like this:

                            price           bid         ask    volume
           timestap    min    max    min    max   min   max  min  max
2014-06-04 12:11:03  21.11  21.20  41.12  41.21  0.00  0.00    0    0
2014-06-04 12:11:04  21.16  21.17  41.17  41.19  0.00  0.00    0    0

What I have now:

import pandas as pd
data = pd.read_csv('table.csv')
data.columns = ['timestap', 'bid', 'ask', 'price', 'volume']
data = data.set_index(data.time)
bydate = data.groupby(pd.TimeGrouper(freq='s'))

Something is going wrong in my code, and I have no idea how to do the last task. Can you help me?

Use the agg function and pass it a list of aggregation functions, with either resample or pd.TimeGrouper (replaced by pd.Grouper in newer pandas versions):

# make sure the timestap column has a datetime dtype
df['timestap'] = pd.to_datetime(df['timestap'])

df.set_index('timestap').resample("s").agg(["min", "max"])
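To try the resample approach without table.csv, here is a minimal sketch that rebuilds the five sample rows from the question by hand. The DataFrame construction is an assumption made for illustration; only the to_datetime, set_index, resample and agg calls come from the answer itself.

```python
import pandas as pd

# Sample rows copied from the question (built inline instead of read from table.csv)
df = pd.DataFrame({
    "timestap": [
        "2014-06-04 12:11:03.058",
        "2014-06-04 12:11:03.386",
        "2014-06-04 12:11:03.435",
        "2014-06-04 12:11:04.125",
        "2014-06-04 12:11:04.245",
    ],
    "price": [21.11, 21.17, 21.20, 21.17, 21.16],
    "bid": [41.12, 41.18, 41.21, 41.19, 41.17],
    "ask": [0.0] * 5,
    "volume": [0] * 5,
})

# Parse the strings so the column can serve as a DatetimeIndex
df["timestap"] = pd.to_datetime(df["timestap"])

# One row per second; each original column gets a "min" and a "max" sub-column
result = df.set_index("timestap").resample("s").agg(["min", "max"])
print(result)
```

The result has a two-level column index, so a single value can be picked with a tuple such as result[("price", "max")].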

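Or use TimeGrouper:

# pd.TimeGrouper was deprecated in pandas 0.21 and removed in later releases;
# on current versions, pass pd.Grouper(freq='s') to groupby instead
df.set_index('timestap').groupby(pd.TimeGrouper(freq='s')).agg(['min', 'max'])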
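On pandas versions where pd.TimeGrouper is no longer available, the same grouping can be sketched with pd.Grouper. The sample rows below are copied from the question and built inline as an assumption for illustration; the swap to pd.Grouper is a substitution, not the code from the original answer.

```python
import pandas as pd

# Sample rows copied from the question (built inline for illustration)
df = pd.DataFrame({
    "timestap": pd.to_datetime([
        "2014-06-04 12:11:03.058",
        "2014-06-04 12:11:03.386",
        "2014-06-04 12:11:03.435",
        "2014-06-04 12:11:04.125",
        "2014-06-04 12:11:04.245",
    ]),
    "price": [21.11, 21.17, 21.20, 21.17, 21.16],
    "bid": [41.12, 41.18, 41.21, 41.19, 41.17],
    "ask": [0.0] * 5,
    "volume": [0] * 5,
})

# pd.Grouper(freq="s") groups the DatetimeIndex into one-second bins,
# taking the place of pd.TimeGrouper(freq="s")
by_second = (
    df.set_index("timestap")
      .groupby(pd.Grouper(freq="s"))
      .agg(["min", "max"])
)
print(by_second)
```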

