
Pandas resample data by a non-timeseries column (e.g. Price)

Renko Chart Wiki: https://en.wikipedia.org/wiki/Renko_chart

I'm trying to generate a Renko chart from trade tick data. Each tick contains a Timestamp, Price, and Volume. The Timestamp is in Unix milliseconds, e.g. 1649289600174.

Pandas already supports OHLC resampling via df.resample('10Min').agg({'Price': 'ohlc', 'Volume': 'sum'}). However, I would like to resample the trade data by Price, not by Timestamp.

A Renko chart uses a fixed brick size. For example, a brick_size of 10 generates a new brick whenever the price moves up 10 points or down 10 points.
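To illustrate what I mean, here is a minimal pure-Python sketch of the brick logic (brick boundary prices only, not the pandas aggregation I'm asking for; renko_bricks is my own naming):

```python
def renko_bricks(prices, brick_size=10):
    """Return the sequence of brick close prices as the price moves."""
    bricks = [prices[0]]
    for p in prices[1:]:
        # Emit a new brick each time the price moves a full
        # brick_size away from the last brick's close.
        while p >= bricks[-1] + brick_size:
            bricks.append(bricks[-1] + brick_size)
        while p <= bricks[-1] - brick_size:
            bricks.append(bricks[-1] - brick_size)
    return bricks

prices = [100, 105, 110, 104, 101, 100, 103, 107, 102,
          99, 93, 90, 95, 100, 105, 110, 115, 120]
print(renko_bricks(prices))
# → [100, 110, 100, 90, 100, 110, 120]
```

After the initial price of 100, this matches the Close column in the desired output below.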

I have been told by a pandas contributor that this can be done via groupby with a binned grouper, but I don't quite understand what that means.

This is what my original data looks like:

Timestamp           Price               Volume

1649289600174       100                 100
1649289600176       105                 100
1649289600178       110                 100
1649289600179       104                 100
1649289600181       101                 100
1649289600182       100                 100
1649289600183       103                 100
1649289600184       107                 100
1649289600185       102                 100
1649289600186        99                 100
1649289600188        93                 100
1649289600189        90                 100
1649289600192        95                 100
1649289600193       100                 100
1649289600194       105                 100
1649289600195       110                 100
1649289600196       115                 100
1649289600197       120                 100

I'm looking for an option along the lines of df.resample('10Numeric').agg({'Price': 'ohlc', 'Volume': 'sum'}), where '10Numeric' means the brick_size is 10: whenever the price moves up or down 10 points, aggregate the data within that period.

The output should look like:

Timestamp           Open    High    Low    Close               Volume
    
1649289600178       100     110     100     110                 300
1649289600182       110     110     100     100                 300
1649289600189       100     107      90      90                 600
1649289600193        90     100      90     100                 200
1649289600195       100     110     100     110                 200
1649289600197       110     120     110     120                 200

I believe the pandas contributor was talking about the pd.cut option, followed by a groupby. Something like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'price': np.random.randint(1, 100, 100)})
df['bins'] = pd.cut(x=df['price'], bins=[0, 10, 20, 30, 40, 50, 60,
                                          70, 80, 90, 100])

That output looks like this:

      price       bins
0       92  (90, 100]
1       15   (10, 20]
2       54   (50, 60]
3       55   (50, 60]
4       72   (70, 80]
..     ...        ...
95      88   (80, 90]
96      21   (20, 30]
97      91  (90, 100]
98      51   (50, 60]
99      18   (10, 20]

Please note: price data is not unique. Bitcoin traded at around 45555 USD a year ago, and it's back at the same price this year. With a bin size of 100, both would fall in (45500, 45600].

A groupby would therefore put data from a year ago and current data in the same bin. I'm looking for a solution that follows the price movement, e.g. closing prices like 45500, 45600, 45700, 45600, 45500, 45400, 45300, 45200, 45100, 45000.
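A quick sketch of the problem: pd.cut alone merges revisits of the same price level, so a plain groupby can't tell a first visit from a return visit.

```python
import pandas as pd

# The price leaves 100 and later comes back to it.
prices = pd.Series([100, 110, 100])
bins = pd.cut(prices, bins=[90, 100, 110, 120])
print(bins.tolist())
# The first and last rows land in the same (90, 100] bin,
# so groupby('bins') would aggregate them together.
```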

Can someone explain what the pandas contributor means by "groupby with a binned grouper"?

Is this what you're looking for?

# Pad the bin edges so the minimum and maximum prices are not dropped as NaN.
df['bins'] = pd.cut(x=df['Price'],
                    bins=range(df['Price'].min() - 10, df['Price'].max() + 10, 10))
df.groupby('bins').agg({'Price': 'ohlc', 'Volume': 'sum'})

Output:

            Price                 Volume
             open high  low close Volume
bins                                    
(80, 90]       90   90   90    90    100
(90, 100]     100  100   93   100    600
(100, 110]    105  110  101   110    900
(110, 120]    115  120  115   120    200

You could create a new column based on pd.cut, take the cumsum of its changes, and group by that.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        {"Timestamp": 1649289600174, "Price": 100, "Volume": 100},
        {"Timestamp": 1649289600176, "Price": 105, "Volume": 100},
        {"Timestamp": 1649289600178, "Price": 110, "Volume": 100},
        {"Timestamp": 1649289600179, "Price": 104, "Volume": 100},
        {"Timestamp": 1649289600181, "Price": 101, "Volume": 100},
        {"Timestamp": 1649289600182, "Price": 100, "Volume": 100},
        {"Timestamp": 1649289600183, "Price": 103, "Volume": 100},
        {"Timestamp": 1649289600184, "Price": 107, "Volume": 100},
        {"Timestamp": 1649289600185, "Price": 102, "Volume": 100},
        {"Timestamp": 1649289600186, "Price": 99, "Volume": 100},
        {"Timestamp": 1649289600188, "Price": 93, "Volume": 100},
        {"Timestamp": 1649289600189, "Price": 90, "Volume": 100},
        {"Timestamp": 1649289600192, "Price": 95, "Volume": 100},
        {"Timestamp": 1649289600193, "Price": 100, "Volume": 100},
        {"Timestamp": 1649289600194, "Price": 105, "Volume": 100},
        {"Timestamp": 1649289600195, "Price": 110, "Volume": 100},
        {"Timestamp": 1649289600196, "Price": 115, "Volume": 100},
        {"Timestamp": 1649289600197, "Price": 120, "Volume": 100},
    ]
)
# Assign each price to a fixed-width bin, then give each run of
# consecutive rows in the same bin an increasing group id.
codes = pd.cut(df["Price"], bins=np.arange(0, 200, 10), right=False).cat.codes
groups = (codes != codes.shift(1)).cumsum()
df.groupby(groups).agg(
    {"Price": "ohlc", "Volume": "sum", "Timestamp": "min"}
)

This'll give you:

  Price                 Volume      Timestamp
   open high  low close Volume      Timestamp
1   100  105  100   105    200  1649289600174
2   110  110  110   110    100  1649289600178
3   104  107  100   102    600  1649289600179
4    99   99   90    95    400  1649289600186
5   100  105  100   105    200  1649289600193
6   110  115  110   115    200  1649289600195
7   120  120  120   120    100  1649289600197
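If you need this more than once, the approach above can be wrapped in a helper. This is a sketch: the name renko_ohlc and the edge padding (two brick sizes past the maximum, so the top price isn't dropped) are my own choices, and it assumes positive integer-like prices.

```python
import numpy as np
import pandas as pd

def renko_ohlc(df, brick_size=10):
    """Aggregate ticks into one OHLC row per run of same-bin prices.

    Expects columns 'Timestamp', 'Price', 'Volume'.
    """
    # Pad the top edge so the maximum price still falls inside a bin.
    edges = np.arange(0, df["Price"].max() + 2 * brick_size, brick_size)
    codes = pd.cut(df["Price"], bins=edges, right=False).cat.codes
    # New group whenever the bin changes between consecutive rows.
    groups = (codes != codes.shift(1)).cumsum()
    return df.groupby(groups).agg(
        {"Price": "ohlc", "Volume": "sum", "Timestamp": "min"}
    )
```

Called as renko_ohlc(df, brick_size=10) on the sample data above, it reproduces the seven-row table shown.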
