简体   繁体   中英

Check if a value in dataframe will increase or decrease a certain percentage

I have an OHLC dataframe, eg:

index open close high low
2021-03-23 10:00:00+00:00 1421.100 1424.500 1427.720 1422.650
2021-03-23 11:00:00+00:00 1424.500 1421.480 1422.400 1411.890
2021-03-23 12:00:00+00:00 1421.480 1435.170 1443.980 1433.780
2021-03-23 13:00:00+00:00 1435.170 1440.860 1443.190 1437.590
2021-03-23 14:00:00+00:00 1440.860 1438.920 1443.570 1435.200
2021-03-23 15:00:00+00:00 1438.920 1435.990 1444.840 1435.060
2021-03-23 16:00:00+00:00 1435.990 1441.920 1446.610 1441.450

Now I want to find out, if the price will first increase or decrease eg for 1%. What I have so far is the following working code:

def check(x):

    check = ohlc[ohlc.index > x.name]
    price = ohlc.at[x.name, 'close']

    high_thr = price * 1.01
    low_thr = price * 0.99

    high_indexes = check[check['high'] > high_thr]
    low_indexes = check[check['low'] < low_thr]

    if high_indexes.shape[0] > 0 and low_indexes.shape[0] > 0:

        high = high_indexes.index[0]
        low = low_indexes.index[0]

        if high < low:
            return 1
        elif high > low:
            return -1
        else:
            return 0
    else:
        return 0


ohlc['check'] = ohlc.apply(find_threshold, axis=1)

This is extremely slow for larger datasets. Is there any other better way than iterating over every row, slicing and finding all indexes to get the nearest one?

I think the best way to do this is not too different from how you're doing it:

from datetime import timedelta

def check(x, change=0.01):
    time = x.name
    price = ohlc.loc[time, 'close']
    while True:
        if time not in ohlc.index:          # If we reach the end
            return 0
        high = ohlc.loc[time, 'high']
        low = ohlc.loc[time, 'low']
        if high > (1.0 + change) * price:   # Upper thresh broken
            return 1
        elif low < 1.0 - change) * price:   # Lower thresh broken
            return -1
        time = time + timedelta(hours=1)    # Time update

ohlc['check'] = ohlc.apply(check, axis=1)

If efficiency is what you're worreid about, applying this way is slightly more efficient because it only looks ahead as far as it needs to to break the threshold. Optionally, you could limit this to up to, say, 100 hours into the future by modifying the while loop, capping the number of checks per row to 100:

    endtime = time + timedelta(hours=100)
    while time < endtime:
        # etc

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM