Python - How can I find the angle of a line at a given point?

I'm working with simple OHLC time series data. Here is a sample:

2021-02-26 08:00:00  51491.322786
2021-02-26 12:00:00  51373.462137
2021-02-26 16:00:00  51244.591670
2021-02-26 20:00:00  51061.134204
2021-02-27 00:00:00  50985.592434
2021-02-27 04:00:00  50923.287370
2021-02-27 08:00:00  50842.103282
2021-02-27 12:00:00  50695.160604
2021-02-27 16:00:00  50608.462150
2021-02-27 20:00:00  50455.235146
2021-02-28 00:00:00  50177.377531
2021-02-28 04:00:00  49936.652091
2021-02-28 08:00:00  49860.396537
2021-02-28 12:00:00  49651.901082
2021-02-28 16:00:00  49625.153441
2021-02-28 20:00:00  49570.275193
2021-03-01 00:00:00  49531.874272
2021-03-01 04:00:00  49510.381676
2021-03-01 08:00:00  49486.289712
2021-03-01 12:00:00  49481.496645
2021-03-01 16:00:00  49469.806692
2021-03-01 20:00:00  49471.958606
2021-03-02 00:00:00  49462.095568
2021-03-02 04:00:00  49453.473575
2021-03-02 08:00:00  49438.986536
2021-03-02 12:00:00  49409.492007
2021-03-02 16:00:00  49356.563396
2021-03-02 20:00:00  49331.037118
2021-03-03 00:00:00  49297.823947
2021-03-03 04:00:00  49322.049974
2021-03-03 08:00:00  49461.314013
2021-03-03 12:00:00  49515.137712
2021-03-03 16:00:00  49571.990877
2021-03-03 20:00:00  49592.320461
2021-03-04 00:00:00  49592.249409
2021-03-04 04:00:00  49593.938380
2021-03-04 08:00:00  49593.055971
2021-03-04 12:00:00  49592.025698
2021-03-04 16:00:00  49585.661437
2021-03-04 20:00:00  49578.693824
2021-03-05 00:00:00  49543.067346
2021-03-05 04:00:00  49540.706794
2021-03-05 08:00:00  49513.586831
2021-03-05 12:00:00  49494.990328
2021-03-05 16:00:00  49493.807248
2021-03-05 20:00:00  49461.133698
2021-03-06 00:00:00  49432.770930
2021-03-06 04:00:00  49412.087821
2021-03-06 08:00:00  49368.106499
2021-03-06 12:00:00  49290.581114
2021-03-06 16:00:00  49272.222740
2021-03-06 20:00:00  49269.814982
2021-03-07 00:00:00  49270.328825
2021-03-07 04:00:00  49293.664209
2021-03-07 08:00:00  49339.999430
2021-03-07 12:00:00  49404.798067
2021-03-07 16:00:00  49450.447631
2021-03-07 20:00:00  49528.402294
2021-03-08 00:00:00  49571.353158
2021-03-08 04:00:00  49572.687451
2021-03-08 08:00:00  49597.518988
2021-03-08 12:00:00  49648.407014
2021-03-08 16:00:00  49708.063384
2021-03-08 20:00:00  49862.237773
2021-03-09 00:00:00  50200.833030
2021-03-09 04:00:00  50446.201489
2021-03-09 08:00:00  50727.063301
2021-03-09 12:00:00  50952.697141
2021-03-09 16:00:00  51152.798741
2021-03-09 20:00:00  51392.873289
2021-03-10 00:00:00  51472.273233
2021-03-10 04:00:00  51601.351944
2021-03-10 08:00:00  51759.387477
2021-03-10 12:00:00  52053.982892
2021-03-10 16:00:00  52437.071119
2021-03-10 20:00:00  52648.225156

I'm trying to find out how inclined or steep the line is at each point. Basically, I only need to know whether the line is going up, down or sideways, and by how much, so ideally I would get some sort of coefficient or number that tells me how steep the line is.

To do that, I thought of calculating the slope, so I tried the following code, which I found elsewhere:

def slope( close, length=None, as_angle=None, to_degrees=None, vertical=None, offset=None, **kwargs):
    """Indicator: Slope"""
    # Validate arguments
    length = int(length) if length and length > 0 else 1
    as_angle = True if isinstance(as_angle, bool) else False
    to_degrees = True if isinstance(to_degrees, bool) else False
    close = verify_series(close, length)
    offset = get_offset(offset)

    if close is None: return

    # Calculate Result
    slope = close.diff(length) / length
    if as_angle:
        slope = slope.apply(npAtan)
        if to_degrees:
            slope *= 180 / npPi

    # Offset
    if offset != 0:
        slope = slope.shift(offset)

    # Handle fills
    if "fillna" in kwargs:
        slope.fillna(kwargs["fillna"], inplace=True)
    if "fill_method" in kwargs:
        slope.fillna(method=kwargs["fill_method"], inplace=True)

    # Name and Categorize it
    slope.name = f"SLOPE_{length}" if not as_angle else f"ANGLE{'d' if to_degrees else 'r'}_{length}"
    slope.category = "momentum"

    return slope 

Here is a sample of the output:

2021-02-26 08:00:00  51491.322786 -110.850644
2021-02-26 12:00:00  51373.462137 -117.860648
2021-02-26 16:00:00  51244.591670 -128.870468
2021-02-26 20:00:00  51061.134204 -183.457466
2021-02-27 00:00:00  50985.592434  -75.541770
2021-02-27 04:00:00  50923.287370  -62.305064
2021-02-27 08:00:00  50842.103282  -81.184088
2021-02-27 12:00:00  50695.160604 -146.942678
2021-02-27 16:00:00  50608.462150  -86.698454
2021-02-27 20:00:00  50455.235146 -153.227004
2021-02-28 00:00:00  50177.377531 -277.857615
2021-02-28 04:00:00  49936.652091 -240.725440
2021-02-28 08:00:00  49860.396537  -76.255553
2021-02-28 12:00:00  49651.901082 -208.495455
2021-02-28 16:00:00  49625.153441  -26.747641
2021-02-28 20:00:00  49570.275193  -54.878249
2021-03-01 00:00:00  49531.874272  -38.400921
2021-03-01 04:00:00  49510.381676  -21.492596
2021-03-01 08:00:00  49486.289712  -24.091964
2021-03-01 12:00:00  49481.496645   -4.793067
2021-03-01 16:00:00  49469.806692  -11.689953
2021-03-01 20:00:00  49471.958606    2.151914
2021-03-02 00:00:00  49462.095568   -9.863038
2021-03-02 04:00:00  49453.473575   -8.621994
2021-03-02 08:00:00  49438.986536  -14.487039
2021-03-02 12:00:00  49409.492007  -29.494528
2021-03-02 16:00:00  49356.563396  -52.928611
2021-03-02 20:00:00  49331.037118  -25.526278
2021-03-03 00:00:00  49297.823947  -33.213171
2021-03-03 04:00:00  49322.049974   24.226027
2021-03-03 08:00:00  49461.314013  139.264040
2021-03-03 12:00:00  49515.137712   53.823699
2021-03-03 16:00:00  49571.990877   56.853165
2021-03-03 20:00:00  49592.320461   20.329584
2021-03-04 00:00:00  49592.249409   -0.071052
2021-03-04 04:00:00  49593.938380    1.688971
2021-03-04 08:00:00  49593.055971   -0.882409
2021-03-04 12:00:00  49592.025698   -1.030273
2021-03-04 16:00:00  49585.661437   -6.364260
2021-03-04 20:00:00  49578.693824   -6.967614
2021-03-05 00:00:00  49543.067346  -35.626478
2021-03-05 04:00:00  49540.706794   -2.360551
2021-03-05 08:00:00  49513.586831  -27.119963
2021-03-05 12:00:00  49494.990328  -18.596504
2021-03-05 16:00:00  49493.807248   -1.183080
2021-03-05 20:00:00  49461.133698  -32.673550
2021-03-06 00:00:00  49432.770930  -28.362769
2021-03-06 04:00:00  49412.087821  -20.683109
2021-03-06 08:00:00  49368.106499  -43.981322
2021-03-06 12:00:00  49290.581114  -77.525385
2021-03-06 16:00:00  49272.222740  -18.358373
2021-03-06 20:00:00  49269.814982   -2.407758
2021-03-07 00:00:00  49270.328825    0.513843
2021-03-07 04:00:00  49293.664209   23.335384
2021-03-07 08:00:00  49339.999430   46.335221
2021-03-07 12:00:00  49404.798067   64.798637
2021-03-07 16:00:00  49450.447631   45.649564
2021-03-07 20:00:00  49528.402294   77.954663
2021-03-08 00:00:00  49571.353158   42.950863
2021-03-08 04:00:00  49572.687451    1.334294
2021-03-08 08:00:00  49597.518988   24.831537
2021-03-08 12:00:00  49648.407014   50.888026
2021-03-08 16:00:00  49708.063384   59.656369
2021-03-08 20:00:00  49862.237773  154.174389
2021-03-09 00:00:00  50200.833030  338.595257
2021-03-09 04:00:00  50446.201489  245.368460
2021-03-09 08:00:00  50727.063301  280.861811
2021-03-09 12:00:00  50952.697141  225.633840
2021-03-09 16:00:00  51152.798741  200.101599
2021-03-09 20:00:00  51392.873289  240.074549
2021-03-10 00:00:00  51472.273233   79.399943
2021-03-10 04:00:00  51601.351944  129.078712
2021-03-10 08:00:00  51759.387477  158.035533
2021-03-10 12:00:00  52053.982892  294.595415
2021-03-10 16:00:00  52437.071119  383.088226
2021-03-10 20:00:00  52648.225156  211.154038

This works, but the slope depends heavily on the magnitude of the data I provide: lower prices give much smaller slope values and higher prices give larger ones. Since I'm performing some sort of analysis, I need something more "universal" that gives me the inclination of the plotted line regardless of the magnitude of the data. Is that possible? Any advice is appreciated.

I am not sure what you are trying to achieve, but the slope and angle of a series of points can be found in the following manner.

Suppose your dataframe is given by:

                Date       measure
0   2021-02-26 08:00  51491.322786
1   2021-02-26 12:00  51373.462137
2   2021-02-26 16:00  51244.591670
3   2021-02-26 20:00  51061.134204
4   2021-02-27 00:00  50985.592434
..               ...           ...
71  2021-03-10 04:00  51601.351944
72  2021-03-10 08:00  51759.387477
73  2021-03-10 12:00  52053.982892
74  2021-03-10 16:00  52437.071119
75  2021-03-10 20:00  52648.225156

which is exactly what you've posted. Then you can define a function slope_and_angle as

import numpy as np

def slope_and_angle(df):
    # Rise and run between consecutive rows; the run is one step (4 hours)
    dy = df['measure'].diff()
    dx = 1
    df['slope'] = dy / dx
    # arctan2(rise, run) is the angle of each segment, converted to degrees
    df['angle'] = np.rad2deg(np.arctan2(dy, dx))
    return df

which returns (slope and angle rounded to two decimals):

                Date       measure    slope   angle
0   2021-02-26 08:00  51491.322786      NaN     NaN
1   2021-02-26 12:00  51373.462137  -117.86  -89.51
2   2021-02-26 16:00  51244.591670  -128.87  -89.56
3   2021-02-26 20:00  51061.134204  -183.46  -89.69
4   2021-02-27 00:00  50985.592434   -75.54  -89.24
..               ...           ...      ...     ...
71  2021-03-10 04:00  51601.351944   129.08   89.56
72  2021-03-10 08:00  51759.387477   158.04   89.64
73  2021-03-10 12:00  52053.982892   294.60   89.81
74  2021-03-10 16:00  52437.071119   383.09   89.85
75  2021-03-10 20:00  52648.225156   211.15   89.73

With a run of one step, the slope is just df['measure'].diff(), which is what your output example shows. Because the price moves by tens or hundreds of units per step in the rows shown, those angles all land within a degree of ±90, so the raw angle says little until the data is rescaled.

There are things you can do, but there is no universal thing that will always work well - you need to understand your data and choose something appropriate for your case.

For example, take 100 random numbers between 0 and 1, sampled every 4 hours:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range("2021-01-01", periods=100, freq="4H"),
    'value': np.random.random(100)
})
# df (random, so the values differ on every run):
#       timestamp               value
# 0     2021-01-01 00:00:00     0.780008
# 1     2021-01-01 04:00:00     0.689576
# 2     2021-01-01 08:00:00     0.700937
# 3     2021-01-01 12:00:00     0.756724
# 4     2021-01-01 16:00:00     0.928890
# etc

We can calculate the gradient quite easily:

differences = df.diff()
# total_seconds() covers the whole interval; .dt.seconds would drop whole days
gradient = 3600 * differences.value / differences.timestamp.dt.total_seconds()
# gradient: max value 0.1979, min value -0.2432
# 0         NaN
# 1    0.033912
# 2    0.045422
# 3   -0.001827
# 4   -0.225796

The gradient is the change in value per hour, ignoring any wrinkles such as missing values, repeated time points etc.
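Missing rows leave uneven gaps between timestamps. As a minimal sketch of the same per-hour calculation on unevenly spaced data, the helper below measures elapsed time with `.dt.total_seconds()`, since `.dt.seconds` holds only the seconds component of each timedelta and drops whole days. The function name `gradient_per_hour`, its column-name parameters and the three-row `demo` frame are made up for illustration.

```python
import pandas as pd

def gradient_per_hour(df, time_col="timestamp", value_col="value"):
    """Change in value per hour between consecutive rows."""
    # Elapsed time between rows, in hours, including any whole days
    hours = df[time_col].diff().dt.total_seconds() / 3600
    return df[value_col].diff() / hours

# Hypothetical data with uneven gaps: 4 hours, then 28 hours
demo = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 04:00", "2021-01-02 08:00"]
    ),
    "value": [1.0, 3.0, 10.0],
})
demo_gradient = gradient_per_hour(demo)
print(demo_gradient)
```

The second step rises by 7 over 28 hours, so it comes out shallower than the first step (2 over 4 hours) even though the raw difference is larger.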

Now, as you observe, if the magnitude of these numbers increases, the gradient increases. For example, if I make value 100 times bigger:

df['value100'] = 100 * df.value
differences = df.diff()
gradient = 3600 * differences.value100 / differences.timestamp.dt.total_seconds()
print(gradient.max(), gradient.min())
# gradient: max value 19.79, min value -24.32
# 0          NaN
# 1     3.391221
# 2     4.542248
# 3    -0.182714
# 4   -22.579588

Here we see that the gradients are also 100 times bigger - exactly as would be expected.

This suggests that we could just divide by some number, but the question then becomes what number to use? This is where understanding your data is important.

One approach is to use the range of the data. This is similar to what you would see if you plotted a graph using matplotlib - the y scale would fit the maximum and minimum values. For example:

sf = df.value.max() - df.value.min()
sf100 = df.value100.max() - df.value100.min()
differences = df.diff()

gradient = differences.value / sf
gradient100 = differences.value100 / sf100

# gradient, gradient100
# nan,      nan
# 0.1379,   0.1379
# 0.1847,   0.1847
# -0.0074,  -0.0074
# -0.9184,  -0.9184

As you can see, the two gradients now match each other. This approach works well when there is a simple linear scaling.
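The range scaling can be wrapped in a small helper and checked directly. This is only a sketch: `range_scaled_gradient` is a made-up name, the data is random with a fixed seed, and the result is the change per step (not per hour) as a fraction of the full range.

```python
import numpy as np
import pandas as pd

def range_scaled_gradient(values):
    """Step-to-step change divided by the full range (max - min) of the series."""
    scale = values.max() - values.min()
    return values.diff() / scale

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
small = pd.Series(rng.random(100))
large = 100 * small             # same shape, 100 times the magnitude

g_small = range_scaled_gradient(small)
g_large = range_scaled_gradient(large)

# Skip the first row, which is NaN because it has no predecessor
print(np.allclose(g_small.iloc[1:], g_large.iloc[1:]))
```

Since no step can be larger than the full range, every value lies between -1 and 1. A constant series has a range of zero, so it would need separate handling.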

However, consider a different case - one where the extra range comes about because of an outlier.

df['value_outlier'] = df.value
df.loc[50, 'value_outlier'] = 100  # Just set the 50th value to 100

sf = df.value.max() - df.value.min()
sf_outlier = df.value_outlier.max() - df.value_outlier.min()

differences = df.diff()
gradient = differences.value / sf
gradient_outlier = differences.value_outlier / sf_outlier

# gradient, gradient_outlier
# nan,      nan
# 0.1379,   0.0014
# 0.1847,   0.0018
# -0.0074,  -0.0001
# -0.9184,  -0.0090
# 0.5792,   0.0057

This doesn't look so good. The reason is that we have inflated the range of value_outlier without changing the actual range between most of the points.

You can fix this - one approach is to use the interquartile range as the scale factor:

sf = df.value.quantile(0.75) - df.value.quantile(0.25)
sf100 = df.value100.quantile(0.75) - df.value100.quantile(0.25)
sf_outlier = df.value_outlier.quantile(0.75) - df.value_outlier.quantile(0.25)

differences = df.diff()
gradient = differences.value / sf
gradient100 = differences.value100 / sf100
gradient_outlier = differences.value_outlier / sf_outlier

for a, b, c in zip(gradient, gradient100, gradient_outlier):
    print(f'{a:.4f}, {b:.4f}, {c:.4f}')

# gradient, gradient100, gradient_outlier
# nan,      nan,         nan
# 0.2953,   0.2953,      0.3202
# 0.3956,   0.3956,      0.4289
# -0.0159,  -0.0159,     -0.0173
# -1.9665,  -1.9665,     -2.1320
# 1.2403,   1.2403,      1.3447

The values are never going to match perfectly, but they should be approximately the same. And, of course, you're going to have an enormous difference where that outlier is.
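To see how closely the interquartile scaling holds up, the sketch below compares a clean series with a copy containing one artificial outlier. The helper name `iqr_scaled_gradient` and the seeded random data are assumptions for illustration, not part of the code above.

```python
import numpy as np
import pandas as pd

def iqr_scaled_gradient(values):
    """Step-to-step change divided by the interquartile range of the series."""
    iqr = values.quantile(0.75) - values.quantile(0.25)
    return values.diff() / iqr

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
clean = pd.Series(rng.random(100))
spiked = clean.copy()
spiked.iloc[50] = 100.0         # one artificial outlier

g_clean = iqr_scaled_gradient(clean)
g_spiked = iqr_scaled_gradient(spiked)

# Rows 50 and 51 are the steps into and out of the outlier; row 0 is NaN
unaffected = g_clean.index.difference([0, 50, 51])
ratio = (g_spiked.loc[unaffected] / g_clean.loc[unaffected]).median()
print(f"spiked / clean on unaffected steps: {ratio:.3f}")
```

On the unaffected steps the raw differences are identical, so the ratio is simply the clean interquartile range divided by the spiked one. It stays close to 1 because a single extreme value shifts the quartiles only slightly.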

So, the key message is that you can do something, but you need to make sure that it is appropriate for your data.
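Once the gradient is on a comparable scale, turning it into the "up, down or sideways" answer from the question is a matter of picking a dead band around zero. The sketch below is one possible way to do that. The function name `label_direction`, the `flat_band` threshold of 0.05 and the toy price series are all hypothetical, and the threshold in particular has to be tuned to the data.

```python
import numpy as np
import pandas as pd

def label_direction(scaled_gradient, flat_band=0.05):
    """Label each step 'up', 'down' or 'sideways' from a scaled gradient.

    flat_band is a hypothetical threshold: steps whose scaled gradient
    lies within +/- flat_band count as sideways.
    """
    labels = np.select(
        [(scaled_gradient > flat_band).to_numpy(),
         (scaled_gradient < -flat_band).to_numpy()],
        ["up", "down"],
        default="sideways",
    )
    result = pd.Series(labels, index=scaled_gradient.index)
    # Rows with no gradient (such as the first one) get no label
    return result.where(scaled_gradient.notna())

# Toy prices, scaled by their interquartile range
prices = pd.Series([100.0, 101.0, 101.01, 100.0, 99.0, 99.0])
iqr = prices.quantile(0.75) - prices.quantile(0.25)
scaled = prices.diff() / iqr
directions = label_direction(scaled)
print(directions.tolist())
```

The size of the scaled gradient still answers "by how much", and the labels only add the coarse direction.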
