Python - how can I find the angle of a line at a given point?

I am working with simple OHLC time series data. Here is a sample:

2021-02-26 08:00:00  51491.322786
2021-02-26 12:00:00  51373.462137
2021-02-26 16:00:00  51244.591670
2021-02-26 20:00:00  51061.134204
2021-02-27 00:00:00  50985.592434
2021-02-27 04:00:00  50923.287370
2021-02-27 08:00:00  50842.103282
2021-02-27 12:00:00  50695.160604
2021-02-27 16:00:00  50608.462150
2021-02-27 20:00:00  50455.235146
2021-02-28 00:00:00  50177.377531
2021-02-28 04:00:00  49936.652091
2021-02-28 08:00:00  49860.396537
2021-02-28 12:00:00  49651.901082
2021-02-28 16:00:00  49625.153441
2021-02-28 20:00:00  49570.275193
2021-03-01 00:00:00  49531.874272
2021-03-01 04:00:00  49510.381676
2021-03-01 08:00:00  49486.289712
2021-03-01 12:00:00  49481.496645
2021-03-01 16:00:00  49469.806692
2021-03-01 20:00:00  49471.958606
2021-03-02 00:00:00  49462.095568
2021-03-02 04:00:00  49453.473575
2021-03-02 08:00:00  49438.986536
2021-03-02 12:00:00  49409.492007
2021-03-02 16:00:00  49356.563396
2021-03-02 20:00:00  49331.037118
2021-03-03 00:00:00  49297.823947
2021-03-03 04:00:00  49322.049974
2021-03-03 08:00:00  49461.314013
2021-03-03 12:00:00  49515.137712
2021-03-03 16:00:00  49571.990877
2021-03-03 20:00:00  49592.320461
2021-03-04 00:00:00  49592.249409
2021-03-04 04:00:00  49593.938380
2021-03-04 08:00:00  49593.055971
2021-03-04 12:00:00  49592.025698
2021-03-04 16:00:00  49585.661437
2021-03-04 20:00:00  49578.693824
2021-03-05 00:00:00  49543.067346
2021-03-05 04:00:00  49540.706794
2021-03-05 08:00:00  49513.586831
2021-03-05 12:00:00  49494.990328
2021-03-05 16:00:00  49493.807248
2021-03-05 20:00:00  49461.133698
2021-03-06 00:00:00  49432.770930
2021-03-06 04:00:00  49412.087821
2021-03-06 08:00:00  49368.106499
2021-03-06 12:00:00  49290.581114
2021-03-06 16:00:00  49272.222740
2021-03-06 20:00:00  49269.814982
2021-03-07 00:00:00  49270.328825
2021-03-07 04:00:00  49293.664209
2021-03-07 08:00:00  49339.999430
2021-03-07 12:00:00  49404.798067
2021-03-07 16:00:00  49450.447631
2021-03-07 20:00:00  49528.402294
2021-03-08 00:00:00  49571.353158
2021-03-08 04:00:00  49572.687451
2021-03-08 08:00:00  49597.518988
2021-03-08 12:00:00  49648.407014
2021-03-08 16:00:00  49708.063384
2021-03-08 20:00:00  49862.237773
2021-03-09 00:00:00  50200.833030
2021-03-09 04:00:00  50446.201489
2021-03-09 08:00:00  50727.063301
2021-03-09 12:00:00  50952.697141
2021-03-09 16:00:00  51152.798741
2021-03-09 20:00:00  51392.873289
2021-03-10 00:00:00  51472.273233
2021-03-10 04:00:00  51601.351944
2021-03-10 08:00:00  51759.387477
2021-03-10 12:00:00  52053.982892
2021-03-10 16:00:00  52437.071119
2021-03-10 20:00:00  52648.225156

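I'm trying to find a way to get how inclined or steep the line is at each point. Basically, I only need to know whether the line is going up, down, or sideways, and by how much it is rising, so ideally I would get some coefficient or number that tells me how steep the line is.

To do this, I had the idea of calculating the slope, so I tried the following code, which I got from here: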

def slope( close, length=None, as_angle=None, to_degrees=None, vertical=None, offset=None, **kwargs):
    """Indicator: Slope"""
    # Validate arguments
    length = int(length) if length and length > 0 else 1
    as_angle = True if isinstance(as_angle, bool) else False
    to_degrees = True if isinstance(to_degrees, bool) else False
    close = verify_series(close, length)
    offset = get_offset(offset)

    if close is None: return

    # Calculate Result
    slope = close.diff(length) / length
    if as_angle:
        slope = slope.apply(npAtan)
        if to_degrees:
            slope *= 180 / npPi

    # Offset
    if offset != 0:
        slope = slope.shift(offset)

    # Handle fills
    if "fillna" in kwargs:
        slope.fillna(kwargs["fillna"], inplace=True)
    if "fill_method" in kwargs:
        slope.fillna(method=kwargs["fill_method"], inplace=True)

    # Name and Categorize it
    slope.name = f"SLOPE_{length}" if not as_angle else f"ANGLE{'d' if to_degrees else 'r'}_{length}"
    slope.category = "momentum"

    return slope 

Here is a sample of the output:

2021-02-26 08:00:00  51491.322786 -110.850644
2021-02-26 12:00:00  51373.462137 -117.860648
2021-02-26 16:00:00  51244.591670 -128.870468
2021-02-26 20:00:00  51061.134204 -183.457466
2021-02-27 00:00:00  50985.592434  -75.541770
2021-02-27 04:00:00  50923.287370  -62.305064
2021-02-27 08:00:00  50842.103282  -81.184088
2021-02-27 12:00:00  50695.160604 -146.942678
2021-02-27 16:00:00  50608.462150  -86.698454
2021-02-27 20:00:00  50455.235146 -153.227004
2021-02-28 00:00:00  50177.377531 -277.857615
2021-02-28 04:00:00  49936.652091 -240.725440
2021-02-28 08:00:00  49860.396537  -76.255553
2021-02-28 12:00:00  49651.901082 -208.495455
2021-02-28 16:00:00  49625.153441  -26.747641
2021-02-28 20:00:00  49570.275193  -54.878249
2021-03-01 00:00:00  49531.874272  -38.400921
2021-03-01 04:00:00  49510.381676  -21.492596
2021-03-01 08:00:00  49486.289712  -24.091964
2021-03-01 12:00:00  49481.496645   -4.793067
2021-03-01 16:00:00  49469.806692  -11.689953
2021-03-01 20:00:00  49471.958606    2.151914
2021-03-02 00:00:00  49462.095568   -9.863038
2021-03-02 04:00:00  49453.473575   -8.621994
2021-03-02 08:00:00  49438.986536  -14.487039
2021-03-02 12:00:00  49409.492007  -29.494528
2021-03-02 16:00:00  49356.563396  -52.928611
2021-03-02 20:00:00  49331.037118  -25.526278
2021-03-03 00:00:00  49297.823947  -33.213171
2021-03-03 04:00:00  49322.049974   24.226027
2021-03-03 08:00:00  49461.314013  139.264040
2021-03-03 12:00:00  49515.137712   53.823699
2021-03-03 16:00:00  49571.990877   56.853165
2021-03-03 20:00:00  49592.320461   20.329584
2021-03-04 00:00:00  49592.249409   -0.071052
2021-03-04 04:00:00  49593.938380    1.688971
2021-03-04 08:00:00  49593.055971   -0.882409
2021-03-04 12:00:00  49592.025698   -1.030273
2021-03-04 16:00:00  49585.661437   -6.364260
2021-03-04 20:00:00  49578.693824   -6.967614
2021-03-05 00:00:00  49543.067346  -35.626478
2021-03-05 04:00:00  49540.706794   -2.360551
2021-03-05 08:00:00  49513.586831  -27.119963
2021-03-05 12:00:00  49494.990328  -18.596504
2021-03-05 16:00:00  49493.807248   -1.183080
2021-03-05 20:00:00  49461.133698  -32.673550
2021-03-06 00:00:00  49432.770930  -28.362769
2021-03-06 04:00:00  49412.087821  -20.683109
2021-03-06 08:00:00  49368.106499  -43.981322
2021-03-06 12:00:00  49290.581114  -77.525385
2021-03-06 16:00:00  49272.222740  -18.358373
2021-03-06 20:00:00  49269.814982   -2.407758
2021-03-07 00:00:00  49270.328825    0.513843
2021-03-07 04:00:00  49293.664209   23.335384
2021-03-07 08:00:00  49339.999430   46.335221
2021-03-07 12:00:00  49404.798067   64.798637
2021-03-07 16:00:00  49450.447631   45.649564
2021-03-07 20:00:00  49528.402294   77.954663
2021-03-08 00:00:00  49571.353158   42.950863
2021-03-08 04:00:00  49572.687451    1.334294
2021-03-08 08:00:00  49597.518988   24.831537
2021-03-08 12:00:00  49648.407014   50.888026
2021-03-08 16:00:00  49708.063384   59.656369
2021-03-08 20:00:00  49862.237773  154.174389
2021-03-09 00:00:00  50200.833030  338.595257
2021-03-09 04:00:00  50446.201489  245.368460
2021-03-09 08:00:00  50727.063301  280.861811
2021-03-09 12:00:00  50952.697141  225.633840
2021-03-09 16:00:00  51152.798741  200.101599
2021-03-09 20:00:00  51392.873289  240.074549
2021-03-10 00:00:00  51472.273233   79.399943
2021-03-10 04:00:00  51601.351944  129.078712
2021-03-10 08:00:00  51759.387477  158.035533
2021-03-10 12:00:00  52053.982892  294.595415
2021-03-10 16:00:00  52437.071119  383.088226
2021-03-10 20:00:00  52648.225156  211.154038

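This works, but the problem is that the resulting slope depends heavily on the magnitude of the data I provide: lower prices give lower slope values and higher prices give higher ones. Since I'm performing some kind of analysis, I need something more "general" that tells me how inclined the line I'm plotting is, regardless of the magnitude of the data I'm using. Is that possible? Any kind of advice is appreciated.

I'm not sure what you are trying to achieve, but you can find the slope and angle of a series of points in the following way.

Suppose your dataframe is given by: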

   Date       measure
0   2021-02-26 08:00  51491.322786
1   2021-02-26 12:00  51373.462137
2   2021-02-26 16:00  51244.591670
3   2021-02-26 20:00  51061.134204
4   2021-02-27 00:00  50985.592434
..               ...           ...
71  2021-03-10 04:00  51601.351944
72  2021-03-10 08:00  51759.387477
73  2021-03-10 12:00  52053.982892
74  2021-03-10 16:00  52437.071119
75  2021-03-10 20:00  52648.225156

which is exactly what you posted. You can then define the function slope_and_angle as

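import numpy as np

y = range(len(df['measure']))  # integer positions 0 .. n-1

def slope_and_angle(df):
    # Note: each pass overwrites both columns, so only the last pass
    # (i = n - 1) survives; its numerator is the constant y[n-2] - y[1].
    for i in y:
        df['slope'] = (y[i-1] - y[1]) / (df['measure'].diff())
        df['angle'] = np.rad2deg(np.arctan2(y[i-1] - y[1], df['measure'].diff()))
    return df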

which returns:

            Date       measure     slope       angle
0   2021-02-26 08:00  51491.322786       NaN         NaN
1   2021-02-26 12:00  51373.462137 -0.619376  148.226940
2   2021-02-26 16:00  51244.591670 -0.566460  150.470170
3   2021-02-26 20:00  51061.134204 -0.397912  158.301777
4   2021-02-27 00:00  50985.592434 -0.966353  135.980320
..               ...           ...       ...         ...
71  2021-03-10 04:00  51601.351944  0.565546   29.490174
72  2021-03-10 08:00  51759.387477  0.461921   24.793227
73  2021-03-10 12:00  52053.982892  0.247797   13.917410
74  2021-03-10 16:00  52437.071119  0.190557   10.788745
75  2021-03-10 20:00  52648.225156  0.345719   19.071249

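What you returned in your output sample is just df['measure'].diff().

There are things you can do, but no universal approach will always work well. You need to understand your data and choose something that suits your situation.

For example, take 100 random numbers between 0 and 1, sampled every 4 hours: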
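To see why, note that with length=1 the core line of the slope function in the question reduces to close.diff(1) / 1. A minimal sketch, using the first five values from the question (the variable names are illustrative only):

```python
import pandas as pd

# First five values from the question's sample.
close = pd.Series(
    [51491.322786, 51373.462137, 51244.591670, 51061.134204, 50985.592434]
)

length = 1
slope = close.diff(length) / length  # the core line of slope()

# With length=1 this is just the difference between consecutive rows.
print(slope.equals(close.diff()))
```

A larger length averages the change over more rows, but the result still carries the units and magnitude of the input data.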

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.date_range("2021-01-01", periods=100, freq="4H"),
    'value': np.random.random(100)
})
# df:
#       timestamp               value
# 0     2021-01-01 00:00:00     0.780008
# 1     2021-01-01 04:00:00     0.689576
# 2     2021-01-01 08:00:00     0.700937
# 3     2021-01-01 12:00:00     0.756724
# 4     2021-01-01 16:00:00     0.928890
# etc

We can easily calculate the gradient:

differences = df.diff()
gradient = 3600 * differences.value / differences.timestamp.dt.seconds
# gradient: max value 0.1979, min value -0.2432
# 0         NaN
# 1    0.033912
# 2    0.045422
# 3   -0.001827
# 4   -0.225796

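The gradient is the change in value per hour, ignoring any wrinkles such as missing values, duplicate time points, and so on.

Now, as you observed, if the magnitude of these numbers increases, the gradient increases too. For example, if I scale value up by a factor of 100: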
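One such wrinkle is worth a sketch. `.dt.seconds` returns only the seconds component of each timedelta (whole days are dropped), so a gap of 24 hours or more is misread. If irregular gaps are possible in your data, `.dt.total_seconds()` is probably the safer call. The data below is made up for illustration:

```python
import pandas as pd

# Illustrative data with one 28-hour gap between the 2nd and 3rd samples.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 04:00",
        "2021-01-02 08:00", "2021-01-02 12:00",
    ]),
    "value": [1.0, 2.0, 9.0, 10.0],
})

elapsed = df["timestamp"].diff()
hours_wrong = elapsed.dt.seconds / 3600          # drops whole days: 28 h reads as 4 h
hours_right = elapsed.dt.total_seconds() / 3600  # full elapsed time

gradient = df["value"].diff() / hours_right  # change in value per hour
print(gradient)
```

With evenly spaced 4-hour samples the two calls agree, so this only matters once the gaps vary.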

df['value100'] = 100 * df.value
differences = df.diff()
gradient = 3600 * differences.value100 / differences.timestamp.dt.seconds
print(gradient.max(), gradient.min())
# gradient: max value 19.79, min value -24.32
# 0          NaN
# 1     3.391221
# 2     4.542248
# 3    -0.182714
# 4   -22.579588

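Here we see that the gradient is also 100 times larger, as expected.

This suggests that we could just divide by some number, but the question then becomes: which number? This is where understanding your data matters.

One approach is to use the range of the data. This is similar to what you see when you plot a chart with matplotlib: the y scale is fitted to the maximum and minimum values. For example: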

sf = df.value.max() - df.value.min()
sf100 = df.value100.max() - df.value100.min()
differences = df.diff()

gradient = differences.value / sf
gradient100 = differences.value100 / sf100

# gradient, gradient100
# nan,      nan
# 0.1379,   0.1379
# 0.1847,   0.1847
# -0.0074,  -0.0074
# -0.9184,  -0.9184

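As you can see, the two gradients now match each other. This approach works well when there is a simple linear scaling.

However, consider another case, one where an outlier creates extra range.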
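The range-based scaling can be wrapped in a small helper. This is only a sketch: range_scaled_diff is a hypothetical name, not a pandas function, and it assumes the series is not constant (otherwise the range is zero).

```python
import numpy as np
import pandas as pd

def range_scaled_diff(series: pd.Series) -> pd.Series:
    """Step-to-step change expressed as a fraction of the series' full range."""
    value_range = series.max() - series.min()
    return series.diff() / value_range

rng = np.random.default_rng(0)  # seeded only so the run is repeatable
values = pd.Series(rng.random(100))

scaled = range_scaled_diff(values)
scaled_100 = range_scaled_diff(100 * values)

# Multiplying the data by a constant leaves the scaled gradient unchanged.
print(np.allclose(scaled, scaled_100, equal_nan=True))
```

Because the result is a fraction of the full range, its absolute value never exceeds 1.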

df['value_outlier'] = df.value
df.loc[50, 'value_outlier'] = 100  # Set the value at index 50 to 100

sf = df.value.max() - df.value.min()
sf_outlier = df.value_outlier.max() - df.value_outlier.min()

differences = df.diff()
gradient = differences.value / sf
gradient_outlier = differences.value_outlier / sf_outlier

# gradient, gradient_outlier
# nan,      nan
# 0.1379,   0.0014
# 0.1847,   0.0018
# -0.0074,  -0.0001
# -0.9184,  -0.0090
# 0.5792,   0.0057

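This doesn't look so good. The reason is that we have inflated the range of value_outlier without changing the actual range between most of the points.

You can work around this. One way is to use the interquartile range as the scale factor: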
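To make the effect concrete, this sketch compares the range-based scale factor with and without a single outlier. The data is random, and the seed is there only so the run is repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = pd.Series(rng.random(100))  # all values lie in [0, 1)

with_outlier = values.copy()
with_outlier.iloc[50] = 100          # one extreme point

sf = values.max() - values.min()
sf_outlier = with_outlier.max() - with_outlier.min()

# One point inflates the range by roughly two orders of magnitude,
# so every range-scaled gradient shrinks by the same factor.
print(sf, sf_outlier, sf_outlier / sf)
```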

sf = df.value.quantile(0.75) - df.value.quantile(0.25)
sf100 = df.value100.quantile(0.75) - df.value100.quantile(0.25)
sf_outlier = df.value_outlier.quantile(0.75) - df.value_outlier.quantile(0.25)

differences = df.diff()
gradient = differences.value / sf
gradient100 = differences.value100 / sf100
gradient_outlier = differences.value_outlier / sf_outlier

for a, b, c in zip(gradient, gradient100, gradient_outlier):
    print(f'{a:.4f}, {b:.4f}, {c:.4f}')

# gradient, gradient100, gradient_outlier
# nan,      nan,         nan
# 0.2953,   0.2953,      0.3202
# 0.3956,   0.3956,      0.4289
# -0.0159,  -0.0159,     -0.0173
# -1.9665,  -1.9665,     -2.1320
# 1.2403,   1.2403,      1.3447

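These values will never match perfectly, but they should be roughly the same. And, of course, there will be a big difference where the outlier sits.

So the key message is that you can do something, but you need to make sure it suits your data.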
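The same idea as a reusable helper, sketched under the assumption that the interquartile range is non-zero. iqr_scaled_diff is a hypothetical name, and the data is random with a fixed seed:

```python
import numpy as np
import pandas as pd

def iqr_scaled_diff(series: pd.Series) -> pd.Series:
    """Step-to-step change divided by the interquartile range (Q3 - Q1)."""
    iqr = series.quantile(0.75) - series.quantile(0.25)
    return series.diff() / iqr

rng = np.random.default_rng(0)
values = pd.Series(rng.random(100))
with_outlier = values.copy()
with_outlier.iloc[50] = 100  # one extreme point

plain = iqr_scaled_diff(values)
robust = iqr_scaled_diff(with_outlier)

# Skip the two steps that touch the outlier (into and out of index 50).
mask = ~plain.index.isin([50, 51])
ratio = (robust[mask] / plain[mask]).dropna()
print(ratio.min(), ratio.max())
```

Away from the outlier, the two scaled gradients differ by one constant factor: the ratio of the two interquartile ranges, which a single extreme point barely moves.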

