Python - how can I find the angle of a line at a given point?
I'm dealing with simple OHLC time series data. Here is a sample:
2021-02-26 08:00:00 51491.322786
2021-02-26 12:00:00 51373.462137
2021-02-26 16:00:00 51244.591670
2021-02-26 20:00:00 51061.134204
2021-02-27 00:00:00 50985.592434
2021-02-27 04:00:00 50923.287370
2021-02-27 08:00:00 50842.103282
2021-02-27 12:00:00 50695.160604
2021-02-27 16:00:00 50608.462150
2021-02-27 20:00:00 50455.235146
2021-02-28 00:00:00 50177.377531
2021-02-28 04:00:00 49936.652091
2021-02-28 08:00:00 49860.396537
2021-02-28 12:00:00 49651.901082
2021-02-28 16:00:00 49625.153441
2021-02-28 20:00:00 49570.275193
2021-03-01 00:00:00 49531.874272
2021-03-01 04:00:00 49510.381676
2021-03-01 08:00:00 49486.289712
2021-03-01 12:00:00 49481.496645
2021-03-01 16:00:00 49469.806692
2021-03-01 20:00:00 49471.958606
2021-03-02 00:00:00 49462.095568
2021-03-02 04:00:00 49453.473575
2021-03-02 08:00:00 49438.986536
2021-03-02 12:00:00 49409.492007
2021-03-02 16:00:00 49356.563396
2021-03-02 20:00:00 49331.037118
2021-03-03 00:00:00 49297.823947
2021-03-03 04:00:00 49322.049974
2021-03-03 08:00:00 49461.314013
2021-03-03 12:00:00 49515.137712
2021-03-03 16:00:00 49571.990877
2021-03-03 20:00:00 49592.320461
2021-03-04 00:00:00 49592.249409
2021-03-04 04:00:00 49593.938380
2021-03-04 08:00:00 49593.055971
2021-03-04 12:00:00 49592.025698
2021-03-04 16:00:00 49585.661437
2021-03-04 20:00:00 49578.693824
2021-03-05 00:00:00 49543.067346
2021-03-05 04:00:00 49540.706794
2021-03-05 08:00:00 49513.586831
2021-03-05 12:00:00 49494.990328
2021-03-05 16:00:00 49493.807248
2021-03-05 20:00:00 49461.133698
2021-03-06 00:00:00 49432.770930
2021-03-06 04:00:00 49412.087821
2021-03-06 08:00:00 49368.106499
2021-03-06 12:00:00 49290.581114
2021-03-06 16:00:00 49272.222740
2021-03-06 20:00:00 49269.814982
2021-03-07 00:00:00 49270.328825
2021-03-07 04:00:00 49293.664209
2021-03-07 08:00:00 49339.999430
2021-03-07 12:00:00 49404.798067
2021-03-07 16:00:00 49450.447631
2021-03-07 20:00:00 49528.402294
2021-03-08 00:00:00 49571.353158
2021-03-08 04:00:00 49572.687451
2021-03-08 08:00:00 49597.518988
2021-03-08 12:00:00 49648.407014
2021-03-08 16:00:00 49708.063384
2021-03-08 20:00:00 49862.237773
2021-03-09 00:00:00 50200.833030
2021-03-09 04:00:00 50446.201489
2021-03-09 08:00:00 50727.063301
2021-03-09 12:00:00 50952.697141
2021-03-09 16:00:00 51152.798741
2021-03-09 20:00:00 51392.873289
2021-03-10 00:00:00 51472.273233
2021-03-10 04:00:00 51601.351944
2021-03-10 08:00:00 51759.387477
2021-03-10 12:00:00 52053.982892
2021-03-10 16:00:00 52437.071119
2021-03-10 20:00:00 52648.225156
I'm trying to find a way to measure how inclined or steep the line is at each point. Basically, I only need to know whether the line is going up, down or sideways, and by how much, so the ideal would be some sort of coefficient or number that tells me how steep the line is.
To do that, I had the idea of calculating the slope, so I tried the following code that I got from here:
from numpy import arctan as npAtan, pi as npPi
# verify_series and get_offset are helpers from the library this snippet was copied from (not shown here)

def slope(close, length=None, as_angle=None, to_degrees=None, vertical=None, offset=None, **kwargs):
    """Indicator: Slope"""
    # Validate arguments
    length = int(length) if length and length > 0 else 1
    as_angle = True if isinstance(as_angle, bool) else False
    to_degrees = True if isinstance(to_degrees, bool) else False
    close = verify_series(close, length)
    offset = get_offset(offset)
    if close is None: return
    # Calculate Result
    slope = close.diff(length) / length
    if as_angle:
        slope = slope.apply(npAtan)
        if to_degrees:
            slope *= 180 / npPi
    # Offset
    if offset != 0:
        slope = slope.shift(offset)
    # Handle fills
    if "fillna" in kwargs:
        slope.fillna(kwargs["fillna"], inplace=True)
    if "fill_method" in kwargs:
        slope.fillna(method=kwargs["fill_method"], inplace=True)
    # Name and Categorize it
    slope.name = f"SLOPE_{length}" if not as_angle else f"ANGLE{'d' if to_degrees else 'r'}_{length}"
    slope.category = "momentum"
    return slope
Here is a sample of the output:
2021-02-26 08:00:00 51491.322786 -110.850644
2021-02-26 12:00:00 51373.462137 -117.860648
2021-02-26 16:00:00 51244.591670 -128.870468
2021-02-26 20:00:00 51061.134204 -183.457466
2021-02-27 00:00:00 50985.592434 -75.541770
2021-02-27 04:00:00 50923.287370 -62.305064
2021-02-27 08:00:00 50842.103282 -81.184088
2021-02-27 12:00:00 50695.160604 -146.942678
2021-02-27 16:00:00 50608.462150 -86.698454
2021-02-27 20:00:00 50455.235146 -153.227004
2021-02-28 00:00:00 50177.377531 -277.857615
2021-02-28 04:00:00 49936.652091 -240.725440
2021-02-28 08:00:00 49860.396537 -76.255553
2021-02-28 12:00:00 49651.901082 -208.495455
2021-02-28 16:00:00 49625.153441 -26.747641
2021-02-28 20:00:00 49570.275193 -54.878249
2021-03-01 00:00:00 49531.874272 -38.400921
2021-03-01 04:00:00 49510.381676 -21.492596
2021-03-01 08:00:00 49486.289712 -24.091964
2021-03-01 12:00:00 49481.496645 -4.793067
2021-03-01 16:00:00 49469.806692 -11.689953
2021-03-01 20:00:00 49471.958606 2.151914
2021-03-02 00:00:00 49462.095568 -9.863038
2021-03-02 04:00:00 49453.473575 -8.621994
2021-03-02 08:00:00 49438.986536 -14.487039
2021-03-02 12:00:00 49409.492007 -29.494528
2021-03-02 16:00:00 49356.563396 -52.928611
2021-03-02 20:00:00 49331.037118 -25.526278
2021-03-03 00:00:00 49297.823947 -33.213171
2021-03-03 04:00:00 49322.049974 24.226027
2021-03-03 08:00:00 49461.314013 139.264040
2021-03-03 12:00:00 49515.137712 53.823699
2021-03-03 16:00:00 49571.990877 56.853165
2021-03-03 20:00:00 49592.320461 20.329584
2021-03-04 00:00:00 49592.249409 -0.071052
2021-03-04 04:00:00 49593.938380 1.688971
2021-03-04 08:00:00 49593.055971 -0.882409
2021-03-04 12:00:00 49592.025698 -1.030273
2021-03-04 16:00:00 49585.661437 -6.364260
2021-03-04 20:00:00 49578.693824 -6.967614
2021-03-05 00:00:00 49543.067346 -35.626478
2021-03-05 04:00:00 49540.706794 -2.360551
2021-03-05 08:00:00 49513.586831 -27.119963
2021-03-05 12:00:00 49494.990328 -18.596504
2021-03-05 16:00:00 49493.807248 -1.183080
2021-03-05 20:00:00 49461.133698 -32.673550
2021-03-06 00:00:00 49432.770930 -28.362769
2021-03-06 04:00:00 49412.087821 -20.683109
2021-03-06 08:00:00 49368.106499 -43.981322
2021-03-06 12:00:00 49290.581114 -77.525385
2021-03-06 16:00:00 49272.222740 -18.358373
2021-03-06 20:00:00 49269.814982 -2.407758
2021-03-07 00:00:00 49270.328825 0.513843
2021-03-07 04:00:00 49293.664209 23.335384
2021-03-07 08:00:00 49339.999430 46.335221
2021-03-07 12:00:00 49404.798067 64.798637
2021-03-07 16:00:00 49450.447631 45.649564
2021-03-07 20:00:00 49528.402294 77.954663
2021-03-08 00:00:00 49571.353158 42.950863
2021-03-08 04:00:00 49572.687451 1.334294
2021-03-08 08:00:00 49597.518988 24.831537
2021-03-08 12:00:00 49648.407014 50.888026
2021-03-08 16:00:00 49708.063384 59.656369
2021-03-08 20:00:00 49862.237773 154.174389
2021-03-09 00:00:00 50200.833030 338.595257
2021-03-09 04:00:00 50446.201489 245.368460
2021-03-09 08:00:00 50727.063301 280.861811
2021-03-09 12:00:00 50952.697141 225.633840
2021-03-09 16:00:00 51152.798741 200.101599
2021-03-09 20:00:00 51392.873289 240.074549
2021-03-10 00:00:00 51472.273233 79.399943
2021-03-10 04:00:00 51601.351944 129.078712
2021-03-10 08:00:00 51759.387477 158.035533
2021-03-10 12:00:00 52053.982892 294.595415
2021-03-10 16:00:00 52437.071119 383.088226
2021-03-10 20:00:00 52648.225156 211.154038
This works, but the problem is that the slope depends heavily on the magnitude of the data I'm providing: lower prices give much lower slope values, and higher prices give higher ones. Since I'm performing some sort of analysis, I need something more "universal" that would give me the inclination of the line I'm plotting without depending on the magnitude of the data I'm using. Is it possible? Any kind of advice is appreciated.
I am not sure what you are trying to achieve, but finding the slope and angle of a series of points can be done in the following manner.
Suppose your dataframe is given by:
Date measure
0 2021-02-26 08:00 51491.322786
1 2021-02-26 12:00 51373.462137
2 2021-02-26 16:00 51244.591670
3 2021-02-26 20:00 51061.134204
4 2021-02-27 00:00 50985.592434
.. ... ...
71 2021-03-10 04:00 51601.351944
72 2021-03-10 08:00 51759.387477
73 2021-03-10 12:00 52053.982892
74 2021-03-10 16:00 52437.071119
75 2021-03-10 20:00 52648.225156
which is exactly what you've posted. Then, you can define a function slope_and_angle as
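import numpy as np

y = range(len(df['measure']))  # row positions 0 .. len(df) - 1

def slope_and_angle(df):
    # Every pass overwrites both columns, so only the last pass (i = len(df) - 1) survives:
    # the numerator ends up as the constant y[-2] - y[1] = len(df) - 3 (73 for the 76 rows above).
    for i in y:
        df['slope'] = (y[i-1] - y[1]) / (df['measure'].diff())
        df['angle'] = np.rad2deg(np.arctan2(y[i-1] - y[1], df['measure'].diff()))
    return df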
which returns:
Date measure slope angle
0 2021-02-26 08:00 51491.322786 NaN NaN
1 2021-02-26 12:00 51373.462137 -0.619376 148.226940
2 2021-02-26 16:00 51244.591670 -0.566460 150.470170
3 2021-02-26 20:00 51061.134204 -0.397912 158.301777
4 2021-02-27 00:00 50985.592434 -0.966353 135.980320
.. ... ... ... ...
71 2021-03-10 04:00 51601.351944 0.565546 29.490174
72 2021-03-10 08:00 51759.387477 0.461921 24.793227
73 2021-03-10 12:00 52053.982892 0.247797 13.917410
74 2021-03-10 16:00 52437.071119 0.190557 10.788745
75 2021-03-10 20:00 52648.225156 0.345719 19.071249
What you returned in your output example was just df['measure'].diff().
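For comparison, a conventional per-step slope divides the change in value (the rise) by the elapsed time (the run). The following is only a minimal sketch under stated assumptions: the helper name slope_and_angle_per_hour is hypothetical, the column names Date and measure follow the dataframe above, and time is measured in hours.

```python
import numpy as np
import pandas as pd

def slope_and_angle_per_hour(df, time_col="Date", value_col="measure"):
    """Return a copy of df with a rise-over-run slope and its angle in degrees.

    Hypothetical helper: the run is measured in hours, which is an assumption.
    """
    out = df.copy()
    rise = out[value_col].diff()  # change in value between consecutive rows
    run = pd.to_datetime(out[time_col]).diff().dt.total_seconds() / 3600  # hours elapsed
    out["slope"] = rise / run  # value units per hour
    out["angle"] = np.degrees(np.arctan2(rise, run))  # between -90 and 90 when run > 0
    return out

# First three rows of the data from the question
sample = pd.DataFrame({
    "Date": ["2021-02-26 08:00", "2021-02-26 12:00", "2021-02-26 16:00"],
    "measure": [51491.322786, 51373.462137, 51244.591670],
})
result = slope_and_angle_per_hour(sample)
print(result)
```

Note that an angle computed this way still depends on the units of both axes. With prices around 50,000 and a 4-hour step the angles sit close to -90 or 90 degrees, so converting to an angle does not by itself remove the dependence on magnitude.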
There are things you can do, but there is no universal approach that will always work well - you need to understand your data and choose something appropriate for your case.
For example, take 100 random numbers between 0 and 1, sampled every 4 hours:
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'timestamp': pd.date_range("2021-01-01", periods=100, freq="4H"),
    'value': np.random.random(100)
})
# df:
# timestamp value
# 0 2021-01-01 00:00:00 0.780008
# 1 2021-01-01 04:00:00 0.689576
# 2 2021-01-01 08:00:00 0.700937
# 3 2021-01-01 12:00:00 0.756724
# 4 2021-01-01 16:00:00 0.928890
# etc
We can calculate the gradient quite easily:
differences = df.diff()
gradient = 3600 * differences.value / differences.timestamp.dt.seconds
# gradient: max value 0.1979, min value -0.2432
# 0 NaN
# 1 0.033912
# 2 0.045422
# 3 -0.001827
# 4 -0.225796
The gradient is the change in value per hour, ignoring any complications such as missing values, repeated time points etc.
Now, as you observe, if the magnitude of these numbers increases, the gradient increases. For example, if I make value 100 times bigger:
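One of the complications just mentioned is uneven spacing. The pandas accessor `.dt.seconds` returns only the seconds component of each timedelta (less than one day), so a gap of a day or more would be mis-measured. A minimal sketch, assuming gaps may exceed a day, that divides by the total elapsed time instead (the helper name gradient_per_hour is hypothetical):

```python
import pandas as pd

def gradient_per_hour(timestamps, values):
    """Change in value per hour between consecutive samples (hypothetical helper)."""
    # total_seconds() counts whole days too, unlike the .dt.seconds component
    elapsed_hours = timestamps.diff().dt.total_seconds() / 3600
    return values.diff() / elapsed_hours

# Toy data: a 4-hour gap followed by a 28-hour gap
ts = pd.Series(pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 04:00", "2021-01-02 08:00",
]))
vals = pd.Series([1.0, 3.0, 10.0])
grad = gradient_per_hour(ts, vals)
print(grad)
```

For evenly spaced 4-hour data both approaches give the same numbers; they differ only once a gap reaches 24 hours.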
df['value100'] = 100 * df.value
differences = df.diff()
gradient = 3600 * differences.value100 / differences.timestamp.dt.seconds
print(gradient.max(), gradient.min())
# gradient: max value 19.79, min value -24.32
# 0 NaN
# 1 3.391221
# 2 4.542248
# 3 -0.182714
# 4 -22.579588
Here we see that the gradients are also 100 times bigger - exactly as would be expected.
This suggests that we could just divide by some number, but the question then becomes: what number to use? This is where understanding your data is important.
One approach is to use the range of the data. This is similar to what you would see if you plotted a graph using matplotlib - the y scale would fit the maximum and minimum values. For example:
sf = df.value.max() - df.value.min()
sf100 = df.value100.max() - df.value100.min()
differences = df.diff()
gradient = differences.value / sf
gradient100 = differences.value100 / sf100
# gradient, gradient100
# nan, nan
# 0.1379, 0.1379
# 0.1847, 0.1847
# -0.0074, -0.0074
# -0.9184, -0.9184
As you can see, the two gradients now match each other. This approach works well when there is a simple linear scaling.
However, consider a different case - one where the extra range comes about because of an outlier.
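The range scaling can be wrapped in a small function to check the claim directly. This is a sketch rather than a definitive implementation: the name range_scaled_diff is hypothetical and the seeded random data is only a stand-in for the dataframe above.

```python
import numpy as np
import pandas as pd

def range_scaled_diff(values):
    """Step-to-step change divided by the full range (max - min) of the series."""
    return values.diff() / (values.max() - values.min())

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
value = pd.Series(rng.random(100))

scaled = range_scaled_diff(value)
scaled100 = range_scaled_diff(100 * value)  # same series, 100 times bigger

# The first entry of each result is NaN, hence equal_nan=True
same = np.allclose(scaled, scaled100, equal_nan=True)
print(same)
```

Multiplying the series by a constant multiplies both the differences and the range by that constant, so the ratio is unchanged.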
df['value_outlier'] = df.value
df.loc[50, 'value_outlier'] = 100 # Just set the 50th value to 100
sf = df.value.max() - df.value.min()
sf_outlier = df.value_outlier.max() - df.value_outlier.min()
differences = df.diff()
gradient = differences.value / sf
gradient_outlier = differences.value_outlier / sf_outlier
# gradient, gradient_outlier
# nan, nan
# 0.1379, 0.0014
# 0.1847, 0.0018
# -0.0074, -0.0001
# -0.9184, -0.0090
# 0.5792, 0.0057
This doesn't look so good. The reason is that we have inflated the range of value_outlier without changing the actual range between most of the points.
You can fix this - one approach is to use the interquartile range as the scale factor:
sf = df.value.quantile(0.75) - df.value.quantile(0.25)
sf100 = df.value100.quantile(0.75) - df.value100.quantile(0.25)
sf_outlier = df.value_outlier.quantile(0.75) - df.value_outlier.quantile(0.25)
differences = df.diff()
gradient = differences.value / sf
gradient100 = differences.value100 / sf100
gradient_outlier = differences.value_outlier / sf_outlier
for a, b, c in zip(gradient, gradient100, gradient_outlier):
    print(f'{a:.4f}, {b:.4f}, {c:.4f}')
# gradient, gradient100, gradient_outlier
# nan, nan, nan
# 0.2953, 0.2953, 0.3202
# 0.3956, 0.3956, 0.4289
# -0.0159, -0.0159, -0.0173
# -1.9665, -1.9665, -2.1320
# 1.2403, 1.2403, 1.3447
The values are never going to match perfectly, but they should be approximately the same. And, of course, you're going to have an enormous difference where that outlier is.
So, the key message is that you can do something, but you need to make sure that it is appropriate for your data.
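To see the two scale factors side by side, here is a small self-contained comparison. It is a sketch under assumptions: the helper names scaled_diff, value_range and iqr are hypothetical, and a deterministic shuffled grid of values 0.00 to 0.99 stands in for the random data so that the run is repeatable.

```python
import numpy as np
import pandas as pd

def scaled_diff(values, scale):
    """Step-to-step change divided by a chosen scale factor."""
    return values.diff() / scale

def value_range(values):
    return values.max() - values.min()

def iqr(values):
    return values.quantile(0.75) - values.quantile(0.25)

# Deterministic stand-in for the random data: the grid 0.00 .. 0.99 in shuffled order
value = pd.Series((np.arange(100) * 37 % 100) / 100)
value_outlier = value.copy()
value_outlier.loc[50] = 100.0  # same outlier position and size as in the text

for name, scale_fn in [("range", value_range), ("IQR", iqr)]:
    clean = scaled_diff(value, scale_fn(value))
    dirty = scaled_diff(value_outlier, scale_fn(value_outlier))
    # Row 1 does not involve the outlier, so ideally both numbers agree
    print(f"{name}: clean={clean.iloc[1]:.4f}, with outlier={dirty.iloc[1]:.4f}")
```

With the range as the scale factor the single outlier shrinks every other gradient by roughly a factor of 100, while the interquartile range moves only slightly, so the gradients away from the outlier stay close to their original values.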