
Calculating and visualizing correlation between 2 variables which are in an unordered series

As part of my final-year research implementation, I'm trying to calculate and visualize the correlation between two variables which are not in an ordered series. Consider a dataset such as the following:

DateAndTime           Demand    Temperature
2015-01-02 18:00:00    2081         41
2015-01-02 19:00:00    2370         42
2015-01-02 20:00:00    2048         42
2015-01-02 21:00:00    1806         42
2015-01-02 22:00:00    1818         41
2015-01-02 23:00:00    1918         40
2015-01-03 00:00:00    1685         40
2015-01-03 01:00:00    1263         38
2015-01-03 02:00:00     969         38
2015-01-03 03:00:00     763         37
2015-01-03 04:00:00     622         36

Calculating and visualizing the correlation between the Date and Demand is straightforward, since they are in an ordered series and a scatterplot can be used to easily visualize their correlation. However, if I calculate the correlation between Temperature and Demand, the resulting scatterplot does not make much sense, as the temperature values are not in any mathematical order. What approach should be used to visualize the correlation between these 2 variables in a more meaningful manner? I'm using basic Python libraries such as Matplotlib, Statsmodels and Sklearn for this.

The idea is to plot both columns against each other, one on the x-axis and the other on the y-axis, and fit a line that approximates their relationship. NumPy has a function to compute that line:

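import numpy as np
import matplotlib.pyplot as plt

# Sample data; x does not need to be sorted
x = [4, 2, 1, 5]
y = [2, 4, 6, 3]

# Least-squares fit of a degree-1 polynomial (a straight line)
fit = np.polyfit(x, y, 1)
fit_line = np.poly1d(fit)

plt.figure()
plt.plot(x, y, 'rx')             # data points as red crosses
plt.plot(x, fit_line(x), '--b')  # fitted line, dashed blue
plt.show()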


If we write the regression line as y = a*x + b, you can obtain the coefficients a and b with

a = fit[0]
b = fit[1]

which returns

a = -0.8000000000000005
b = 6.150000000000002
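
Note that the slope a is not itself a correlation coefficient. As a minimal sketch (not part of the original answer), the Pearson coefficient for the same sample data can be read from np.corrcoef, and for a straight-line fit it is tied to the slope through the ratio of the standard deviations. The variable names r and slope_from_r are only illustrative.

```python
import numpy as np

x = np.array([4, 2, 1, 5], dtype=float)
y = np.array([2, 4, 6, 3], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson coefficient of x and y
r = np.corrcoef(x, y)[0, 1]

# Slope and intercept of the least-squares line, as in the answer
a, b = np.polyfit(x, y, 1)

# For a simple linear fit, slope = r * std(y) / std(x)
slope_from_r = r * y.std() / x.std()

print("r =", r)
print("a =", a, "slope from r =", slope_from_r)
```

Here r is negative and fairly close to -1, which matches the downward-sloping line.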

Just use your own x and y.
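
Applied to the question, a rough sketch could look like the following. It assumes only the eleven rows shown in the question (the full dataset may behave differently), and the names temperature, demand and order are illustrative. The order of the rows does not matter for either the fit or the correlation; sorting is only needed if the fitted line is drawn as connected segments.

```python
import numpy as np

# The eleven sample rows from the question (Temperature, Demand)
temperature = np.array([41, 42, 42, 42, 41, 40, 40, 38, 38, 37, 36], dtype=float)
demand = np.array([2081, 2370, 2048, 1806, 1818, 1918, 1685, 1263, 969, 763, 622],
                  dtype=float)

# Least-squares line demand = a * temperature + b
a, b = np.polyfit(temperature, demand, 1)
fit_line = np.poly1d([a, b])

# Pearson correlation between the two columns
r = np.corrcoef(temperature, demand)[0, 1]

# Sort by temperature so the fitted line can be drawn left to right
order = np.argsort(temperature)
line_x = temperature[order]
line_y = fit_line(line_x)

print("slope:", a, "intercept:", b, "pearson r:", r)
```

Passing temperature and demand to the first plt.plot call above, and line_x and line_y to the second, gives the scatterplot with its regression line.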
