Correlation matrix returning NaN values from Pandas DataFrame
I have a couple of large datasets that I need to find the correlation between. The data is converted into a pandas DataFrame and I use pd.DataFrame.corr() to find the correlation. It works for some datasets and not for others, and I am unsure why.
The values in the datasets that do not work are not all the same, so the SD is not 0. The column types (dtype) of the DataFrame objects are all float64.
The code works with:
BPM1401-01:x BPM1401-01:y
2019-07-23 05:59:59.641471863 0.000052 -0.000108
2019-07-23 06:00:00.033471822 0.000050 -0.000108
2019-07-23 06:00:00.425471783 NaN -0.000108
2019-07-23 06:00:00.816471815 0.000051 NaN
2019-07-23 06:00:01.170471907 0.000050 NaN
2019-07-23 06:00:01.954471827 0.000049 NaN
2019-07-23 06:00:02.345471859 0.000051 NaN
2019-07-23 06:00:02.737471819 0.000051 -0.000108
2019-07-23 06:00:03.090471745 0.000052 -0.000108
2019-07-23 06:00:03.481471777 0.000051 -0.000109
but does not work with:
SR1:BPMXRMSGlobal SR1:BPMYRMSGlobal
2019-07-23 05:59:58.197318077 1.096721 NaN
2019-07-23 05:59:58.197477102 NaN 1.586067
2019-07-23 06:00:01.471035957 NaN 0.772168
2019-07-23 06:00:02.132909060 1.553643 NaN
2019-07-23 06:00:02.132987022 NaN 1.209081
2019-07-23 06:00:02.793922901 2.558707 NaN
2019-07-23 06:00:02.793971062 NaN 1.624215
2019-07-23 06:00:03.440277100 2.508732 NaN
2019-07-23 06:00:03.440378904 NaN 1.540483
2019-07-23 06:00:04.094022036 2.325517 NaN
import pandas as pd
import seaborn as sb
import numpy as np
import matplotlib.pyplot as plt

# Align the data using the timestamps (already done in the sets above).
def align_dataframes(data_frame_list):
    # Start from the first dataframe
    curr_df = data_frame_list[0]
    # Outer-join each remaining dataframe on the timestamp index
    for i in range(len(data_frame_list) - 1):
        curr_df = curr_df.join(data_frame_list[i + 1], how='outer')
    # Return the aligned dataframe
    return curr_df

def plot_corr(data_frame):
    print(data_frame.dtypes)  # gives float64 for every column
    # Compute the correlation matrix
    corr_mat = data_frame.corr(method='pearson', min_periods=1)
    heat_map = sb.heatmap(corr_mat, linewidths=.5)
    plt.show()
It seems to me like the second DataFrame should work just as well, but the corr() matrix ends up returning NaN values.
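The second DataFrame has no rows in which both values are non-null, so there are no data points on which to compute a correlation. corr() excludes missing values pairwise, which means each off-diagonal entry is computed only from rows where both columns have a value.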
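To make this concrete, here is a minimal sketch that rebuilds the failing sample from the question, counts the rows where both columns are present, and then tries one possible workaround. The workaround is an assumption on my part, not something stated in the question: it treats an X reading and a Y reading taken within 1 ms of each other as one measurement and pairs them with pd.merge_asof. The names `overlapping_rows`, `paired`, and the 1 ms tolerance are illustrative only; whether such pairing (or interpolation, or resampling) is physically meaningful depends on how the signals were recorded.

```python
import numpy as np
import pandas as pd

# The failing sample from the question: the two signals are never
# recorded at exactly the same timestamp.
idx = pd.to_datetime([
    "2019-07-23 05:59:58.197318077", "2019-07-23 05:59:58.197477102",
    "2019-07-23 06:00:01.471035957", "2019-07-23 06:00:02.132909060",
    "2019-07-23 06:00:02.132987022", "2019-07-23 06:00:02.793922901",
    "2019-07-23 06:00:02.793971062", "2019-07-23 06:00:03.440277100",
    "2019-07-23 06:00:03.440378904", "2019-07-23 06:00:04.094022036",
])
nan = np.nan
df = pd.DataFrame(
    {
        "SR1:BPMXRMSGlobal": [1.096721, nan, nan, 1.553643, nan,
                              2.558707, nan, 2.508732, nan, 2.325517],
        "SR1:BPMYRMSGlobal": [nan, 1.586067, 0.772168, nan, 1.209081,
                              nan, 1.624215, nan, 1.540483, nan],
    },
    index=idx,
)

# Rows where both columns hold a value at the same time.
overlapping_rows = len(df.dropna())
print("rows with both values:", overlapping_rows)

# With no overlapping rows, the off-diagonal entries are NaN.
corr_raw = df.corr(method="pearson", min_periods=1)
print(corr_raw)

# Hypothetical workaround (assumption): pair each X reading with the
# nearest Y reading, but only if they are at most 1 ms apart.
x = df["SR1:BPMXRMSGlobal"].dropna().rename_axis("time").reset_index()
y = df["SR1:BPMYRMSGlobal"].dropna().rename_axis("time").reset_index()
paired = pd.merge_asof(
    x, y, on="time", direction="nearest", tolerance=pd.Timedelta("1ms")
).dropna()

corr_paired = paired[["SR1:BPMXRMSGlobal", "SR1:BPMYRMSGlobal"]].corr()
print(corr_paired)
```

In the first dataset from the question, several rows contain both x and y, which is why corr() returns numbers there. A quick `len(df.dropna())` check before calling corr() shows whether any such rows exist.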