
Correlation matrix returning NaN values from Pandas DataFrame

I have a couple of large datasets that I need to find the correlation between. The data is converted into a pandas DataFrame and I use pd.DataFrame.corr() to find the correlation. It works for some datasets and not for others, and I am unsure why.

The values in the datasets that do not work are not all identical, so the standard deviation is not 0. The column types (dtype) of the DataFrame objects are all float64.

The code works with:

                               BPM1401-01:x  BPM1401-01:y
2019-07-23 05:59:59.641471863      0.000052     -0.000108  
2019-07-23 06:00:00.033471822      0.000050     -0.000108  
2019-07-23 06:00:00.425471783           NaN     -0.000108  
2019-07-23 06:00:00.816471815      0.000051           NaN  
2019-07-23 06:00:01.170471907      0.000050           NaN  
2019-07-23 06:00:01.954471827      0.000049           NaN  
2019-07-23 06:00:02.345471859      0.000051           NaN  
2019-07-23 06:00:02.737471819      0.000051     -0.000108  
2019-07-23 06:00:03.090471745      0.000052     -0.000108  
2019-07-23 06:00:03.481471777      0.000051     -0.000109  

but does not work with:

                               SR1:BPMXRMSGlobal  SR1:BPMYRMSGlobal
2019-07-23 05:59:58.197318077           1.096721                NaN  
2019-07-23 05:59:58.197477102                NaN           1.586067  
2019-07-23 06:00:01.471035957                NaN           0.772168  
2019-07-23 06:00:02.132909060           1.553643                NaN  
2019-07-23 06:00:02.132987022                NaN           1.209081  
2019-07-23 06:00:02.793922901           2.558707                NaN  
2019-07-23 06:00:02.793971062                NaN           1.624215  
2019-07-23 06:00:03.440277100           2.508732                NaN  
2019-07-23 06:00:03.440378904                NaN           1.540483  
2019-07-23 06:00:04.094022036           2.325517                NaN

The code:

import pandas as pd
import seaborn as sb
import numpy as np
import matplotlib.pyplot as plt

# Align the data using the timestamps (already done in the sets above).
def align_dataframes(data_frame_list):

    # Start from the first dataframe
    curr_df = data_frame_list[0]

    # Outer-join each remaining dataframe on the index
    for next_df in data_frame_list[1:]:
        curr_df = curr_df.join(next_df, how='outer')

    # Return aligned dataframe
    return curr_df

def plot_corr(data_frame):

    print(data_frame.dtypes)  # every column reports float64
    # Compute correlation matrix
    corr_mat = data_frame.corr(method='pearson', min_periods=1)
    heat_map = sb.heatmap(corr_mat, linewidths=.5)

    plt.show()

It seems to me like the second DataFrame should work just as well, but the corr() matrix ends up returning NaN values.

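The second DataFrame has no rows in which both values are non-null, so there are no data points on which to compute a correlation. corr() excludes NaN values pairwise, meaning that for each pair of columns it uses only the rows where both columns have a value. In the first DataFrame several rows have both x and y; in the second, every row has exactly one of the two.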
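
To make this concrete, here is a minimal sketch that rebuilds a few rows of the second dataset, counts the rows shared by each pair of columns, and then tries one possible workaround. The helper name `pairwise_counts`, the index name `time`, and the 1 ms tolerance are my own assumptions for illustration, not part of the question. Pairing nearby timestamps with `pd.merge_asof` is only appropriate if readings that close together really belong to the same measurement.

```python
import numpy as np
import pandas as pd

X = "SR1:BPMXRMSGlobal"
Y = "SR1:BPMYRMSGlobal"

# A few rows copied from the second dataset in the question: the two
# columns are sampled at slightly different timestamps, so after the
# outer join no row holds both values.
idx = pd.to_datetime([
    "2019-07-23 05:59:58.197318077",
    "2019-07-23 05:59:58.197477102",
    "2019-07-23 06:00:02.132909060",
    "2019-07-23 06:00:02.132987022",
    "2019-07-23 06:00:02.793922901",
    "2019-07-23 06:00:02.793971062",
])
df = pd.DataFrame(
    {
        X: [1.096721, np.nan, 1.553643, np.nan, 2.558707, np.nan],
        Y: [np.nan, 1.586067, np.nan, 1.209081, np.nan, 1.624215],
    },
    index=idx,
)


def pairwise_counts(data_frame):
    """For each pair of columns, count the rows where both are non-null."""
    present = data_frame.notna().astype(int)
    return present.T.dot(present)


counts = pairwise_counts(df)
print(counts)  # the off-diagonal entries are 0: no shared rows

corr = df.corr(method="pearson", min_periods=1)
print(corr)    # the off-diagonal entries are NaN

# Possible workaround (assumption: readings within 1 ms of each other
# belong together). Pair each X reading with the nearest Y reading.
x = df[X].dropna().rename_axis("time").reset_index()
y = df[Y].dropna().rename_axis("time").reset_index()
paired = pd.merge_asof(
    x, y, on="time", direction="nearest", tolerance=pd.Timedelta("1ms")
).set_index("time")

paired_corr = paired.corr(method="pearson")
print(paired_corr)
```

If `pairwise_counts` reports 0 (or 1) for a pair of columns, corr() has nothing to work with for that pair, regardless of dtype or standard deviation. Other ways to obtain overlapping rows, such as resampling both columns onto a common time grid or interpolating, may suit the data better than nearest-timestamp pairing.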


 