How to test correlation between two sets in python?

I have two different dataframes, and one of them is as below:

df1 =

      Datetime      BSL
0          7  127.504505
1          8  115.254132
2          9  108.994275
3         10  102.936860
4         11   99.830400
5         12  114.660522
6         13  138.215339
7         14  132.131075
8         15  121.478006
9         16  113.795645
10        17  114.038462

The other one is df2 =

    Datetime       Number of Accident
0          7                  3455
1          8                 17388
2          9                 27767
3         10                 33622
4         11                 33474
5         12                 12670
6         13                 28137
7         14                 27141
8         15                 26515
9         16                 24849
10        17                 13013

The first one is the blood sugar level (BSL) of people by time of day (7 means between 7 am and 8 am); the second one is the number of accidents during the same hours.

When I try this code:

df1.corr(df2, "pearson")

I get this error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I solve it? Or, how can I test the correlation between two different variables?

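from scipy.stats import pearsonr
# df2 calls the count column 'Number of Accident'; rename it to 'Accidents' for the merged frame
df2_renamed = df2.rename(columns={'Number of Accident': 'Accidents'})
# Join on the hour so each BSL value is paired with the accident count of the same hour
df_full = df1.merge(df2_renamed, how='left', on='Datetime')
full_correlation = pearsonr(df_full['BSL'], df_full['Accidents'])
print('Correlation coefficient:', full_correlation[0])
print('P-value:', full_correlation[1])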

Output:

(-0.2934597230564072, 0.3811116115819819)
Correlation coefficient: -0.2934597230564072
P-value: 0.3811116115819819
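To see what the coefficient measures, here is a minimal sketch that computes Pearson's r directly from its definition and cross-checks it against numpy. The arrays are copied from the two tables in the question; `pearson_r` is a hypothetical helper written for this illustration, not part of scipy or pandas, and the sketch does not compute the p-value.

```python
import numpy as np

# Values copied from the question: df1['BSL'] and df2['Number of Accident'], hours 7..17
bsl = np.array([127.504505, 115.254132, 108.994275, 102.936860, 99.830400,
                114.660522, 138.215339, 132.131075, 121.478006, 113.795645,
                114.038462])
accidents = np.array([3455, 17388, 27767, 33622, 33474, 12670,
                      28137, 27141, 26515, 24849, 13013], dtype=float)


def pearson_r(x, y):
    """Hypothetical helper: sum of co-deviations divided by the product of the spreads."""
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


r_manual = pearson_r(bsl, accidents)
r_numpy = np.corrcoef(bsl, accidents)[0, 1]  # off-diagonal entry of the 2x2 matrix

print("manual:", r_manual)
print("numpy :", r_numpy)
```

The two printed values should agree up to floating-point rounding. With only 11 pairs, the estimate is noisy, which is why the p-value above matters when interpreting it.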

Edit:

You want an hourly correlation, but that is mathematically impossible because you have only one (x, y) pair for each hour. Therefore the output will be full of NaNs. This is the code, but its output is not meaningful:

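# Double brackets select several columns after groupby (the tuple form is rejected by recent pandas)
df_corr = df_full.groupby('Datetime')[['BSL','Accidents']].corr().drop(columns='BSL').drop('Accidents',level=1).rename(columns={'Accidents':'Correlation'})
print(df_corr)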

Output:

              Correlation
Datetime                 
7        BSL          NaN
8        BSL          NaN
9        BSL          NaN
10       BSL          NaN
11       BSL          NaN
12       BSL          NaN
13       BSL          NaN
14       BSL          NaN
15       BSL          NaN
16       BSL          NaN
17       BSL          NaN
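The reason for the NaNs can be checked directly: every hour contains exactly one row, and a single observation has no spread, so the denominator of the correlation is zero. The sketch below rebuilds the merged frame from the values in the question (the column name `Accidents` follows the answer above and is an assumption about how the frame was named) and looks at one hour in isolation.

```python
import pandas as pd

# Merged data rebuilt from the question; 'Accidents' is the renamed count column
df_full = pd.DataFrame({
    "Datetime": range(7, 18),
    "BSL": [127.504505, 115.254132, 108.994275, 102.936860, 99.830400,
            114.660522, 138.215339, 132.131075, 121.478006, 113.795645,
            114.038462],
    "Accidents": [3455, 17388, 27767, 33622, 33474, 12670,
                  28137, 27141, 26515, 24849, 13013],
})

# How many (BSL, Accidents) pairs does each hour contribute?
group_sizes = df_full.groupby("Datetime").size()
print("largest group size:", group_sizes.max())

# Correlation matrix computed from the single row for hour 7
one_hour = df_full.loc[df_full["Datetime"] == 7, ["BSL", "Accidents"]]
single_corr = one_hour.corr()
print(single_corr)
```

An hourly correlation would need several observations per hour, for example one row per day and hour, so that each group has at least two pairs with some variation.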

Since your dataframes have multiple columns, you need to specify the names of the columns you want to use:

df1['BSL'].corr(df2['Number of Accident'], "pearson")

The corr() method of a pandas dataframe calculates a correlation matrix for all columns in one dataframe. You have two dataframes, so that method won't work. You can solve this by doing:

df1['number'] = df2['Number of Accident']
df1.corr("pearson")
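A variation on the same idea, as a sketch rather than the definitive approach: instead of copying the column by row position, join the two frames on `Datetime` so the pairing by hour is explicit, then read the single coefficient out of the matrix. The data is rebuilt from the question; the names `combined`, `corr_matrix`, and `r` are illustrative.

```python
import pandas as pd

hours = range(7, 18)
df1 = pd.DataFrame({
    "Datetime": hours,
    "BSL": [127.504505, 115.254132, 108.994275, 102.936860, 99.830400,
            114.660522, 138.215339, 132.131075, 121.478006, 113.795645,
            114.038462],
})
df2 = pd.DataFrame({
    "Datetime": hours,
    "Number of Accident": [3455, 17388, 27767, 33622, 33474, 12670,
                           28137, 27141, 26515, 24849, 13013],
})

# Match rows by hour rather than by position in the frame
combined = df1.merge(df2, on="Datetime")

# Leave 'Datetime' out so the matrix only contains the two variables of interest
corr_matrix = combined[["BSL", "Number of Accident"]].corr(method="pearson")
print(corr_matrix)

# The off-diagonal cell is the coefficient between the two columns
r = corr_matrix.loc["BSL", "Number of Accident"]
print("r =", r)
```

Merging on the key is safer than positional assignment when the two frames might be sorted differently or cover different hours, since unmatched hours are dropped instead of being paired with the wrong row.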
