
Calculate Pearson correlation in Python

I have 4 columns: Country, year, GDP, and CO2 emissions.

I want to measure the Pearson correlation between GDP and CO2 emissions for each country.

The country column contains every country in the world, and the year column has the values 1990, 1991, ..., 2018.


You should use groupby with corr() as the aggregation function:

import pandas as pd

# Small sample: five years of GDP and CO2 values for two countries
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
# Prints one 2x2 correlation matrix per country
print(df.groupby('country')[['GDP','CO2']].corr())

With a little reshaping, this output becomes a tidy table with one correlation per country:

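# Select both columns with a list; recent pandas versions reject the tuple form ['GDP','CO2']
# Keep only the GDP-vs-CO2 entry of each country's correlation matrix
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)
print(df_corr)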

Output:

         Correlation
country             
China       0.999581
India       0.932202
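As a possible shortcut to the reshaping above, here is a minimal sketch that computes the same per-country coefficient directly with Series.corr, which defaults to the Pearson method. It rebuilds the sample data from this answer so it runs on its own; the names corr_by_country and corr_series are only illustrative.

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    "country": ["India"] * 5 + ["China"] * 5,
    "Year": [2018, 2017, 2016, 2015, 2014] * 2,
    "GDP": [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    "CO2": [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# One Pearson coefficient per country: iterate over the groups and
# correlate the two columns within each group
corr_by_country = {
    country: group["GDP"].corr(group["CO2"])
    for country, group in df.groupby("country")
}

corr_series = pd.Series(corr_by_country, name="Correlation")
corr_series.index.name = "country"
print(corr_series)
```

This avoids building the full 2x2 matrix for each country and then dropping the entries that are not needed.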

My guess is that you want the Pearson coefficient for each country. Using pearsonr, you can loop over the countries and build a dictionary of results for each one.

import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({"column1":["value 1", "value 1","value 1","value 1","value 2", "value 2", "value 2", "value 2"],
                   "column2":[1,2,3,4,5, 1,2,3],
                   "column3":[10,30,50, 60, 80, 10, 90, 20],
                   "column4":[1, 3, 5, 6, 8, 5, 2, 3]})


results = {}
for country in df.column1.unique():
    results[country] = {}
    # pearsonr returns the coefficient first and the two-sided p-value second
    pearsonr_value = pearsonr(df.loc[df["column1"] == country, "column3"], df.loc[df["column1"] == country, "column4"])
    results[country]["pearson"] = pearsonr_value[0]
    results[country]["pvalue"] = pearsonr_value[1]

print(results["value 1"])
# pearson is 1.0 because column4 is exactly column3 / 10; the p-value is 0 (up to floating-point rounding)

print(results["value 2"])
# pearson is about 0.0926 and the p-value is about 0.907
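The same loop can also be written over groupby, collecting the results in a DataFrame instead of a nested dictionary. This is only a sketch under the assumption that a table of results is wanted; the names rows and summary are illustrative, and the data is repeated so the block runs on its own.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({
    "column1": ["value 1"] * 4 + ["value 2"] * 4,
    "column2": [1, 2, 3, 4, 5, 1, 2, 3],
    "column3": [10, 30, 50, 60, 80, 10, 90, 20],
    "column4": [1, 3, 5, 6, 8, 5, 2, 3],
})

rows = []
for name, group in df.groupby("column1"):
    # pearsonr yields (coefficient, two-sided p-value)
    r, p = pearsonr(group["column3"], group["column4"])
    rows.append({"column1": name, "pearson": float(r), "pvalue": float(p)})

# One row per group, indexed by the grouping column
summary = pd.DataFrame(rows).set_index("column1")
print(summary)
```

With only four observations per group, the p-values are not very informative; the real data with 29 years per country would give more meaningful ones.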


Thank you @Celius, it worked and gave me the results I wanted.
