Calculate Pearson correlation in Python
I have 4 columns: "Country, year, GDP, CO2 emissions".
I want to measure the Pearson correlation between GDP and CO2 emissions for each country.
The country column contains every country in the world, and the year column has the values "1990, 1991, ..., 2018".
You should use groupby with corr() as the aggregation function:
import pandas as pd

# Small toy dataset: five years of GDP and CO2 values for two countries
country = ['India','India','India','India','India','China','China','China','China','China']
Year = [2018,2017,2016,2015,2014,2018,2017,2016,2015,2014]
GDP = [100,98,94,64,66,200,189,165,134,130]
CO2 = [94,96,90,76,64,180,172,150,121,117]
df = pd.DataFrame({'country':country,'Year':Year,'GDP':GDP,'CO2':CO2})
# One 2x2 Pearson correlation matrix per country
print(df.groupby('country')[['GDP','CO2']].corr())
With a little more work on this output we can get something tidier, with one coefficient per country:
# Keep only the GDP row / CO2 column of each per-country matrix
df_corr = (df.groupby('country')[['GDP','CO2']].corr()).drop(columns='GDP').drop('CO2',level=1).rename(columns={'CO2':'Correlation'})
# Drop the leftover inner index level so that country is the only index
df_corr = df_corr.reset_index().drop(columns='level_1').set_index('country',drop=True)
print(df_corr)
Output:
         Correlation
country
China       0.999581
India       0.932202
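If only the single GDP/CO2 coefficient per country is needed, a shorter route may be to call Series.corr inside each group instead of trimming the full correlation matrix. The following is a minimal sketch on the same toy data; the name per_country is only illustrative, and Series.corr is used with its default method, which is Pearson.

```python
import pandas as pd

# Same toy data as in the answer above
df = pd.DataFrame({
    'country': ['India'] * 5 + ['China'] * 5,
    'Year': [2018, 2017, 2016, 2015, 2014] * 2,
    'GDP': [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    'CO2': [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

# For each country, correlate its GDP column with its CO2 column.
# Series.corr defaults to method='pearson'.
per_country = (
    df.groupby('country')[['GDP', 'CO2']]
      .apply(lambda g: g['GDP'].corr(g['CO2']))
      .rename('Correlation')
)
print(per_country)
```

The result is a Series indexed by country, so per_country['India'] gives that country's coefficient directly.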
My guess is that you want the Pearson coefficient for each country. Using pearsonr, you can loop through the countries and build a dictionary entry for each one.
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame({"column1":["value 1", "value 1","value 1","value 1","value 2", "value 2", "value 2", "value 2"],
                   "column2":[1,2,3,4,5, 1,2,3],
                   "column3":[10,30,50, 60, 80, 10, 90, 20],
                   "column4":[1, 3, 5, 6, 8, 5, 2, 3]})

results = {}
for country in df.column1.unique():
    results[country] = {}
    # Select this group's rows once, then correlate column3 with column4
    subset = df.loc[df["column1"] == country]
    pearsonr_value = pearsonr(subset["column3"], subset["column4"])
    results[country]["pearson"] = pearsonr_value[0]  # correlation coefficient
    results[country]["pvalue"] = pearsonr_value[1]   # two-sided p-value

print(results["value 1"])
# 'pearson' is 1.0 (up to floating-point rounding), since column3 is exactly 10 * column4 for "value 1"
print(results["value 2"])
# 'pearson' is about 0.0926 for "value 2"
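The same loop can also be driven by groupby, which hands over each country's rows directly and avoids repeating the boolean mask. This is a sketch under the assumption that the GDP/CO2 toy data from the first answer is used; the names rows and summary are illustrative, and unpacking the result of pearsonr into a coefficient and a p-value is the only SciPy behaviour relied on.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy data in the shape described in the question
df = pd.DataFrame({
    "country": ["India"] * 5 + ["China"] * 5,
    "Year": [2018, 2017, 2016, 2015, 2014] * 2,
    "GDP": [100, 98, 94, 64, 66, 200, 189, 165, 134, 130],
    "CO2": [94, 96, 90, 76, 64, 180, 172, 150, 121, 117],
})

rows = []
for country, group in df.groupby("country"):
    # pearsonr returns the coefficient and a two-sided p-value
    r, p = pearsonr(group["GDP"], group["CO2"])
    rows.append({"country": country, "pearson": r, "pvalue": p})

# One row per country, with the coefficient and its p-value side by side
summary = pd.DataFrame(rows).set_index("country")
print(summary)
```

With the real data, countries that have fewer than two yearly observations would need to be filtered out first, since a correlation is not defined for them.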