
In Python, how do I compute the correlation between multiple columns (more than 2 variables)?

I have a pandas DataFrame like so:

id    cat1    cat2    cat3    num1    num2
1     0       WN      29      2003    98
2     1       TX      12      755     76
3     0       WY      11      845     32
4     1       IL      19      935     46

I want to find the correlation between cat1 and the columns cat3, num1 and num2; or between cat1 and num1 and num2; or between cat2 and cat1, cat3, num1, num2.

When I use df.corr(), it gives the correlation between all the columns in the DataFrame, but I want to see the correlation between just the selected columns listed above.

How do I do that in Python pandas?

A thousand thanks in advance for your answers.

I tried the following and it worked:

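# Column names are case-sensitive, so they must match the DataFrame exactly.
features1 = ['cat1', 'cat2', 'cat3']
features2 = ['cat1', 'cat2', 'num1', 'num2']

# cat2 holds strings; numeric_only=True (pandas 1.5+) skips it instead of raising.
df[features1].corr(numeric_only=True)
df[features2].corr(numeric_only=True)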

This is a good way to select columns as needed when your dataset has a very large number of variables.
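For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample table from the question and narrows the result to one column against a chosen set. The variable names (numeric, cat1_vs_rest, dummies, and so on) are made up for illustration. Treating cat2 by one-hot encoding it with pd.get_dummies is an assumption on my part, not something the original post did; Pearson correlation is not defined for raw string labels, so some encoding is needed before cat2 can appear in the matrix at all.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cat1": [0, 1, 0, 1],
    "cat2": ["WN", "TX", "WY", "IL"],
    "cat3": [29, 12, 11, 19],
    "num1": [2003, 755, 845, 935],
    "num2": [98, 76, 32, 46],
})

numeric = ["cat3", "num1", "num2"]

# Correlation matrix restricted to the columns of interest.
corr = df[["cat1"] + numeric].corr()

# Keep only the row for cat1 and the columns it is compared against.
cat1_vs_rest = corr.loc[["cat1"], numeric]

# The same numbers as a Series, computed column by column.
cat1_corrwith = df[numeric].corrwith(df["cat1"])

# Assumption: represent the string column cat2 as one 0/1 column per value,
# then correlate those indicator columns with the numeric ones.
dummies = pd.get_dummies(df["cat2"], dtype=float)
others = ["cat1"] + numeric
cat2_vs_rest = (
    pd.concat([dummies, df[others]], axis=1)
    .corr()
    .loc[dummies.columns, others]
)

print(cat1_vs_rest)
print(cat2_vs_rest)
```

With only four rows, and every cat2 value appearing once, the numbers themselves do not say much; the point is only the column selection pattern.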

