
Python: How do I write a function to determine which variable in a dataframe has the highest absolute correlation with a specified column?

I would like to write a function that determines which variable in a dataframe has the highest absolute correlation with a specific column. However, I am having difficulty getting the column name out of the correlation matrix.

Say that my data, df, is as follows:

address     size  rent_price  number_of_bathrooms  number_of_rooms
East          12        3400                    2                4
North East    99        4200                    4                4
South         99        4000                    5                5

I use ab_col_matrix = abs(df.corr()) to generate the absolute correlation matrix, which looks something like the following, with column names along the top and the left-hand side:

1 value value value 
value 1 value value 
value value 1 value 
value value value 1 
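For reproducibility, the setup described above can be sketched as follows. The values are copied from the sample table; the intermediate name numeric_df is an assumption introduced here. The non-numeric address column is dropped before correlating, because recent pandas releases raise an error when df.corr() meets string columns.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "address": ["East", "North East", "South"],
    "size": [12, 99, 99],
    "rent_price": [3400, 4200, 4000],
    "number_of_bathrooms": [2, 4, 5],
    "number_of_rooms": [4, 4, 5],
})

# Keep only numeric columns: "address" holds strings and cannot be correlated.
numeric_df = df.select_dtypes(include="number")

# Absolute Pearson correlations; rows and columns are both labelled by column name.
ab_col_matrix = numeric_df.corr().abs()
print(ab_col_matrix.round(3))
```

The diagonal is always 1 because every column is perfectly correlated with itself.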

Say that I am interested in the column most highly correlated with the size column. My idea is to sort by that column, take the top row, and return the name of the column with the highest value.

So I tried:

sorted = ab_col_matrix.sort_values('size', ascending=False)

Then I tried to pick the highest one with sorted['size'][1], but it only returns the value itself, not the column name, and I am puzzled about how to access that. I used [1] because [0] would return 1, which is the correlation of the column with itself.

I would very much appreciate any help that lets me learn how to achieve this.

You can simply select the column for the variable you want and then sort the rows:

ab_col_matrix['size'].sort_values(ascending=False)

size                   1.000000
rent_price             0.970725
number_of_bathrooms    0.944911
number_of_rooms        0.500000
Name: size, dtype: float64
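The sorted result is a Series: the numbers are its values and the variable names are its index, which is why looking up a position in the values returns only a number. A minimal sketch of the distinction, rebuilding the Series by hand from the output above (the names col, value and label are illustrative):

```python
import pandas as pd

# Absolute correlations with 'size', already sorted in descending order.
col = pd.Series(
    {
        "size": 1.0,
        "rent_price": 0.970725,
        "number_of_bathrooms": 0.944911,
        "number_of_rooms": 0.5,
    },
    name="size",
)

value = col.iloc[1]   # positional lookup on the values: the correlation itself
label = col.index[1]  # positional lookup on the index: the variable name
print(label, value)
```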

You can then get the name of the most highly correlated variable with the following (index[0] is size itself, so take index[1]):

ab_col_matrix['size'].sort_values(ascending=False).index[1]

'rent_price'
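Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is a sketch rather than the answer's exact method: the function name highest_abs_corr is hypothetical, and instead of relying on the target sitting at position 0 after sorting, it drops the target explicitly and uses idxmax, which stays correct even if another column is also perfectly correlated with it.

```python
import pandas as pd


def highest_abs_corr(df: pd.DataFrame, target: str) -> str:
    """Return the numeric column with the highest absolute correlation to `target`."""
    # Ignore non-numeric columns such as "address".
    numeric_df = df.select_dtypes(include="number")
    # Absolute correlation of every numeric column with the target,
    # excluding the target's correlation with itself (always 1).
    corr = numeric_df.corr()[target].abs().drop(target)
    # idxmax returns the index label (the column name) of the largest value.
    return corr.idxmax()


# Sample data copied from the question.
df = pd.DataFrame({
    "address": ["East", "North East", "South"],
    "size": [12, 99, 99],
    "rent_price": [3400, 4200, 4000],
    "number_of_bathrooms": [2, 4, 5],
    "number_of_rooms": [4, 4, 5],
})

print(highest_abs_corr(df, "size"))  # rent_price
```

The function raises a KeyError if the target is missing or not numeric, which seems a reasonable default for a helper like this.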


