
How to find significant columns in a dataset?

I want to know how to find out which column in a dataset is the significant one. For example, sepal length, sepal width, petal length, petal width, and species are the columns in the dataset. Which of these five is the significant column?

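import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()

# Merge the feature matrix and the target into one DataFrame
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['Target'] = iris.target

# Pairwise Pearson correlation between all columns (features and target)
correlation_values = data.corr()

# Draw the correlation matrix as a heatmap and display it
corr_heatmap = sns.heatmap(correlation_values, xticklabels=data.columns, yticklabels=data.columns)
plt.show()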

The resulting correlation heatmap looks like this: [image: correlation heatmap]

The heatmap shows that sepal length, petal length, and petal width are all highly correlated with each other, while sepal width is only weakly correlated with them. So the most significant feature, in the sense of having the most distinctive nature, is sepal width.
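To read this off numerically rather than by eye, here is a minimal sketch that assumes "most distinctive" means "lowest mean absolute Pearson correlation with the other features". That definition, and the names `mean_abs_corr`, `most_distinctive`, and `target_corr`, are illustrative assumptions and not part of the original answer. The last step also reports how strongly each feature correlates with the class label, which is a different notion of significance; treating the integer labels 0/1/2 as numeric is a simplification.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Absolute pairwise Pearson correlations between the four features only.
abs_corr = features.corr().abs()

# Mean absolute correlation of each feature with the *other* features:
# drop the diagonal (always 1.0) and divide by the number of other features.
n_others = abs_corr.shape[0] - 1
mean_abs_corr = (abs_corr.sum(axis=1) - 1.0) / n_others

# Assumption: the lowest value marks the least redundant ("most distinctive") feature.
most_distinctive = mean_abs_corr.idxmin()

# A separate question: how strongly does each feature track the class label?
target_corr = features.corrwith(pd.Series(iris.target)).abs()

print(mean_abs_corr.sort_values())
print("most distinctive:", most_distinctive)
print(target_corr.sort_values(ascending=False))
```

The two rankings answer different questions: the first finds the feature that is least redundant with the rest, the second finds the features that best follow the target. Which one counts as "significant" depends on what the column is needed for.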
