How to select features using Python when a DataFrame has features in rows

My data frame looks like the sample below. px1, px2, ..., px99 are placeholders and appear in the data frame as columns. The cells hold values such as 5569 and 5282, which are the real features to be selected, and there are many thousands of them. I want to filter out the important features and am trying to use Random Forest. I know I can filter the px columns with Random Forest, but how do I select the actual features embedded in the cells? I am using Python.

px1 px2 px3 px4 px5 px6 px7 px8 px9 px10

5569 5282 93
5569 5280 93 9904
5569 5282 93 93 3893 8872 3897 9904
5569 5280 5551 93 93 3995 8607
5569 5280 93 8867
5282 5569 93 9904 93

You don't need more than two columns, because the chronology (the order of the values across the px columns) doesn't matter, so:

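import pandas as pds  # the alias used in this answer
# Pair px1 with every other column, stack the pairs, and drop the empty cells.
df = pds.concat([df[['px1', col]].rename(columns={col: 'px2'}) for col in df.columns if col != 'px1'],
                axis=0, join='outer').dropna()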
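To see what this reshaping produces, here is a minimal sketch on a small frame modelled on the question's sample. The four-column frame, the use of NaN for unused placeholders, and the names `other_cols` and `pairs` are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed toy data modelled on the question's sample, truncated to four columns.
# NaN marks a placeholder column that holds no value in that row.
df = pd.DataFrame(
    [
        [5569, 5282, 93, np.nan],
        [5569, 5280, 93, 9904],
        [5282, 5569, 93, 9904],
    ],
    columns=["px1", "px2", "px3", "px4"],
)

# Every column except px1 is paired with px1 in turn.
other_cols = [c for c in df.columns if c != "px1"]

# Stack the (px1, value) pairs vertically and drop pairs with no value.
pairs = pd.concat(
    [df[["px1", c]].rename(columns={c: "px2"}) for c in other_cols],
    axis=0,
).dropna()

# The NaN cells forced a float dtype; restore integers and a clean index.
pairs = pairs.astype(int).reset_index(drop=True)
print(pairs)
```

Each row of `pairs` now holds the first code of an original row together with one of the codes that appeared alongside it, so the original column position no longer matters.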

Now, because you only consider the first variable, you have to look at how px2 is distributed for each value of px1:

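# One histogram of px2 per px1 value, with one bin per distinct px2 value (plotting requires matplotlib).
for label, dist in df.groupby('px1')['px2']:
    dist.hist(bins=dist.nunique(), label=label)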
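If a table of numbers is easier to read than overlaid histograms, the same grouping can be tabulated. This is only a sketch of an alternative view, not part of the original answer: it uses `pandas.crosstab` in place of the plot, and the `pairs` frame below is hypothetical data shaped like the output of the reshaping step.

```python
import pandas as pd

# Hypothetical long-format pairs (px1 = first code, px2 = a co-occurring code).
pairs = pd.DataFrame(
    {
        "px1": [5569, 5569, 5569, 5569, 5282, 5282],
        "px2": [5282, 93, 5280, 93, 5569, 93],
    }
)

# Rows: px1 values; columns: px2 values; cells: how many pairs were seen.
counts = pd.crosstab(pairs["px1"], pairs["px2"])
print(counts)

# For each px1 value, the px2 code that occurs most often with it.
top = counts.idxmax(axis=1)
print(top)
```

The counts carry the same information as the histogram bars, which may be more practical when there are thousands of distinct codes.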
