How to select features, using Python, when the dataframe has features in rows
My data frame looks like the sample below. px1, px2, ..., px99 are placeholders that appear as the columns of the data frame. The cells hold values such as 5569, 5282, etc., which are the real features to be selected. There are many thousands of these features. I want to filter out the important ones, and I am trying to use Random Forest. I know I can rank the px columns with Random Forest, but how do I rank the actual features embedded in them? I am using Python.
px1   px2   px3   px4   px5   px6   px7   px8   px9   px10
5569  5282  93
5569  5280  93    9904
5569  5282  93    93    3893  8872  3897  9904
5569  5280  5551  93    93    3995  8607
5569  5280  93    8867
5282  5569  93    9904  93
You don't need more than two columns, because the order of the values doesn't matter, so:
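import pandas as pds
# Pair px1 with each other column (renamed to px2) and stack the pairs into one long two-column frame
pairs = [df[['px1', col]].rename(columns={col: 'px2'}) for col in df.columns if col != 'px1']
df = pds.concat(pairs, axis=0, join='outer').dropna()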
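To make the reshaping step concrete, here is a minimal sketch on three rows taken from the sample in the question. It assumes that the values are left-aligned and that empty cells are read as NaN; the names `toy` and `long_df` are only illustrative.

```python
import numpy as np
import pandas as pd

# Three rows from the question's sample; NaN marks empty cells
# (assumption: values are left-aligned, remaining cells are missing).
toy = pd.DataFrame(
    [
        [5569, 5282, 93, np.nan, np.nan],
        [5569, 5280, 93, 9904, np.nan],
        [5282, 5569, 93, 9904, 93],
    ],
    columns=["px1", "px2", "px3", "px4", "px5"],
)

# Pair px1 with every other column, rename that column to px2,
# stack the pairs vertically, and drop pairs whose second value is missing.
pairs = [
    toy[["px1", col]].rename(columns={col: "px2"})
    for col in toy.columns
    if col != "px1"
]
long_df = pd.concat(pairs, axis=0).dropna()

print(long_df)
```

Each row of `long_df` is one (px1, value) pair, so the column a value originally came from no longer matters.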
Now, because only the first variable is used as the key, look at how px2 is distributed for each value of px1:
import matplotlib.pyplot as plt
for label, dist in df.groupby('px1')['px2']:
    dist.hist(bins=dist.nunique(), label=str(label))  # one histogram of px2 per px1 value
plt.legend(); plt.show()
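If the goal is to rank the actual codes (5569, 5282, ...) rather than the px placeholders, one possible approach, which is not part of the answer above, is to turn each distinct code into its own 0/1 column. The sketch below assumes that the position of a code within a row carries no meaning and that empty cells are NaN; `toy`, `long_df` and `multi_hot` are illustrative names.

```python
import numpy as np
import pandas as pd

# Three rows from the question's sample; NaN marks empty cells (assumption).
toy = pd.DataFrame(
    [
        [5569, 5282, 93, np.nan, np.nan],
        [5569, 5280, 93, 9904, np.nan],
        [5282, 5569, 93, 9904, 93],
    ],
    columns=["px1", "px2", "px3", "px4", "px5"],
)

# Melt to one (row, code) pair per non-empty cell.
long_df = (
    toy.rename_axis("row")
    .reset_index()
    .melt(id_vars="row", value_name="code")
    .dropna(subset=["code"])
    .assign(code=lambda d: d["code"].astype(int))
)

# Count codes per original row, then clip counts to presence flags (0 or 1).
multi_hot = pd.crosstab(long_df["row"], long_df["code"]).clip(upper=1)

print(multi_hot)
```

With a target vector for the rows (none is shown in the question, so that part is hypothetical), `multi_hot` could be passed to scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute would then give one score per code instead of one per px column.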