
How to select features in Python when the DataFrame stores the features as row values

My DataFrame looks like the sample below. The column names px1, px2, ..., px99 are only placeholders. The real features are the values inside the cells, such as 5569 and 5282, and there are many thousands of distinct ones. I want to keep only the important features, and I am trying to use a Random Forest. I know how to rank the px columns with a Random Forest, but how do I rank the actual features embedded in the values? I am using Python.

px1 px2 px3 px4 px5 px6 px7 px8 px9 px10

5569 5282 93
5569 5280 93 9904
5569 5282 93 93 3893 8872 3897 9904
5569 5280 5551 93 93 3995 8607
5569 5280 93 8867
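One way to let a Random Forest rank the embedded codes is to turn each distinct value into its own 0/1 column (multi-hot encoding) and then read `feature_importances_`. The sketch below assumes scikit-learn is available. It uses only the sample rows shown above and invents a label vector `y`, because the question does not show a target column. It is an illustration under those assumptions, not the answer's method.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Sample rows from the question; each row holds a variable number of codes.
rows = [
    [5569, 5282, 93],
    [5569, 5280, 93, 9904],
    [5569, 5282, 93, 93, 3893, 8872, 3897, 9904],
    [5569, 5280, 5551, 93, 93, 3995, 8607],
    [5569, 5280, 93, 8867],
    [5282, 5569, 93, 9904, 93],
]
width = max(len(r) for r in rows)
df = pd.DataFrame([r + [None] * (width - len(r)) for r in rows],
                  columns=[f"px{i}" for i in range(1, width + 1)])

# Long format: one entry per (row, column) cell; padded empty cells are dropped.
long = df.stack().dropna().astype(int)

# Multi-hot matrix: one column per distinct code, 1 if the row contains it.
X = pd.get_dummies(long).groupby(level=0).max().astype(int)

# HYPOTHETICAL labels: the question does not show a target column.
y = [0, 1, 0, 1, 1, 0]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(5))
```

Codes that appear in every row (here 5569 and 93) can never split the data, so they get zero importance. With thousands of codes, sorting `importances` and keeping the top entries gives the filtered feature list.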
5282 5569 93 9904 93

You don't need more than two columns, because the order of the values within a row doesn't matter. Pair px1 with every other column and stack the pairs:

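import pandas as pd
# Stack (px1, other column) pairs vertically; skip px1 itself and drop empty cells
df = pd.concat([df[['px1', col]].rename(columns={col: 'px2'})
                for col in df.columns if col != 'px1'], axis=0).dropna()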
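pandas' `DataFrame.melt` can perform the same reshape in one call. This is a swapped-in idiom rather than the answer's own code, and for brevity it runs only on the first two sample rows:

```python
import pandas as pd

# First two sample rows from the question; the shorter row is padded with None.
df = pd.DataFrame(
    [[5569, 5282, 93, None], [5569, 5280, 93, 9904]],
    columns=["px1", "px2", "px3", "px4"],
)

# melt() turns every non-id column into (variable, value) rows, column by column.
pairs = (
    df.melt(id_vars="px1")               # columns: px1, variable, value
      .drop(columns="variable")          # the source column is irrelevant here
      .rename(columns={"value": "px2"})
      .dropna()                          # drop the padded empty cells
)
print(pairs.astype(int).to_string(index=False))
```

Because `id_vars` is kept out of the value columns, px1 does not need to be excluded by hand.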

Now, because you only consider the first variable (px1), look at the distribution of px2 for each px1 value:

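import matplotlib.pyplot as plt
for label, dist in df.groupby('px1')['px2']:
    dist.hist(bins=dist.nunique(), label=str(label), alpha=0.5)  # one histogram per px1 value
plt.legend(); plt.show()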
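If numbers are easier to compare than overlaid histograms, `pd.crosstab` tabulates the same per-px1 distributions as counts. This is a swapped-in alternative to the plotting loop, and the pairs below are a small hypothetical input in the px1/px2 shape produced by the reshape step:

```python
import pandas as pd

# Hypothetical (px1, px2) pairs in the shape produced by the reshape step.
pairs = pd.DataFrame({
    "px1": [5569, 5569, 5569, 5569, 5282, 5282],
    "px2": [5282, 93, 5280, 93, 5569, 93],
})

# Rows: px1 value; columns: co-occurring code; cells: how often the pair appears.
counts = pd.crosstab(pairs["px1"], pairs["px2"])
print(counts)
```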
