
How to use feature selection and dimensionality reduction in unsupervised learning?

I've been working on classifying emails from two authors. I was able to do this with supervised learning, using TF-IDF vectorization of the text, PCA, and SelectPercentile feature selection, all with the scikit-learn package.

Now I want to try the same task with unsupervised learning, using the KMeans algorithm to cluster the emails into two groups. I have created a dataset in which each data point is a single line in a Python list. Since I am a newbie to unsupervised learning, I wanted to ask whether I can apply the same dimensionality reduction tools as in the supervised case (TF-IDF, PCA, and SelectPercentile). If not, what are their counterparts? I am using scikit-learn for the code.
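For reference, a minimal sketch of the unsupervised pipeline I have in mind (the toy emails and parameter choices are placeholders):

```python
# Minimal sketch of the clustering pipeline; the toy emails are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

emails = [
    "meeting agenda and budget report",
    "quarterly budget report for the meeting",
    "photos from the beach vacation",
    "plans for the beach trip",
]

# TF-IDF is already unsupervised: it only looks at the texts, never at labels.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(emails)  # sparse matrix, n_samples x n_features

# Cluster into two groups, one per (hoped-for) author.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [0 0 1 1]
```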

I looked around on Stack Overflow but couldn't find a satisfactory answer. I am really stuck at this point.

Please help!

The following dimensionality reduction techniques can be applied in the case of unsupervised learning (a short sketch applying one technique from each family follows the list):

  1. PCA: principal component analysis
    • Exact PCA
    • Incremental PCA
    • Approximate PCA
    • Kernel PCA
    • SparsePCA and MiniBatchSparsePCA
  2. Random projections
    • Gaussian random projection
    • Sparse random projection
  3. Feature agglomeration (cluster.FeatureAgglomeration)
    • Feature scaling: if the features have very different scales or statistical properties, applying preprocessing.StandardScaler beforehand helps feature agglomeration capture the links between related features
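
To make the list concrete, below is a minimal sketch that applies one technique from each family to a TF-IDF matrix before KMeans. One caveat worth flagging: scikit-learn's PCA does not accept sparse input, so for sparse TF-IDF matrices TruncatedSVD (latent semantic analysis) is the usual stand-in. The toy corpus and all parameter choices here are illustrative.

```python
# Sketch: unsupervised dimensionality reduction on TF-IDF features, then KMeans.
# One reducer per family from the list above; the toy corpus is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.random_projection import SparseRandomProjection
from sklearn.cluster import FeatureAgglomeration, KMeans

emails = [
    "meeting agenda and budget report",
    "quarterly budget report for the meeting",
    "photos from the beach vacation",
    "plans for the beach trip",
]
X = TfidfVectorizer(stop_words="english").fit_transform(emails)

# 1. PCA family: TruncatedSVD works directly on the sparse TF-IDF matrix
#    (plain PCA would require densifying it first).
X_svd = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)

# 2. Random projections: project onto a randomly chosen low-dimensional subspace.
#    n_components is set explicitly; the default 'auto' needs far more samples.
X_rp = SparseRandomProjection(n_components=2, random_state=42).fit_transform(X)

# 3. Feature agglomeration: hierarchically merge features that behave similarly.
#    It expects a dense array, hence .toarray().
X_agg = FeatureAgglomeration(n_clusters=2).fit_transform(X.toarray())

# Any of the reduced matrices can then be clustered:
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_svd)
print(labels)
```

As for SelectPercentile: its usual scoring functions (chi2, f_classif) need class labels, so it has no direct unsupervised equivalent; the closest label-free feature selector in scikit-learn is VarianceThreshold.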

The above are some of the approaches that can be used for dimensionality reduction of large datasets in the case of unsupervised learning. You can read more about the details in the scikit-learn documentation on unsupervised dimensionality reduction.
