
How to process feature vectors with different dimensions in machine learning?

I'm a beginner in machine learning, and I'm trying to use a data set to train a log-linear classifier. The data set contains five features, and each feature is a vector, but the dimensions of the features differ: they are 3, 1, 6, 2, and 2 respectively. I tried using PCA in scikit-learn to reduce each feature to one dimension, but it didn't work well. So how do I process the features to fit a log-linear classifier model like logistic regression?

A simple way to do this is to flatten all of your features into a single vector per sample, and then feed that into your classifier.

An example:

features = [...
          [[0, 1, 3], [5], [2, 6, 4, 7, 8, 9], [1, 0], [0, 1]],  # one sample
          ...]

Use a list comprehension to flatten each list inside features:

flattened_features = [[i for k in f for i in k] for f in features]

This will turn features into something like this:

    flattened_features
    [... 
    [0,1,3,5,2,6,4,7,8,9,1,0,0,1], #for one sample
    ...]
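As a minimal runnable sketch of the flattening step, the snippet below uses two made-up samples (the values are hypothetical; only the dimensions 3, 1, 6, 2, and 2 come from the question) and checks that every flattened row ends up with 14 values:

```python
# Hypothetical toy data: two samples, each holding five feature vectors
# of dimensions 3, 1, 6, 2, and 2 (the values are made up for illustration).
features = [
    [[0, 1, 3], [5], [2, 6, 4, 7, 8, 9], [1, 0], [0, 1]],
    [[4, 2, 2], [1], [0, 3, 5, 1, 1, 6], [0, 1], [1, 0]],
]

# Concatenate the five vectors of each sample into one flat list.
flattened_features = [
    [value for vector in sample for value in vector]
    for sample in features
]

# Every sample should now have 3 + 1 + 6 + 2 + 2 = 14 values.
expected_length = 3 + 1 + 6 + 2 + 2
assert all(len(row) == expected_length for row in flattened_features)

print(flattened_features[0])
# [0, 1, 3, 5, 2, 6, 4, 7, 8, 9, 1, 0, 0, 1]
```

The length check is worth keeping in real code: if any sample is missing a value, the rows will have different lengths and cannot be stacked into a rectangular array later.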

Now you can convert this into a NumPy array and feed it into your model.
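One way this last step might look with scikit-learn's `LogisticRegression` is sketched below. The four samples and their binary labels are invented purely for illustration, and `max_iter=1000` is just a generous setting chosen for this toy example, not a recommendation from the answer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical flattened samples (14 values each) and made-up binary labels.
flattened_features = [
    [0, 1, 3, 5, 2, 6, 4, 7, 8, 9, 1, 0, 0, 1],
    [4, 2, 2, 1, 0, 3, 5, 1, 1, 6, 0, 1, 1, 0],
    [1, 1, 2, 6, 3, 5, 4, 8, 7, 9, 1, 0, 0, 1],
    [5, 3, 2, 0, 1, 2, 6, 0, 2, 5, 0, 1, 1, 0],
]
labels = [1, 0, 1, 0]

# Rows are samples, columns are the 14 flattened feature values.
X = np.asarray(flattened_features, dtype=float)
y = np.asarray(labels)

# Fit a logistic regression model on the flattened features.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Predict on the training rows, only to show the call shape.
predictions = model.predict(X)
print(X.shape)            # (4, 14)
print(model.coef_.shape)  # (1, 14)
```

After flattening, the model learns one weight per column, so each component of each original feature vector gets its own coefficient.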
