简体   繁体   中英

Pre Processing spiral dataset to use for Logistic Regression

So I need to classify a spiral dataset. I have been experimenting with a bunch of algorithms like KNN, Kernel SVM, etc. I would like to try to improve the performance of Logistic Regression using feature engineering, preprocessing, etc.

I am also using scikit learn to do all of the classifications.

I fully understand Logistic Regression is not the proper algorithm to do this sort of problem. This is more of a learning excerise for Pre processing and other feature engineering/extraction methods to see how much I can improve this specific model.

Here is an example dataset I would use for the classification. Any suggestions of how I can manipulate the dataset to use in the Logistic Regression algorithm would be helpful.

示例数据集

I also have datasets with multiple spirals as well. some datasets have 2 classes or sometimes up to 5. This means up to 5 spirals.

Since the data doesn't seem to be linearly separable, you can try using the Kernel Trick method commonly used in Support Vector Classification. The kernel function accepts inputs in the original lower-dimensional space and returns the dot product of the transformed vectors in the higher dimensional space. That means transformed vector ϕ(x) is just some function of the coordinates in the corresponding lower-dimensional vector x.

Logistic Regression is generally used as a linear classifier ie the decision boundary separating one class samples from the other is a linear(straight-line) but it can be used for non-linear decision boundaries as well.

Using the kernel trick in SVC is also good option as it maps the data in the lower dimension to higher dimension making it linearly separable.

example:

例子

In the above example, the data is not linearly separable in lower dimension, but after applying the transformation ϕ(x) = x² and adding the second dimension to the features we have the right side graph that becomes linearly separable.

You can start transforming the data by creating new features for applying logistic regression. Also try SVC(Support Vector Classifier) that uses kernel trick. For SVC you don't have to transform the data into higher dimensions explicitly.

There are few resources which are great for learning are one and two

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM