
Turn categorical data to numeric and save to libsvm format in Python

I have a DataFrame that looks something like this:

    A         B        C        D
1   String1   String2  String3  String4
2   String2   String3  String4  String5
3   String3   String4  String5  String6
.........................................

My goal is to convert this DataFrame to libSVM format.

What I have tried so far is the following:

dummy = pd.get_dummies(dataframe)
dummy.to_csv('dataframe.csv', header=False, index=False)

Is there a way to turn the DataFrame or the CSV file into this format? Or is there a smarter way to do the transformation?

I tried loading the script that's meant to do this from this repository as follows:

%load libsvm2csv.py

and the script is loaded correctly, but when I run:

libsvm2csv.py dataframe.csv dataframe.data 0 True

or

libsvm2csv.py dataframe.csv dataframe.txt 0 True

I get "SyntaxError: invalid syntax" pointing at dataframe.csv

After preprocessing your data, you can extract a matrix and use scikit-learn's dump_svmlight_file to write this format.

Example code:

import pandas as pd
from sklearn.datasets import dump_svmlight_file

dummy = pd.get_dummies(dataframe)
mat = dummy.to_numpy()  # as_matrix() was removed in pandas 1.0
dump_svmlight_file(mat, y, 'svm-output.libsvm')  # y is the label vector; where is your y?
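
A self-contained sketch of the same steps, assuming a toy DataFrame and a made-up label vector y (neither comes from the question). It writes to an in-memory buffer rather than a file so the result can be inspected directly:

```python
import io

import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file

# Toy data standing in for the question's DataFrame (assumed values).
dataframe = pd.DataFrame({
    "A": ["String1", "String2", "String3"],
    "B": ["String2", "String3", "String4"],
})
# Hypothetical labels; the question does not say what the target is.
y = np.array([0, 1, 0])

# One-hot encode; request floats explicitly because newer pandas
# versions return boolean dummy columns by default.
dummy = pd.get_dummies(dataframe, dtype=float)
mat = dummy.to_numpy()

# dump_svmlight_file accepts a path or a file-like object in binary mode.
# zero_based=False gives 1-based feature indices, as libsvm tools expect.
buf = io.BytesIO()
dump_svmlight_file(mat, y, buf, zero_based=False)
libsvm_text = buf.getvalue().decode("ascii")
print(libsvm_text)
```

Each output line starts with the label, followed by index:value pairs for the non-zero dummy columns only.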

Remarks / Alternative:

You mention libsvm2csv.py for this conversion, but it goes in the wrong direction: it converts libsvm format -> csv.

Check phraug's csv2libsvm.py if you want to convert csv -> libsvm (without scikit-learn).
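
To illustrate what the csv -> libsvm direction involves, here is a minimal standard-library sketch. It is not phraug's script: the function name csv_to_libsvm and its label_index parameter are hypothetical, and it assumes a header-less CSV whose values are all numeric, with the label stored in one column:

```python
import csv
import io


def csv_to_libsvm(src, dst, label_index=0):
    """Write each numeric CSV row from src as one libSVM line to dst.

    Hypothetical helper: assumes no header row, numeric values only,
    and the label stored in column `label_index`.
    """
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        label = row[label_index]
        features = row[:label_index] + row[label_index + 1:]
        # libSVM is a sparse format: keep non-zero features only,
        # numbered from 1.
        pairs = [
            f"{i}:{value}"
            for i, value in enumerate(features, start=1)
            if float(value) != 0.0
        ]
        dst.write(" ".join([label] + pairs) + "\n")


# Two example rows: label first, then three feature columns.
src = io.StringIO("1,0,1,0\n0,1,0,1\n")
dst = io.StringIO()
csv_to_libsvm(src, dst)
result = dst.getvalue()
print(result)
```

Note that the CSV written in the question contains only the dummy columns and no label column, so a label would have to be added before a conversion like this makes sense.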

I prefer scikit-learn over phraug.

