turn categorical data to numeric and save to libsvm format python

Question

I have a DataFrame that looks something like this:

    A         B        C        D
1   String1   String2  String3  String4
2   String2   String3  String4  String5
3   String3   String4  String5  String6
.........................................

My goal is to convert this DataFrame to libSVM format.

What I have tried so far is the following:

dummy = pd.get_dummies(dataframe)
dummy.to_csv('dataframe.csv', header=False, index=False)

Is there a way to convert the DataFrame or the CSV file to this format? Or is there a smarter way to do the transformation?

I tried loading the script that's meant to do this from this repository as follows:

%load libsvm2csv.py

and the script is loaded correctly, but when I run:

libsvm2csv.py dataframe.csv dataframe.data 0 True

or

libsvm2csv.py dataframe.csv dataframe.txt 0 True

I get "SyntaxError: invalid syntax" pointing at dataframe.csv

Answer 1

After preprocessing your data, you can extract a matrix and use scikit-learn's dump_svmlight_file to write it in this format.

Example code:

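import pandas as pd
from sklearn.datasets import dump_svmlight_file

# One-hot encode the categorical columns
dummy = pd.get_dummies(dataframe)
# as_matrix() was removed in pandas 1.0; .values works on old and new versions
mat = dummy.values.astype(float)
# y is the label vector, one entry per row; the question does not say where it comes from
dump_svmlight_file(mat, y, 'svm-output.libsvm')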
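A self-contained version of the snippet above might look like the following sketch. The toy DataFrame, the `label` column, and the temporary output path are assumptions made for illustration, since the question does not say where the target values come from. The file is read back with load_svmlight_file only to check the round trip.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy data standing in for the asker's DataFrame (assumed, not from the question).
dataframe = pd.DataFrame({
    "A": ["String1", "String2", "String3"],
    "B": ["String2", "String3", "String4"],
    "label": [0, 1, 0],  # hypothetical target column
})

# Separate the target from the features before encoding.
y = dataframe["label"].to_numpy()
features = dataframe.drop(columns=["label"])

# One-hot encode, then cast to float so the writer gets a plain numeric array.
mat = pd.get_dummies(features).to_numpy(dtype=float)

# Write to a temporary directory so the sketch does not clutter the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "svm-output.libsvm")
dump_svmlight_file(mat, y, out_path, zero_based=True)

# Read the file back; the result is a sparse matrix plus the label vector.
X_back, y_back = load_svmlight_file(out_path, n_features=mat.shape[1], zero_based=True)
```

Passing zero_based explicitly on both sides avoids relying on the loader's index heuristic.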

Remarks / Alternative:

You mention libsvm2csv.py for this conversion, but it works in the wrong direction: it converts libsvm format -> CSV.

Check phraug's csv2libsvm.py if you want to convert from CSV -> libsvm (without scikit-learn).
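For reference, the libSVM text format is one line per row: the label, followed by index:value pairs for the non-zero features. The following is a minimal sketch of that conversion in plain Python. It is not phraug's code, the function name `rows_to_libsvm` is hypothetical, and it assumes one-based feature indices, which is the usual convention for libSVM tools.

```python
def rows_to_libsvm(rows, labels):
    """Format numeric rows as libSVM lines, skipping zero-valued features."""
    lines = []
    for label, row in zip(labels, rows):
        # Feature indices are written one-based and in ascending order.
        pairs = [f"{i}:{value:g}" for i, value in enumerate(row, start=1) if value != 0]
        lines.append(" ".join([str(label)] + pairs))
    return lines


# Two one-hot encoded rows with hypothetical labels.
example = rows_to_libsvm([[1, 0, 0, 1], [0, 1, 1, 0]], [0, 1])
# example == ["0 1:1 4:1", "1 2:1 3:1"]
```

Joining the returned lines with newlines and writing them to a file gives something a libSVM-style reader should accept, under the assumptions above.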

I prefer scikit-learn over phraug for this.

solution1
1 ACCEPTED 2016-10-04 23:23:21

Example code:

Remarks / Alternative:

turn categorical data to numeric and save to libsvm format python

Question

1 answers

solution1 1 ACCPTED 2016-10-04 23:23:21

Example code:

Remarks / Alternative:

solution1
1 ACCPTED 2016-10-04 23:23:21