I have a DataFrame that looks something like this:
   A        B        C        D
1  String1  String2  String3  String4
2  String2  String3  String4  String5
3  String3  String4  String5  String6
...
My goal is to convert this DataFrame to libSVM format.
This is what I have tried so far:
dummy = pd.get_dummies(dataframe)
dummy.to_csv('dataframe.csv', header=False, index=False)
Is there a way to convert the DataFrame or the CSV file to libSVM format? Or is there a smarter way to do the transformation?
I tried loading the script that's meant to do this from this repository as follows:
%load libsvm2csv.py
and the script is loaded correctly, but when I run:
libsvm2csv.py dataframe.csv dataframe.data 0 True
or
libsvm2csv.py dataframe.csv dataframe.txt 0 True
I get "SyntaxError: invalid syntax", pointing at dataframe.csv.
After preprocessing your data, you can extract a matrix and use scikit-learn's dump_svmlight_file to write it in this format.
import pandas as pd
from sklearn.datasets import dump_svmlight_file
dummy = pd.get_dummies(dataframe)
mat = dummy.to_numpy(dtype=float)  # as_matrix() was removed in pandas 1.0
dump_svmlight_file(mat, y, 'svm-output.libsvm')  # y is your label vector (where is your y?)
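To make the snippet above concrete, here is a minimal sketch of the same approach. The toy DataFrame and the labels in y are made up for illustration, since the question does not say what the target column is. It writes to an in-memory buffer instead of a file and reads the result back to check the round trip.

```python
import io

import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical toy data standing in for the DataFrame in the question.
dataframe = pd.DataFrame({
    "A": ["String1", "String2", "String3"],
    "B": ["String2", "String3", "String4"],
})
# Hypothetical labels; the question does not define a target.
y = [0, 1, 0]

# One-hot encode the string columns and convert to a float matrix.
dummy = pd.get_dummies(dataframe)
mat = dummy.to_numpy(dtype=float)

# dump_svmlight_file accepts a path or a binary file-like object.
# zero_based=False makes feature indices start at 1, as in libSVM.
buf = io.BytesIO()
dump_svmlight_file(mat, y, buf, zero_based=False)
svm_text = buf.getvalue().decode("ascii")
print(svm_text)

# Read the data back to confirm nothing was lost.
X_back, y_back = load_svmlight_file(
    io.BytesIO(buf.getvalue()), n_features=mat.shape[1], zero_based=False
)
```

Passing a filename such as 'svm-output.libsvm' instead of buf writes the same content to disk.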
You mention using libsvm2csv.py for this conversion, but it goes in the wrong direction: it converts libsvm format -> csv.
Check phraug's csv2libsvm.py if you want to convert from csv -> libsvm (without scikit-learn).
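For reference, the csv -> libsvm direction is simple enough to sketch in plain Python. This is not phraug's code, just an illustration of the idea; the function name csv_to_libsvm and its parameters are hypothetical, and it assumes every column is numeric (for example, the output of get_dummies plus a label column).

```python
import csv
import io


def csv_to_libsvm(csv_text, label_index=0, skip_header=False):
    """Convert CSV text to libSVM lines: '<label> <index>:<value> ...'."""
    reader = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(reader, None)
    lines = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        label = row[label_index]
        features = row[:label_index] + row[label_index + 1:]
        # Feature indices start at 1; zero-valued features are omitted.
        pairs = [
            f"{i}:{v}"
            for i, v in enumerate(features, start=1)
            if float(v) != 0.0
        ]
        lines.append(" ".join([label] + pairs))
    return lines


# The first column is the label, the rest are features.
example = "1,0,2.5,0\n0,1,0,3\n"
print("\n".join(csv_to_libsvm(example)))
```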
I prefer using scikit-learn over phraug.