How do I format my data correctly for running an SVM algorithm?

Question

In both Jupyter Notebook and PyCharm I can run the SVM code below, using the iris data, with no issues. However, when I swap the iris data for my own, Jupyter Notebook gives me:

The kernel appears to have died. It will restart automatically.

And in PyCharm I get:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

This works fine:

from sklearn import datasets
from sklearn.svm import SVC

# Load the iris dataset
dataset = datasets.load_iris()

# Fit an SVM model to the data
model = SVC()
model.fit(dataset.data, dataset.target)
print(model)

The problem starts when I try replacing the iris data with my own dataset, which I thought had been properly formatted. You can get the dataset here (individual-level):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# data = pd.io.stata.read_stata('individual.dta')
# data.to_csv('individual.csv')
ind = pd.read_csv('individual.csv')
ind = ind.dropna()
train, test = train_test_split(ind, test_size=0.2)

# Features: the six preference measures
df = pd.DataFrame()
df['patience'] = train['patience']
df['risktaking'] = train['risktaking']
df['posrecip'] = train['posrecip']
df['negrecip'] = train['negrecip']
df['altruism'] = train['altruism']
df['trust'] = train['trust']

# Target: the 'Unnamed: 0' column, i.e. the row index that to_csv wrote out
index = train['Unnamed: 0']

dataset = {}
data = np.ascontiguousarray(df.values)
dataset['data'] = data
# print(dataset['data'].flags)  # shows C_CONTIGUOUS as True
index = index.values
dataset['target'] = index

from sklearn.svm import SVC
X = dataset['data']
y = dataset['target']

clf = SVC(gamma='auto')
clf.fit(X, y) 

print(clf.predict([[0.04717605, 1.0202034, 1.0202034, -0.3671751, -0.1399527, 1.6797541]]))

When I compare the iris dataset to the one I have manufactured, they look to be the same in terms of type - clearly the dimensions are different, but that shouldn't be affecting the SVM model.

I'm thinking there must be something wrong with the way I have constructed the arrays, thus altering their interpretation by SVC(). I had trouble earlier with setting the dict() attributes of 'data' and 'target'.

As you can see, the iris dataset calls "dataset.data, dataset.target", whereas I have to resort to "dataset['data'], dataset['target']".

I put both versions of the SVC() so you can see that they both work fine for the iris data, but neither likes my data very much.

Help is appreciated. Code critics: you'll only be helpful if you actually solve the issue. Comments from the peanut gallery are not useful.

Answer 1

I found a solution via a previous answer, which I was able to apply successfully.

Now, that answer doesn't exactly resolve the issue I was facing in this example, namely that the kernel stopped responding. I still do not know why my attempt failed.
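One possible explanation, offered as an assumption rather than a confirmed diagnosis: the question uses the 'Unnamed: 0' column (the row index) as the target, so every training row becomes its own class. SVC handles multiclass problems one-vs-one, which means n_classes * (n_classes - 1) / 2 pairwise classifiers. The sketch below counts them for a hypothetical 60,000-row index target (the row count is made up for illustration, not taken from the dataset) and for an iris-like target. The helper name one_vs_one_pairs is also hypothetical.

```python
import numpy as np


def one_vs_one_pairs(y):
    """Return (n_classes, n_pairwise_classifiers) for a label vector."""
    n_classes = int(np.unique(y).size)
    return n_classes, n_classes * (n_classes - 1) // 2


# Hypothetical stand-in for a row-index target: every label is unique.
y_index = np.arange(60_000)
print(one_vs_one_pairs(y_index))  # (60000, 1799970000)

# Iris-like target: 3 classes, 50 samples each.
y_iris = np.repeat([0, 1, 2], 50)
print(one_vs_one_pairs(y_iris))  # (3, 3)
```

If the class count comes out close to the number of rows, the target is probably an identifier rather than a label, and that alone could plausibly exhaust memory during fitting.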

However, the link and answer provided do allow me to organize my dataset in an appropriate manner and apply it to the SVC model.
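For illustration, here is a minimal sketch of one way to lay the data out for SVC. It is not the code from the linked answer. It uses synthetic data in place of individual.csv, and the label column "group" is hypothetical: the real dataset would need an actual categorical column to predict, since a row index is not a usable target.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

FEATURES = ["patience", "risktaking", "posrecip", "negrecip", "altruism", "trust"]

# Synthetic stand-in for individual.csv; "group" is a hypothetical label column.
rng = np.random.default_rng(0)
ind = pd.DataFrame(rng.normal(size=(200, 6)), columns=FEATURES)
ind["group"] = (ind["patience"] + ind["trust"] > 0).astype(int)

train, test = train_test_split(ind, test_size=0.2, random_state=0)

# X is a 2-D array (n_samples, n_features); y is a 1-D array of class labels.
X_train = train[FEATURES].to_numpy()
y_train = train["group"].to_numpy()

clf = SVC(gamma="auto")
clf.fit(X_train, y_train)

pred = clf.predict([[0.04717605, 1.0202034, 1.0202034, -0.3671751, -0.1399527, 1.6797541]])
accuracy = clf.score(test[FEATURES].to_numpy(), test["group"].to_numpy())
print(pred, accuracy)
```

The dict-style access in the question is not a problem in itself. load_iris returns a Bunch object, which accepts both dataset.data and dataset['data'], and SVC only ever sees the arrays passed to fit.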

solution1
0 2019-05-05 12:02:15
