
What am I doing wrong when training a model?

I am solving the following problem:

We have collected more data on cats and dogs, and are ready to train our robot to classify them! Download the training dataset https://stepik.org/media/attachments/course/4852/dogs_n_cats.csv and train a Decision Tree on it. After that, download the dataset from the assignment and predict which class each observation belongs to. Enter the number of dogs in your dataset. A small error is tolerated in the answer.

I trained the model:

import sklearn
import pandas as pd
import numpy as nm
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import tree
from sklearn.model_selection import train_test_split, cross_val_score

df = pd.read_csv('dogs_n_cats.csv')

X = df.drop(['Вид', 'Шерстист'], axis=1)
y = df['Вид']

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67, random_state=42)

clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4)
clf.fit(X_train, y_train)

After that, I downloaded the dataset from the task https://stepik.org/api/attempts/540562013/file and tried to count the dogs in it:

df2 = pd.read_json('we.txt')

X2 = df.drop(['Вид', 'Шерстист'], axis=1)
y2 = df['Вид']
X_train2, X_test2, y_train2, y_test2 = train_test_split(X, y, train_size=0.67, random_state=42)

df2_predict = clf.predict(X2)
l = list(df2_predict)
l.count('собачка')

The number of dogs in the task should be 49, but after executing l.count('собачка') I get 500. What am I doing wrong when training the model?

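This seems to be a typo. In your snippet, you build X2 from the first dataframe (df, the training data) instead of df2, so the model predicts on the training set rather than on the assignment file.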

I cannot access the second file, but changing this line should do the trick:

X2 = df.drop(['Вид', 'Шерстист'], axis=1)
-->
X2 = df2.drop(['Вид', 'Шерстист'], axis=1)

Besides that, you're already provided with a training set and test set, so none of the calls to train_test_split should be necessary.
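
Putting both points together, here is a minimal sketch of what the corrected flow might look like. Since I cannot access either file, the two small dataframes below are made-up stand-ins for dogs_n_cats.csv and the assignment file, and the feature column names 'Длина', 'Высота' and 'Гавкает' are assumptions (only 'Вид' and 'Шерстист' appear in your code). I also assume the assignment file may not contain the 'Вид' column, which is why the drop uses errors='ignore'.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Made-up stand-in for pd.read_csv('dogs_n_cats.csv').
# Feature names other than 'Вид' and 'Шерстист' are assumptions.
df = pd.DataFrame({
    'Длина':    [15, 18, 20, 17, 26, 30, 34, 28],
    'Высота':   [18.0, 21.0, 24.0, 20.0, 31.2, 36.0, 40.8, 33.6],
    'Шерстист': [1, 1, 1, 1, 1, 1, 1, 1],
    'Гавкает':  [0, 0, 0, 0, 1, 1, 1, 1],
    'Вид':      ['котик'] * 4 + ['собачка'] * 4,
})

# Made-up stand-in for pd.read_json('we.txt'); assumed to have no label column.
df2 = pd.DataFrame({
    'Длина':    [16, 30, 19, 32, 27],
    'Высота':   [19.0, 36.0, 22.0, 38.0, 33.0],
    'Шерстист': [1, 1, 1, 1, 1],
    'Гавкает':  [0, 1, 0, 1, 1],
})

drop_cols = ['Вид', 'Шерстист']

# Train on the whole training file: no train_test_split is needed here.
X = df.drop(columns=drop_cols)
y = df['Вид']
clf = DecisionTreeClassifier(criterion='entropy', max_depth=4, random_state=42)
clf.fit(X, y)

# Build the features from df2 (not df), in the same column order as X.
# errors='ignore' skips columns that are absent from the assignment file.
X2 = df2.drop(columns=drop_cols, errors='ignore')[X.columns]

predictions = clf.predict(X2)
n_dogs = list(predictions).count('собачка')
print(n_dogs)
```

With the real files, only the two dataframe definitions would change back to your pd.read_csv and pd.read_json calls. Selecting X.columns from df2 also guards against the assignment file listing its columns in a different order.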
