繁体   English   中英

使用 Pytorch 进行低准确度二元分类

[英]Low accuracy binary classification with Pytorch

使用 Pytorch 在Breast-Cancer-Wisconsin-Diagnostic-DataSet上练习二元分类的深度学习。

我尝试了不同的方法,我能得到的最好的方法如下,准确率仍然很低,只有 61%。

有什么办法可以提高准确率?

谢谢你。

import pandas as pd
import io

dataset = pd.read_excel(base_dir + "Breast-Cancer-Wisconsin-Diagnostic.xlsx")

number_of_columns = dataset.shape[1]

# training and testing split of 70:30
dataset['diagnosis'] = pd.Categorical(dataset['diagnosis']).codes
dataset = dataset.sample(frac=1, random_state=1234)
train_input = dataset.values[:398, :number_of_columns-1]
train_target = dataset.values[:398, number_of_columns-1]
test_input = dataset.values[398:, :number_of_columns-1]
test_target = dataset.values[398:, number_of_columns-1]

import torch
torch.manual_seed(1234)
hidden_units = 5
net = torch.nn.Sequential(
torch.nn.Linear(number_of_columns-1, hidden_units),
torch.nn.ReLU(),
torch.nn.Linear(hidden_units, 2))

# choose optimizer and loss function
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1,momentum=0.9)

# train
epochs = 50
for epoch in range(epochs):
    inputs = torch.autograd.Variable(torch.Tensor(train_input).float())
    targets = torch.autograd.Variable(torch.Tensor(train_target).long())
    optimizer.zero_grad()
    out = net(inputs)
    loss = criterion(out, targets)
    loss.backward()
    optimizer.step()
    if epoch == 0 or (epoch + 1) % 10 == 0:
        print('Epoch %d Loss: %.4f' % (epoch + 1, loss.item()))

# Epoch 1 Loss: 412063.1250
# Epoch 10 Loss: 0.6628
# Epoch 20 Loss: 0.6639
# Epoch 30 Loss: 0.6592
# Epoch 40 Loss: 0.6587
# Epoch 50 Loss: 0.6588


import numpy as np
inputs = torch.autograd.Variable(torch.Tensor(test_input).float())
targets = torch.autograd.Variable(torch.Tensor(test_target).long())
optimizer.zero_grad()
out = net(inputs)
_, predicted = torch.max(out.data, 1)
error_count = test_target.size - np.count_nonzero((targets == predicted).numpy())

print('Errors: %d; Accuracy: %d%%' % (error_count, 100 * torch.sum(targets == predicted) // test_target.size))

# Errors: 65; Accuracy: 61%

代表样本的特征在不同的范围内。 所以,你应该做的第一件事是规范化数据。

您应该在训练和验证/测试数据集的训练时期绘制损失和acc,以了解模型是在训练数据上过拟合还是欠拟合。

此外,您可以尝试使用更复杂(更深)的模型。 而且由于您的训练数据集的样本数量很少,因此您也可以尽可能考虑增强和迁移学习。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM