
'1D target tensor expected' error when using PyTorch TensorDataset class

I am wondering why this error is occurring. My hunch is that TensorDataset reads the last column as the labels, but I don't know why it would behave that way when I pass a separate tensor of labels as the second argument. Also, can someone explain exactly how one-hot encoding works, and how I can fix this problem given that I only want one label per item?

Error: return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1D target tensor expected, multi-target not supported

Code:

import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

if __name__ == '__main__':

    inputs_file = pd.read_csv('dataset.csv')
    targets_file = pd.read_csv('labels.csv')

    inputs = inputs_file.iloc[1:1001].values
    targets = targets_file.iloc[1:1001].values

    inputs = torch.tensor(inputs, dtype=torch.float32)
    targets = torch.tensor(targets)

    dataset = TensorDataset(inputs, targets)

    val_size = 200
    test_size = 100
    train_size = len(dataset) - (val_size + test_size)

    # Divide dataset into 3 unique random subsets
    training_data, validation_data, test_data = random_split(dataset, [train_size, val_size, test_size])

    batch_size = 50

    train_loader = DataLoader(training_data, batch_size, shuffle=True, num_workers=4, pin_memory=True)
    valid_loader = DataLoader(validation_data, batch_size*2, num_workers=4, pin_memory=True)

From what I gather from the discussion in the comments, the following reproduces the error.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

inputs = torch.randn(999, 11, dtype=torch.float32)
targets = torch.randint(5, (999, 1), dtype=torch.long)

# you need this to adapt from pandas, but not for this example code
# inputs = torch.tensor(inputs, dtype=torch.float32)
# targets = torch.tensor(targets)

dataset = TensorDataset(inputs, targets)

val_size = 200
test_size = 100
train_size = len(dataset) - (val_size + test_size)

# Divide dataset into 3 unique random subsets
training_data, validation_data, test_data = random_split(dataset, [train_size, val_size, test_size])

batch_size = 50

train_loader = DataLoader(training_data, batch_size, shuffle=True, num_workers=4, pin_memory=True)
valid_loader = DataLoader(validation_data, batch_size*2, num_workers=4, pin_memory=True)

# A guess at the model; more on this in a moment
model = nn.Sequential(
    nn.Linear(11, 8),
    nn.Linear(8, 5),
)

loss_func = nn.CrossEntropyLoss()

for features, labels in train_loader:
    out = model(features)
    loss = loss_func(out, labels)
    print(f"{loss = }")
    break
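On the hunch in the question: TensorDataset does not treat any column as labels. As a minimal sketch (the toy sizes are my own, not taken from your CSV files), it simply indexes every tensor it is given along the first dimension, so a target tensor of shape (N, 1) comes out of the DataLoader as (batch, 1):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data; the sizes are illustrative, not taken from the question's CSV files.
inputs = torch.randn(6, 11)
targets = torch.randint(5, (6, 1))  # 2D, like .values on a one-column DataFrame

dataset = TensorDataset(inputs, targets)

# Each item is the tuple (inputs[i], targets[i]); no column is reinterpreted.
x0, y0 = dataset[0]
item_shapes = (tuple(x0.shape), tuple(y0.shape))

# Default collation stacks the items, so the extra dimension survives batching.
features, labels = next(iter(DataLoader(dataset, batch_size=3)))
batch_shapes = (tuple(features.shape), tuple(labels.shape))

print(item_shapes)   # ((11,), (1,))
print(batch_shapes)  # ((3, 11), (3, 1))
```

That trailing dimension of size 1 is what CrossEntropyLoss is complaining about.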

Solution 1

Call labels.squeeze(-1) in the loop body, for example:

for features, labels in train_loader:
    out = model(features)
    labels = labels.squeeze(-1)  # (batch, 1) -> (batch,)
    loss = loss_func(out, labels)
    print(f"{loss = }")
    break
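A small aside on squeeze(-1) versus a bare squeeze(): the two agree on a normal batch, but they differ when a batch happens to contain a single sample, because squeeze() removes every dimension of size 1. A minimal sketch with a made-up one-element batch:

```python
import torch

# Hypothetical batch of size 1: a single label with shape (1, 1).
single = torch.tensor([[3]])

all_squeezed = single.squeeze()     # removes both size-1 dimensions
last_squeezed = single.squeeze(-1)  # removes only the trailing dimension

print(all_squeezed.shape)   # torch.Size([])
print(last_squeezed.shape)  # torch.Size([1])
```

Only the second result still has the (batch,) shape that goes with an output of shape (batch, classes), so squeeze(-1) is the safer habit.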

Solution 2

Flatten your targets when you first build the tensor (while targets is still the NumPy array from pandas) with

targets = torch.tensor(targets[:, 0])
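The same flattening can also be done on the pandas side by selecting the column before calling .values. This is only a sketch: the in-memory DataFrame and its column name "label" are stand-ins I made up for labels.csv.

```python
import pandas as pd
import torch

# Hypothetical stand-in for labels.csv: one column of integer class labels.
targets_file = pd.DataFrame({"label": [0, 3, 1, 4]})

two_d = targets_file.values             # whole frame -> shape (4, 1)
one_d = targets_file.iloc[:, 0].values  # single column -> shape (4,)

targets = torch.tensor(one_d, dtype=torch.long)

print(two_d.shape)            # (4, 1)
print(tuple(targets.shape))   # (4,)
```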

In response to the follow-up comment:

Now I am getting this error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (11x1 and 11x8). I should also add that I am using a hidden layer of size 8 and I have 5 classes.

The architecture above is only my guess at what you're using. Since the target reshape resolves the error in the code above, I'll need to see more of your code to help further.
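One reading of that follow-up error, offered as an assumption because the code that produced it isn't shown: nn.Linear(11, 8) multiplies its input by an 11x8 matrix, so a mat1 of 11x1 means the tensor reaching the layer had shape (11, 1), i.e. its last dimension was 1 rather than 11. The sketch below triggers a shape error with a made-up input and shows one layout that works:

```python
import torch
from torch import nn

layer = nn.Linear(11, 8)

# Hypothetical input: 11 features laid out as a column instead of a row.
bad = torch.randn(11, 1)
try:
    layer(bad)
    message = ""
except RuntimeError as err:
    message = str(err)
print(message)

# Transposing gives shape (1, 11): one sample with 11 features.
good = bad.T
out_shape = tuple(layer(good).shape)
print(out_shape)  # (1, 8)
```

If your inputs look like this, check the shape of a batch just before it enters the model; the last dimension should be 11.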

The CrossEntropyLoss documentation may also help: its example code shows class-index targets with shape (N,), rather than (N, 1) or (N, classes).
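As for how one-hot encoding works: each class index is replaced by a vector with one entry per class, set to 1 at that index and 0 elsewhere. With one label per item you don't need it here, since CrossEntropyLoss takes the indices directly. A minimal sketch with made-up labels and random logits, using torch.nn.functional.one_hot:

```python
import torch
import torch.nn.functional as F
from torch import nn

# Made-up class indices for three items, with 5 classes in total.
labels = torch.tensor([2, 0, 4])            # shape (3,)

one_hot = F.one_hot(labels, num_classes=5)  # shape (3, 5)
print(one_hot)
# tensor([[0, 0, 1, 0, 0],
#         [1, 0, 0, 0, 0],
#         [0, 0, 0, 0, 1]])

# argmax along the class dimension recovers the original indices.
recovered = one_hot.argmax(dim=1)

# Random stand-in for model outputs of shape (batch, classes).
logits = torch.randn(3, 5)
loss = nn.CrossEntropyLoss()(logits, recovered)
print(loss.shape)  # torch.Size([])
```

If your labels ever arrive already one-hot encoded, the argmax step above is one way to get back to the 1D form.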
