How to balance (oversampling) unbalanced data in PyTorch (with WeightedRandomSampler)?

Question

I have a two-class problem and my data is highly unbalanced: 232550 samples from one class and 13498 from the other. The PyTorch docs and other sources online tell me to use the WeightedRandomSampler class for my DataLoader.

I have tried using the WeightedRandomSampler but I keep getting errors.

    trainratio = np.bincount(trainset.labels)
    classcount = trainratio.tolist()
    train_weights = 1./torch.tensor(classcount, dtype=torch.float)
    train_sampleweights = train_weights[trainset.labels]
    train_sampler = WeightedRandomSampler(weights=train_sampleweights,
                                          num_samples=len(train_sampleweights))
    trainloader = DataLoader(trainset, sampler=train_sampler,
                             shuffle=False)

I cannot see why I am getting this error when setting up the WeightedRandomSampler.

I have tried other similar workarounds but so far all attempts produce some error. How should I implement this to balance my train, validation and test data?

Currently getting this error:

    train_sampleweights = train_weights[trainset.labels]
    ValueError: too many dimensions 'str'

Answer 1

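The problem is the type of trainset.labels: the error message indicates that it contains strings. To fix the error, convert trainset.labels to a numeric type. The answer as posted suggests float; note, however, that the labels are used to index train_weights and are passed to np.bincount, and both of those steps need integer values, so an integer type (for example a torch.long tensor) is likely the safer choice.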
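
The fix described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `raw_labels` is a hypothetical stand-in for `trainset.labels` (assumed here to hold the class ids as strings), and the toy `TensorDataset` with random features replaces the asker's real dataset, which is not shown in the question.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical stand-in for trainset.labels: class ids stored as strings.
raw_labels = ["0"] * 900 + ["1"] * 100

# Convert to integers so that np.bincount and tensor indexing both work.
labels = np.asarray(raw_labels).astype(np.int64)
labels_t = torch.from_numpy(labels)

# Inverse class frequency: the rarer class gets the larger weight.
class_counts = np.bincount(labels)
class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float)

# One weight per sample, looked up by that sample's class id.
sample_weights = class_weights[labels_t]

# Draw as many samples per epoch as the dataset holds, with replacement,
# so minority-class samples are repeated (oversampled).
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# Hypothetical toy dataset; random features are for illustration only.
features = torch.randn(len(labels), 4)
dataset = TensorDataset(features, labels_t)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Check the class balance of one epoch drawn through the sampler.
torch.manual_seed(0)
drawn = torch.cat([y for _, y in loader])
minority_fraction = (drawn == 1).float().mean().item()
print(f"minority fraction in sampled epoch: {minority_fraction:.2f}")
```

With weights set to the inverse class frequency, each class carries the same total weight, so an epoch should contain roughly equal numbers of both classes. A common practice is to resample only the training set and leave validation and test data at their natural distribution, so that the metrics reflect the real class balance.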

solution1
0 ACCEPTED 2019-01-29 08:05:47