How to balance (oversample) unbalanced data in PyTorch (with WeightedRandomSampler)?

Question

I have a 2-class problem and my data is highly unbalanced. I have 232550 samples from one class and 13498 from the second class. The PyTorch docs and the internet tell me to use the class WeightedRandomSampler for my DataLoader.

I have tried using the WeightedRandomSampler but I keep getting errors.

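    import numpy as np
    import torch
    from torch.utils.data import DataLoader, WeightedRandomSampler

    # Per-class sample counts, then inverse-frequency class weights
    trainratio = np.bincount(trainset.labels)
    classcount = trainratio.tolist()
    train_weights = 1. / torch.tensor(classcount, dtype=torch.float)
    # One weight per sample, looked up by that sample's label
    train_sampleweights = train_weights[trainset.labels]
    train_sampler = WeightedRandomSampler(weights=train_sampleweights,
                                          num_samples=len(train_sampleweights))
    trainloader = DataLoader(trainset, sampler=train_sampler, shuffle=False)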

I cannot see why I am getting this error when initializing the WeightedRandomSampler class.

I have tried other similar workarounds, but so far all attempts produce some error. How should I implement this to balance my train, validation and test data?

Currently getting this error:

    train__sampleweights = train_weights[trainset.labels]
    ValueError: too many dimensions 'str'

Answer 1

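The problem is in the type of trainset.labels. To fix the error, you can convert trainset.labels to float. Note that because the labels are also passed to np.bincount and used to index train_weights, an integer type is what those two calls actually accept, so casting to int (or long) is the safer conversion.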
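A minimal sketch of that fix, assuming the labels are stored as strings (which the 'str' in the error message suggests). The helper name make_balanced_sampler and the toy data are made up for illustration and are not from the question; only the NumPy and PyTorch calls are real. It casts the labels to integers, builds inverse-frequency weights, and checks that one epoch draws roughly half of its samples from the minority class.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler


def make_balanced_sampler(labels):
    """Hypothetical helper: per-sample weights are inverse class frequencies."""
    # Cast to int64 first; np.bincount and tensor indexing both need integer labels.
    labels = np.asarray(labels).astype(np.int64)
    # Assumes every class 0..K-1 occurs at least once (a zero count would give an inf weight).
    class_counts = np.bincount(labels)
    class_weights = 1.0 / torch.tensor(class_counts, dtype=torch.float)
    sample_weights = class_weights[torch.from_numpy(labels)]
    return WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(sample_weights),
        replacement=True,  # oversampling the minority class requires replacement
    )


torch.manual_seed(0)
# Toy stand-in for trainset.labels: 950 majority and 50 minority labels, stored as strings.
raw_labels = ["0"] * 950 + ["1"] * 50
int_labels = torch.tensor([int(x) for x in raw_labels])
features = torch.randn(len(raw_labels), 4)
trainset = TensorDataset(features, int_labels)

train_sampler = make_balanced_sampler(raw_labels)
# Leave shuffle at its default (False) when a sampler is passed to DataLoader.
trainloader = DataLoader(trainset, batch_size=100, sampler=train_sampler)

# Collect the labels drawn over one epoch and measure the class balance.
drawn = torch.cat([y for _, y in trainloader])
minority_fraction = (drawn == 1).float().mean().item()
print(f"minority fraction in one epoch: {minority_fraction:.2f}")
```

With the real data, passing trainset.labels to the helper in place of raw_labels should work the same way, provided every label can be parsed as a non-negative integer.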

0 · Accepted · 2019-01-29 08:05:47
