How to balance (oversampling) unbalanced data in PyTorch (with WeightedRandomSampler)?
I have a 2-class problem and my data is highly unbalanced. I have 232550 samples from one class and 13498 from the second class. The PyTorch docs and the internet tell me to use the WeightedRandomSampler class for my DataLoader.
I have tried using the WeightedRandomSampler, but I keep getting errors.
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

trainratio = np.bincount(trainset.labels)
classcount = trainratio.tolist()
train_weights = 1. / torch.tensor(classcount, dtype=torch.float)
train_sampleweights = train_weights[trainset.labels]
train_sampler = WeightedRandomSampler(weights=train_sampleweights,
                                      num_samples=len(train_sampleweights))
trainloader = DataLoader(trainset, sampler=train_sampler,
                         shuffle=False)
I cannot see why I am getting this error when initializing the WeightedRandomSampler class. I have tried other similar workarounds, but so far all attempts produce some error. How should I implement this to balance my train, validation and test data?
Currently getting this error:
train_sampleweights = train_weights[trainset.labels]
ValueError: too many dimensions 'str'
The problem is the type of trainset.labels. To resolve the error, you can convert trainset.labels to float.
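More precisely, the per-sample weight lookup `train_weights[trainset.labels]` indexes a tensor, so the labels must first become numeric (and, for indexing, integer) values rather than strings. A minimal self-contained sketch, assuming hypothetical string labels like "0"/"1" (the question does not show the actual contents of trainset.labels):

import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical stand-in for trainset.labels: string class labels.
labels = ["0", "1", "0", "0", "1", "0"]

# Convert to an integer tensor; tensor indexing requires integer (long) indices.
labels = torch.tensor([int(l) for l in labels])

class_counts = torch.bincount(labels)        # samples per class
class_weights = 1.0 / class_counts.float()   # inverse-frequency class weights
sample_weights = class_weights[labels]       # one weight per sample

train_sampler = WeightedRandomSampler(weights=sample_weights,
                                      num_samples=len(sample_weights),
                                      replacement=True)

The sampler then yields indices where the minority class is drawn as often as the majority class on average; pass it to DataLoader via sampler= and leave shuffle at its default False, since sampler and shuffle are mutually exclusive.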