My "label" field is a one-hot vector of length 201. However, I am unable to create an iterator with this one-hot representation; I get the error below when I try to iterate over it.
from torchtext.data import Field
from torchtext.data import TabularDataset
from torchtext.data import Iterator, BucketIterator
tokenize = lambda x: x.split()
TEXT = Field(sequential=True, tokenize=tokenize, lower=True)
LABEL = Field(sequential=True, use_vocab=False)
datafields = [("text", TEXT), ("label", LABEL)]
train, test = TabularDataset.splits(
    path='/home/karthik/Documents/Deep_Learning/73Strings/',
    train="train.csv", validation="test.csv",
    format='csv',
    skip_header=True,
    fields=datafields)
train_iter, val_iter = BucketIterator.splits(
    (train, test),  # we pass in the datasets we want the iterator to draw data from
    batch_sizes=(64, 64),
    device=device,  # if you want to use the GPU, specify the GPU number here
    sort_key=lambda x: len(x.text),  # the BucketIterator needs to be told what function it should use to group the data
    sort_within_batch=False,
    repeat=False  # we pass repeat=False because we want to wrap this Iterator layer
)
test_iter = Iterator(test, batch_size=64, sort=False, sort_within_batch=False, repeat=False)
for batch in train_iter:
    print(batch)
ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 for batch in train_iter:
      2     print(batch)

/usr/local/lib/python3.6/dist-packages/torchtext/data/iterator.py in __iter__(self)
    155                 else:
    156                     minibatch.sort(key=self.sort_key, reverse=True)
--> 157                 yield Batch(minibatch, self.dataset, self.device)
    158             if not self.repeat:
    159                 return

/usr/local/lib/python3.6/dist-packages/torchtext/data/batch.py in __init__(self, data, dataset, device)
     32             if field is not None:
     33                 batch = [getattr(x, name) for x in data]
---> 34                 setattr(self, name, field.process(batch, device=device))
     35
     36     @classmethod

/usr/local/lib/python3.6/dist-packages/torchtext/data/field.py in process(self, batch, device)
    199         """
    200         padded = self.pad(batch)
--> 201         tensor = self.numericalize(padded, device=device)
    202         return tensor
    203

/usr/local/lib/python3.6/dist-packages/torchtext/data/field.py in numericalize(self, arr, device)
    321             arr = self.postprocessing(arr, None)
    322
--> 323         var = torch.tensor(arr, dtype=self.dtype, device=device)
    324
    325         if self.sequential and not self.batch_first:

ValueError: too many dimensions 'str'
I fixed this problem by changing

LABEL = Field(sequential=True, use_vocab=False)

to

LABEL = Field(sequential=False, use_vocab=False)

This worked for me. From the torchtext documentation:
sequential – Whether the datatype represents sequential data. If False, no tokenization is applied. Default: True.
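To see why the flag matters here, the following is a minimal, stdlib-only sketch of the distinction the documentation describes, not torchtext's actual implementation: with sequential=True the raw CSV string is tokenized into a list of string tokens (which torch.tensor cannot numericalize when use_vocab=False), while with sequential=False the value is passed through untokenized. The preprocess function below is a hypothetical stand-in for Field's preprocessing step.

```python
def preprocess(raw, sequential, tokenize=str.split):
    """Mimic the effect of Field's `sequential` flag on one raw value:
    tokenize the string when sequential=True, pass it through otherwise."""
    return tokenize(raw) if sequential else raw

# A one-hot label as it appears in a CSV cell:
row = "0 1 0"

print(preprocess(row, sequential=True))   # tokenized into a list of *strings*
print(preprocess(row, sequential=False))  # left as a single untokenized value
```

With sequential=True, every label becomes a list of string tokens, and since use_vocab=False skips the string-to-index mapping, torch.tensor then receives nested strings and raises "too many dimensions 'str'".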