
Pytorch Text AttributeError: 'BucketIterator' object has no attribute

I'm doing seq2seq machine translation on my own dataset. I have preprocessed my dataset using this code.

The problem occurs when I try to split train_data using BucketIterator.splits():

import nltk
import torch
# torchtext 0.9-0.11; on torchtext <= 0.8 these classes live in torchtext.data
from torchtext.legacy.data import Field, TabularDataset, BucketIterator

def tokenize_word(text):
  return nltk.word_tokenize(text)

id = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
ti = Field(sequential=True, tokenize = tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")

fields = {'id': ('i', id), 'ti': ('t', ti)}

train_data = TabularDataset.splits(
    path='/content/drive/MyDrive/Colab Notebooks/Tidore/',
    train = 'id_ti.tsv',
    format='tsv',
    fields=fields
)[0]

id.build_vocab(train_data)
ti.build_vocab(train_data)

print(f"Unique tokens in source (id) vocabulary: {len(id.vocab)}")
print(f"Unique tokens in target (ti) vocabulary: {len(ti.vocab)}")

batch_size = 32  # not shown in the original post; illustrative value
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # likewise assumed

train_iterator = BucketIterator.splits(
    (train_data),
    batch_size = batch_size,
    sort_within_batch = True,
    sort_key = lambda x: len(x.id),
    device = device
)

print(len(train_iterator))

for data in train_iterator:
  print(data.i)
  print(data.i)

This is the output of the code above:

Unique tokens in source (id) vocabulary: 1425
Unique tokens in target (ti) vocabulary: 1297
2004

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-72-e73a211df4bd> in <module>()
     31 
     32 for data in train_iterator:
---> 33   print(data.i)

AttributeError: 'BucketIterator' object has no attribute 'i'

This is the result when I tried to print train_iterator.

I am very confused because I don't know which key I should use for the train iterator. Thank you for your help.

According to the torchtext documentation, TranslationDataset is the better fit for this task. But if for some reason you prefer to use TabularDataset, it's better to do it like this:

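import nltk
print(nltk.__version__)
import torchtext
# torchtext <= 0.8; on torchtext 0.9-0.11 use: from torchtext.legacy import data
from torchtext import data
print(torchtext.__version__)

# nltk.word_tokenize needs the 'punkt' tokenizer data (nltk.download('punkt'))
def tokenize_word(text):
    return nltk.word_tokenize(text)

batch_size = 5

SRC = data.Field(sequential=True, tokenize=tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")
TRG = data.Field(sequential=True, tokenize=tokenize_word, lower=True, init_token="<sos>", eos_token="<eos>")

# List-style fields map TSV columns by position: column 1 -> src, column 2 -> trg
train = data.TabularDataset.splits(
    path='./data/', train='tr.tsv', format='tsv',
    fields=[('src', SRC), ('trg', TRG)])[0]

SRC.build_vocab(train)
TRG.build_vocab(train)

# Only one dataset, so construct BucketIterator directly instead of calling .splits
train_iter = data.BucketIterator(
    train, batch_size=batch_size,
    sort_key=lambda x: len(x.src), device='cpu')

# Batch attributes are the names given in `fields` ('src' and 'trg')
for item in train_iter:
    print(item.trg)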

Output:

3.6.2
0.6.0
tensor([[2, 2, 2, 2, 2],
        [5, 5, 5, 5, 5],
        [4, 4, 4, 4, 4],
        [6, 6, 6, 6, 6],
        [7, 7, 7, 7, 7],
        [3, 3, 3, 3, 3]])
tensor([[2, 2, 2, 2, 2],
        [5, 5, 5, 5, 5],
        [4, 4, 4, 4, 4],
        [6, 6, 6, 6, 6],
        [7, 7, 7, 7, 7],
        [3, 3, 3, 3, 3]])

NOTE: make sure there is a tr.tsv file in the data directory containing text columns separated by tabs. Welcome to Stack Overflow, and I hope it helps :)
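The note above assumes `tr.tsv` is a plain two-column, tab-separated file. As a minimal sketch (the helper name `write_parallel_tsv` and the placeholder sentences are hypothetical, not from the thread), such a file can be produced with the standard library. It writes no header row, which suits the list-style `fields=[('src', SRC), ('trg', TRG)]` in this answer. With the dict-style `fields` from the question, legacy torchtext instead reads the first line of a TSV as a header and looks up the `id` and `ti` columns by name, so a header line would be needed in that case.

```python
from pathlib import Path


def write_parallel_tsv(path, pairs):
    """Write (source, target) pairs as tab-separated lines, one pair per line.

    Hypothetical helper for illustration only. No header row is written, which
    matches list-style fields such as [('src', SRC), ('trg', TRG)], where the
    first column goes to SRC and the second to TRG.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for src, trg in pairs:
            # A tab or newline inside a sentence would break the two-column layout.
            for text in (src, trg):
                if "\t" in text or "\n" in text:
                    raise ValueError(f"tab/newline not allowed in: {text!r}")
            f.write(f"{src}\t{trg}\n")
    return path


if __name__ == "__main__":
    # Placeholder pairs; replace them with your own parallel corpus.
    out = write_parallel_tsv("./data/tr.tsv", [
        ("first source sentence", "first target sentence"),
        ("second source sentence", "second target sentence"),
    ])
    for line in out.read_text(encoding="utf-8").splitlines():
        print(line.split("\t"))
```

Rejecting tabs and newlines up front keeps every line at exactly two columns, which is what the positional field mapping relies on.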

train_iterator = BucketIterator.splits(
    (train_data),
    batch_size = batch_size,
    sort_within_batch = True,
    sort_key = lambda x: len(x.id),
    device = device
)

Here, use BucketIterator instead of BucketIterator.splits when only one iterator needs to be generated.
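Why `.splits` misbehaves here: as far as I can tell from the legacy torchtext source, `Iterator.splits(datasets, ...)` builds one iterator for each `datasets[i]` in `range(len(datasets))` and returns them as a tuple. `(train_data)` is not a one-element tuple (that would be `(train_data,)`), so the dataset itself is treated as the sequence of datasets and you get one `BucketIterator` per example. That is most likely why the question prints 2004 and why each `data` in the loop is a `BucketIterator` with no `.i`. The classes `ToyDataset` and `ToyIterator` below are hypothetical stand-ins that only mimic that loop, so the effect can be run without torchtext.

```python
class ToyDataset:
    """Hypothetical stand-in for a torchtext Dataset: indexable, one item per example."""

    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[i]


class ToyIterator:
    """Hypothetical stand-in for BucketIterator; it only records its inputs."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size

    @classmethod
    def splits(cls, datasets, batch_size):
        # Mimics the shape of legacy torchtext's Iterator.splits loop:
        # one iterator per element of `datasets`, returned as a tuple.
        return tuple(cls(datasets[i], batch_size) for i in range(len(datasets)))


train_data = ToyDataset([
    {"i": ["a"], "t": ["b"]},
    {"i": ["c"], "t": ["d"]},
    {"i": ["e"], "t": ["f"]},
])

# Parentheses alone do not make a tuple: (train_data) is just train_data.
wrong = ToyIterator.splits((train_data), batch_size=2)
print(len(wrong))               # 3: one iterator per *example*
print(type(wrong[0]).__name__)  # ToyIterator, so wrong[0].i would fail

# A one-element tuple needs a trailing comma; unpack the single result.
(right,) = ToyIterator.splits((train_data,), batch_size=2)
print(right.dataset is train_data)  # True
```

With the real library, the corresponding fixes would be `train_iterator = BucketIterator(train_data, batch_size=batch_size, sort_key=lambda x: len(x.i), sort_within_batch=True, device=device)`, or unpacking `(train_iterator,) = BucketIterator.splits((train_data,), ...)`. Batches then expose the attribute names from the `fields` dict, i.e. `batch.i` and `batch.t`. The `sort_key` should also use `x.i`, since the examples have no `id` attribute.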

I ran into this problem too, and the method described above works.

