How to make the __iter__ method of a class return a value without running the for loop?
I have a class with an __iter__ method, which looks like this:
import csv

class Mycorpus:
    '''This class helps us to train the model without loading the whole dataset into RAM.'''
    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])
                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)
                yield client_tokens
I am passing this object to a function that requires it to return one set of tokens at a time when iterated, i.e. either a client_tokens or an agent_tokens. I want __iter__ to yield one client_tokens and, on the next iteration, the agent_tokens from the same client-agent pair. I don't want to yield the two sets of tokens together, as that would break the functionality; only one at a time. My main objective is to avoid looping through the file twice and running the splitter function on the same conversations again.
I have tried something like the following:
def __init__(self, filepath=text_file):
    self.filepath = filepath
    self.agent_turn = 0

def __iter__(self):
    with open(self.filepath, 'r') as rfile:
        csv_reader = csv.DictReader(rfile, delimiter=',')
        if self.agent_turn:
            self.agent_turn = 0
            yield agent_tokens
        else:
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])
                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)
                self.agent_turn = 1
                yield client_tokens
But the code above only yields client_tokens. Is there a better way of doing this without loading the entire dataset into memory? Is my requirement even possible using the __iter__ method? Any help or direction is highly appreciated.
Use two yield statements, just as many examples show. Remember that a generator resumes right after the yield statement that suspended it, not at the top of the function. That is why your attempt only ever yields client_tokens: the if self.agent_turn check runs once, when iteration starts, and from then on execution stays inside the for loop.
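The resumption rule can be seen in isolation with a toy generator, unrelated to the CSV data. This is a minimal sketch; the name two_steps and the log list are purely illustrative.

```python
def two_steps(log):
    # Everything up to the first yield runs on the first next() call.
    log.append("start")
    yield "first"
    # Execution resumes HERE on the second next() call, not back at "start".
    log.append("resumed")
    yield "second"


log = []
gen = two_steps(log)
print(next(gen), log)  # first ['start']
print(next(gen), log)  # second ['start', 'resumed']
```

"start" is appended only once, which shows that the second next() call continues after the first yield instead of re-running the function body from the top.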
for row in csv_reader:
    # splitter splits the conversation into client and agent part
    client_convo, agent_convo = convo_split.splitter(row['Combined'])
    client_tokens = preprocess(client_convo)
    agent_tokens = preprocess(agent_convo)
    yield client_tokens
    yield agent_tokens
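Put together, the whole class can be sketched as follows so that it runs on its own. This is an illustration under assumptions, not the asker's actual code: split_convo and preprocess are hypothetical stand-ins for convo_split.splitter and preprocess, and the " || " separator inside the Combined column is invented for the demo.

```python
import csv
import os
import tempfile


def split_convo(text):
    # Hypothetical stand-in for convo_split.splitter:
    # assumes the client and agent parts are separated by " || ".
    client, _, agent = text.partition(" || ")
    return client, agent


def preprocess(text):
    # Hypothetical stand-in for preprocess(): lowercase, then split on whitespace.
    return text.lower().split()


class Mycorpus:
    """Streams one token list per step without loading the whole CSV into RAM."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        # The csv module docs recommend newline="" for files handed to csv readers.
        with open(self.filepath, "r", newline="") as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                # Each conversation is split and preprocessed exactly once.
                client_convo, agent_convo = split_convo(row["Combined"])
                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)
                yield client_tokens  # this step: the client tokens
                yield agent_tokens   # next step: the agent tokens of the same pair


# Tiny demo file with a single "Combined" column.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("Combined\n")
    f.write("Hi there || Hello how can I help\n")
    f.write("My order is late || Sorry about that\n")
    path = f.name

collected = []
try:
    for tokens in Mycorpus(path):  # one token list per iteration step
        collected.append(tokens)
finally:
    os.remove(path)

print(collected)
```

Because __iter__ is a generator function, each new for loop over a Mycorpus object gets a fresh generator that reads the file from the top, so the object can be iterated more than once while still holding only one row in memory at a time.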