I have a class with an __iter__ method that looks like this:
import csv

class Mycorpus:
    '''This class helps us to train the model without loading the whole dataset to the RAM.'''
    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])
                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)
                yield client_tokens
I am passing this object to a function that expects it to return one set of tokens at a time when iterated, i.e. either client_tokens or agent_tokens. I want __iter__ to yield client_tokens on one iteration and, on the next iteration, the agent_tokens from the same client/agent pair. I don't want to yield the two sets of tokens together, as that would break the functionality. Only one at a time. My main objective is to avoid looping through the file twice and calling the splitter function on the same conversations again.
I have tried something like the following:
def __init__(self, filepath=text_file):
    self.filepath = filepath
    self.agent_turn = 0

def __iter__(self):
    with open(self.filepath, 'r') as rfile:
        csv_reader = csv.DictReader(rfile, delimiter=',')
        if self.agent_turn:
            self.agent_turn = 0
            yield agent_tokens
        else:
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])
                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)
                self.agent_turn = 1
                yield client_tokens
But the above code only gives me client_tokens. Is there a better way of doing this without loading the entire dataset into memory? Is my requirement even possible using the __iter__ method? Any help or direction is highly appreciated.
Use two yield statements, just as many examples show. Remember that a generator / iterator resumes right after the yield statement that suspended it, not at the top of the function.
for row in csv_reader:
    # splitter splits the conversation into client and agent part
    client_convo, agent_convo = convo_split.splitter(row['Combined'])
    client_tokens = preprocess(client_convo)
    agent_tokens = preprocess(agent_convo)
    yield client_tokens
    yield agent_tokens
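To see the two-yield loop working end to end, here is a minimal self-contained sketch. The original convo_split.splitter and preprocess are not shown in the question, so the splitter and preprocess functions below are hypothetical stand-ins (splitting on " | " and lowercasing plus whitespace tokenizing), and the sample CSV rows are made up purely for illustration.

```python
import csv
import os
import tempfile


def splitter(conversation):
    """Hypothetical stand-in for convo_split.splitter: split on ' | '."""
    client_part, agent_part = conversation.split(" | ", 1)
    return client_part, agent_part


def preprocess(text):
    """Hypothetical stand-in for preprocess: lowercase, split on whitespace."""
    return text.lower().split()


class MyCorpus:
    """Streams token lists from a CSV file, one list per iteration."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, "r", newline="") as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                client_convo, agent_convo = splitter(row["Combined"])
                # First yield: execution pauses here and hands back the client tokens.
                yield preprocess(client_convo)
                # The next iteration resumes here, still inside the same row.
                yield preprocess(agent_convo)


# Write a tiny sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.DictWriter(tmp, fieldnames=["Combined"])
    writer.writeheader()
    writer.writerow({"Combined": "Hello there | Hi how can I help"})
    writer.writerow({"Combined": "My order is late | Let me check"})
    sample_path = tmp.name

corpus = MyCorpus(sample_path)
first_pass = list(corpus)
second_pass = list(corpus)  # each call to __iter__ reopens the file
os.remove(sample_path)

for tokens in first_pass:
    print(tokens)
```

Each row is read and split only once, yet the consumer receives the client tokens and the agent tokens on separate iterations. Because every call to __iter__ opens the file afresh, the corpus object can also be iterated more than once, which matters if the consuming function makes several passes over the data.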