How to make the __iter__ method of a class return a value without running the for loop?

I have a class with an __iter__ method that looks like this:

import csv

# text_file, convo_split and preprocess are assumed to be defined elsewhere
class Mycorpus:
    '''This class helps us train the model without loading the whole dataset into RAM.'''

    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])

                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)

                yield client_tokens

I am passing this object to a function that expects it to yield one set of tokens per iteration, i.e. either client_tokens or agent_tokens. I want __iter__ to yield client_tokens and then, on the next iteration, the agent_tokens from the same client-agent pair. I don't want to yield both sets of tokens together, as that would break the function; only one at a time. My main objective is to avoid looping through the file twice and running the splitter function on the same conversations again.

I tried something like the following:

def __init__(self, filepath= text_file):
        self.filepath = filepath
        self.agent_turn = 0

def __iter__(self):
        with open(self.filepath,'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
 
            if self.agent_turn:
                self.agent_turn = 0
                yield agent_tokens
            
            else:
                for row in csv_reader:
                
                    # splitter splits the conversation into client and agent part
                    client_convo, agent_convo = convo_split.splitter(row['Combined'])

                    client_tokens = preprocess(client_convo)
                    agent_tokens = preprocess(agent_convo)
                    self.agent_turn = 1
                    yield client_tokens

But the above code only gives client_tokens. Is there a better way of doing this without loading the entire dataset into memory? Is my requirement even possible using the __iter__ method? Any help or direction is highly appreciated.

Use two yield statements, as many generator examples do. Remember that a generator resumes immediately after the yield statement that suspended it, not at the top of the function.
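To make the resumption point concrete, here is a minimal sketch that is independent of the question's code. The function name pair and the strings it yields are made up for illustration.

```python
def pair():
    # Runs only once, when the first value is requested.
    print("body starts")
    yield "client"
    # Execution continues here on the next request, not at the top.
    print("resumed after the first yield")
    yield "agent"


gen = pair()
first = next(gen)   # runs up to the first yield
second = next(gen)  # picks up right after the first yield
```

Each call to next() advances the generator only as far as the following yield, so the two values come out one at a time. Applied to the loop from the question, that means: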

        for row in csv_reader:
    
            # splitter splits the conversation into client and agent part
            client_convo, agent_convo = convo_split.splitter(row['Combined'])

            client_tokens = preprocess(client_convo)
            agent_tokens = preprocess(agent_convo)
            
            yield client_tokens
            yield agent_tokens
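For anyone who wants to try this end to end, below is a self-contained sketch of the same idea. The splitter and preprocess functions here are hypothetical stand-ins for the question's convo_split.splitter and preprocess (their real behaviour is not shown in the question), and the "client|agent" row format and temporary CSV file are assumptions made only so the example runs.

```python
import csv
import os
import tempfile


def splitter(text):
    # Hypothetical stand-in for convo_split.splitter:
    # assumes each row looks like "client part|agent part".
    client, agent = text.split("|", 1)
    return client, agent


def preprocess(text):
    # Hypothetical stand-in for the question's preprocess:
    # lowercase and split on whitespace.
    return text.lower().split()


class MyCorpus:
    """Streams tokens row by row without loading the whole file."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, "r", newline="") as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                # Split once per row, then hand out the two halves
                # on consecutive iterations.
                client_convo, agent_convo = splitter(row["Combined"])
                yield preprocess(client_convo)
                yield preprocess(agent_convo)


# Build a tiny CSV file so the sketch can run on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    writer = csv.DictWriter(tmp, fieldnames=["Combined"])
    writer.writeheader()
    writer.writerow({"Combined": "Hi there|Hello how can I help"})
    writer.writerow({"Combined": "My order is late|Let me check"})
    demo_path = tmp.name

corpus = MyCorpus(demo_path)
first_pass = list(corpus)   # client, agent, client, agent
second_pass = list(corpus)  # __iter__ starts a fresh generator each time
os.remove(demo_path)
```

Because __iter__ is a generator function, every new for loop over the object reopens the file and starts again from the first row, which matters if the consuming function makes several passes over the corpus.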

