How can I make a class's __iter__ method return one value at a time without running the for loop again?

Question

I have a class with an __iter__ method, shown below.

import csv

# text_file, convo_split and preprocess are defined elsewhere in my code.

class Mycorpus:
    '''This class helps us to train the model without loading the whole dataset to the RAM.'''

    def __init__(self, filepath=text_file):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, 'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
            for row in csv_reader:
                # splitter splits the conversation into client and agent part
                client_convo, agent_convo = convo_split.splitter(row['Combined'])

                client_tokens = preprocess(client_convo)
                agent_tokens = preprocess(agent_convo)

                yield client_tokens

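I pass this object to a function that needs it to return one set of tokens per iteration, that is, either client_tokens or agent_tokens. I want __iter__ to yield client_tokens and then, on the next iteration, yield agent_tokens from the same client-agent pair. I don't want to yield both sets together, because that would break the function; it must be one at a time. My main goal is to avoid looping over the file twice and running the splitter function again on the same conversation.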

I tried something like the following.

def __init__(self, filepath= text_file):
        self.filepath = filepath
        self.agent_turn = 0

def __iter__(self):
        with open(self.filepath,'r') as rfile:
            csv_reader = csv.DictReader(rfile, delimiter=',')
 
            if self.agent_turn:
                self.agent_turn = 0
                yield agent_tokens
            
            else:
                for row in csv_reader:
                
                    # splitter splits the conversation into client and agent part
                    client_convo, agent_convo = convo_split.splitter(row['Combined'])

                    client_tokens = preprocess(client_convo)
                    agent_tokens = preprocess(agent_convo)
                    self.agent_turn = 1
                    yield client_tokens

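But the code above only gives client_tokens. Is there a better way to do this without loading the whole dataset into memory? Is what I want even possible with the __iter__ method? Any help or guidance is much appreciated.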

Answer 1

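As many examples show, you use two yield statements. Remember that a generator/iterator resumes right after the yield statement where it paused, not at the top of the function.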

        for row in csv_reader:
    
            # splitter splits the conversation into client and agent part
            client_convo, agent_convo = convo_split.splitter(row['Combined'])

            client_tokens = preprocess(client_convo)
            agent_tokens = preprocess(agent_convo)
            
            yield client_tokens
            yield agent_tokens
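
For reference, here is a minimal runnable sketch of the two-yield approach placed in a complete class. The helpers split_conversation and preprocess are hypothetical stand-ins for the asker's convo_split.splitter and preprocess (assumed here to split on "|" and to lowercase and tokenize on whitespace), and the sample CSV is invented purely so the code can be run.

```python
import csv
import os
import tempfile


def split_conversation(text):
    """Hypothetical stand-in for convo_split.splitter: 'client|agent' -> two parts."""
    client_part, agent_part = text.split("|", 1)
    return client_part, agent_part


def preprocess(text):
    """Hypothetical stand-in for the asker's preprocess: lowercase whitespace tokens."""
    return text.lower().split()


class MyCorpus:
    """Streams token lists from a CSV file one row at a time."""

    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath, "r", newline="") as rfile:
            for row in csv.DictReader(rfile, delimiter=","):
                client_convo, agent_convo = split_conversation(row["Combined"])
                # The generator pauses at each yield and resumes on the next
                # line, so each row is read and split only once.
                yield preprocess(client_convo)
                yield preprocess(agent_convo)


def write_sample_csv():
    """Create a small temporary CSV (invented data) so the sketch runs end to end."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="")
    with handle:
        writer = csv.writer(handle)
        writer.writerow(["Combined"])
        writer.writerow(["Hi there|Hello how can I help"])
        writer.writerow(["My order is late|Let me check"])
    return handle.name


sample_path = write_sample_csv()
token_lists = list(MyCorpus(sample_path))
os.remove(sample_path)
print(token_lists)
```

Because __iter__ is a generator function, every new for loop over the object reopens the file and starts from the first row, so a consumer that makes several passes over the corpus still works, and only one row is held in memory at a time.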

Answer 1: score 4, accepted, 2021-02-25 18:42:11
