Read file lines and merge them based on their length

Edit: here is a sample text file: https://www.gutenberg.org/files/9830/9830-0.txt

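I have a file, test_file.txt, made up of lines of varying length (measured in words). I want to load each line and check its length. If the length is greater than or equal to a minimum threshold (say 20 words), I append the line to a list named container (container = []). Otherwise, I have to load the next line and merge it with the current one until the required length is reached, and then append the merged line to container. I have to do this for every line in the file.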

Here is my code. It works except for the last two lines of the file, which it ignores.

# Creating a generator to load file lines, one by one:

def gen_file_reader(file_path):
    with open(file_path, encoding='utf-8') as file:
        for line in file.readlines():
            yield line

container = [] # List that will contain the results
lines = gen_file_reader('test_file.txt') # Calling the generator function


x = ""
for line in lines:
    while len(x.split()) < 20:
        x = x + line
        break
    else:
        container.append(x)
        x = ""
        container.append(line)

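I noticed that my code does not handle the last two lines of the file, probably because of the break keyword inside the while statement... and there may be other bugs I am not aware of!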

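Edit: the expected result for the sample file (assuming we strip whitespace and drop empty lines), for the first 4 items of the list container, looks like this: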

["Project Gutenberg's The Beautiful and Damned, by F. Scott Fitzgerald This eBook is for the use of anyone anywhere at no cost and with",
 'almost no restrictions whatsoever.  You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included',
 'with this eBook or online at www.gutenberg.org Title: The Beautiful and Damned Author: F. Scott Fitzgerald Release Date: October 22, 2003 [EBook #9830]',
 'Last updated: January 29, 2020 Language: English Character set encoding: UTF-8 *** START OF THIS PROJECT GUTENBERG EBOOK THE BEAUTIFUL AND DAMNED ***']

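In your logic, if the trailing lines cannot be joined into something with 20 or more words, they are never added to the container. I also think it is better to do the merging directly in the generator: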

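def gen_file_reader(file_path):
    with open(file_path, encoding='utf-8') as file:
        for line in file:
            try:
                # Keep pulling lines until the merged line has at least 20 words
                while len(line.split()) < 20:
                    line += next(file)
                yield line
            except StopIteration:
                # End of file: yield what was merged so far, even if it is too short
                yield line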


lines = gen_file_reader('test_file.txt')  # Calling the generator function
print(list(lines))

Here is my test_file.txt:

my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao
how are you my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao
my name is Cn-LanBao my name is Cn-LanBao
my name is Cn-LanBao my name is Cn-LanBao
my name is how are you
my name is how are you
my name is Cn-LanBao
my name is how are you

And the output:

['my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao\nhow are you my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao my name is Cn-LanBao\n', 'my name is Cn-LanBao my name is Cn-LanBao\nmy name is Cn-LanBao my name is Cn-LanBao\nmy name is how are you\n', 'my name is how are you\nmy name is Cn-LanBao\nmy name is how are you\n']
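Note that the output above keeps the newline characters inside each merged item, whereas the expected result in the question has whitespace stripped and empty lines dropped. One possible variant is sketched below. The function name merge_short_lines and its min_words parameter are made up for illustration and do not appear in the original post. It accepts any iterable of lines (for example an open file object), skips blank lines, joins the pieces with a single space, and flushes whatever is left at the end so the trailing lines are not lost.

```python
def merge_short_lines(lines, min_words=20):
    """Yield chunks of at least `min_words` words built from consecutive lines.

    Hypothetical helper: the final chunk may be shorter than `min_words`
    if the input runs out before the threshold is reached.
    """
    buffer = []      # stripped lines collected for the current chunk
    word_count = 0   # total number of words currently in `buffer`
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines entirely
        buffer.append(line)
        word_count += len(line.split())
        if word_count >= min_words:
            yield " ".join(buffer)
            buffer, word_count = [], 0
    if buffer:
        # Flush trailing lines that never reached the threshold
        yield " ".join(buffer)


# Small in-memory demo with a threshold of 5 words
demo_lines = ["a b c\n", "\n", "d e\n", "f g h i j k\n", "l m\n"]
container = list(merge_short_lines(demo_lines, min_words=5))
print(container)
```

To use it on the file from the question, one could pass the open file object directly, for example `with open('test_file.txt', encoding='utf-8') as file: container = list(merge_short_lines(file))`. Counting words incrementally avoids re-splitting the growing merged string on every iteration, which may matter for large files.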
