Python: How to split a .txt file into two or more files with the same number of lines in each?

Question

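(I believe I have been searching Stack Exchange and the internet for hours, but I couldn't find the right answer.)

What I am trying to do here is count the number of lines in a file, which I have achieved with this code: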

# Does not load the file into memory
def file_len(fname):
    i = 0  # so an empty file reports 0 instead of raising NameError
    with open(fname) as f:
        for i, l in enumerate(f, 1):
            pass
    print(i)

file_len('bigdata.txt')

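Then I divide the file's line count by 2, 3, etc. (to get 2, 3, etc. files with the same number of lines). For example, bigdata.txt has 1000000 lines and 1000000 / 2 = 500000, so there will be two files of 500000 lines each, one holding lines 1 to 500000 and the other lines 500001 to 1000000. I already have the code below, which looks for a pattern in the original file (bigdata.txt), but I am not looking for any pattern; I just want to split the file into two or more equal parts. Here is that code: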

# Does not load the file into memory
with open('bigdata.txt', 'r') as r:
    with open('fhalf', 'w') as f:
        for line in r:
            # Splits the file at the first occurrence of the pattern.
            # Note that the matching line itself ends up in neither file,
            # which is not good since I need all the data.
            if line == 'pattern\n':
                break
            f.write(line)
    # r is still open here, so this loop continues right after the pattern line
    with open('shalf.txt', 'w') as f:
        for line in r:
            f.write(line)

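So I am looking for a simple solution. I know there is one; I just can't figure it out at the moment. An example result would be file1.txt, file2.txt, each with the same number of lines, give or take one. Thank you all for your time.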

Answer 1

Use .readlines() to read all the lines into a list, then work out how many lines need to go into each file, and then start writing!

num_files = 2
with open('bigdata.txt') as in_file:
    lines = in_file.readlines()
    lines_per_file = len(lines) // num_files
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for i in range(n * lines_per_file, (n+1) * lines_per_file):
                out_file.write(lines[i])

And a full test:

$ cat bigdata.txt 
line1
line2
line3
line4
line5
line6
$ python -q
>>> num_files = 2
>>> with open('bigdata.txt') as in_file:
...     lines = in_file.readlines()
...     lines_per_file = len(lines) // num_files
...     for n in range(num_files):
...         with open('file{}.txt'.format(n+1), 'w') as out_file:
...             for i in range(n * lines_per_file, (n+1) * lines_per_file):
...                 out_file.write(lines[i])
... 
>>> 
$ more file*
::::::::::::::
file1.txt
::::::::::::::
line1
line2
line3
::::::::::::::
file2.txt
::::::::::::::
line4
line5
line6
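
One caveat with the floor division above: when the line count is not an exact multiple of num_files, the leftover lines are written to no file at all. Since the question allows the files to differ by one line, here is a minimal sketch of one way to hand out the remainder, using divmod. The helper names split_sizes, split_lines and split_file_in_memory are made up for illustration and are not part of the original answer.

```python
def split_sizes(num_lines, num_files):
    """Return per-file line counts that differ by at most one."""
    base, extra = divmod(num_lines, num_files)
    # The first `extra` files each take one of the leftover lines.
    return [base + 1 if n < extra else base for n in range(num_files)]


def split_lines(lines, num_files):
    """Cut a list of lines into num_files consecutive chunks."""
    chunks, start = [], 0
    for size in split_sizes(len(lines), num_files):
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def split_file_in_memory(path, num_files):
    """Hypothetical wrapper: read everything, then write file1.txt, file2.txt, ..."""
    with open(path) as in_file:
        lines = in_file.readlines()
    for n, chunk in enumerate(split_lines(lines, num_files), 1):
        with open('file{}.txt'.format(n), 'w') as out_file:
            out_file.writelines(chunk)
```

With 7 lines and 3 files, split_sizes gives 3, 2 and 2 lines, so nothing is lost. Calling split_file_in_memory('bigdata.txt', 2) on the six-line test file should produce the same two files as shown above.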

If you can't read bigdata.txt into memory, then the .readlines() solution won't cut it.

You'll have to write the lines as you read them, which is no big deal.

As for determining the length in the first place, this question discusses a few approaches; my favourite is Kyle's sum() method.

num_files = 2
with open('bigdata.txt') as in_file:  # close the file once it has been counted
    num_lines = sum(1 for line in in_file)
lines_per_file = num_lines // num_files
with open('bigdata.txt') as in_file:
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for _ in range(lines_per_file):
                out_file.write(in_file.readline())
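
The streaming version has the same floor-division caveat: any remainder lines at the end of bigdata.txt are never copied. Below is a rough sketch of a variant that still makes two passes over the file but gives the first few output files one extra line each, using itertools.islice to take a fixed number of lines from the open file. The function name split_file and its out_pattern parameter are assumptions for illustration, not something from the question.

```python
from itertools import islice


def split_file(path, num_files, out_pattern='file{}.txt'):
    """Stream `path` into num_files parts whose line counts differ by at most one."""
    # First pass: count lines without holding them in memory.
    with open(path) as in_file:
        num_lines = sum(1 for _ in in_file)
    base, extra = divmod(num_lines, num_files)

    out_names = []
    # Second pass: copy consecutive runs of lines into each output file.
    with open(path) as in_file:
        for n in range(num_files):
            # The first `extra` files take one leftover line each.
            size = base + 1 if n < extra else base
            out_name = out_pattern.format(n + 1)
            with open(out_name, 'w') as out_file:
                # islice pulls exactly `size` lines and leaves the rest unread.
                out_file.writelines(islice(in_file, size))
            out_names.append(out_name)
    return out_names
```

Calling split_file('bigdata.txt', 3) would write file1.txt, file2.txt and file3.txt and return their names. Only one line is held in memory at a time, at the cost of reading the input twice.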

Solution 1
2 Accepted 2018-09-02 13:39:26
