
Python: How do I split a .txt file into two or more files with the same number of lines in each?

(I believe I've been searching Stack Exchange and the internet for ages, but couldn't find the right answer.)

What I'm trying to do here is count the number of lines in the file, which I've done with this code:

# Does not load the whole file into memory
def file_len(fname):
    count = 0  # stays 0 if the file is empty
    with open(fname) as f:
        for count, _ in enumerate(f, 1):
            pass
    print(count)
    return count

file_len('bigdata.txt')

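Then I divide the file's line count by 2, 3, etc. (to get 2, 3, etc. files with the same number of lines). For example, if bigdata.txt has 1000000 lines, then 1000000 / 2 = 500000, so there would be two files of 500000 lines each, one holding lines 1 to 500000 and the other lines 500001 to 1000000. I already have the code below, which looks for a pattern in the original file (bigdata.txt), but I'm not looking for any pattern; I just want to split the thing into two halves or more. Here is that code: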

# Does not load the whole file into memory
with open('bigdata.txt', 'r') as r:
    with open('fhalf', 'w') as f:
        for line in r:
            # Splits the file at the first occurrence of the pattern.
            # As you may notice, the pattern line itself ends up in neither
            # of the two files, which is not good since I need all the data.
            if line == 'pattern\n':
                break
            f.write(line)
    # Everything after the pattern line goes into the second file
    with open('shalf.txt', 'w') as f:
        for line in r:
            f.write(line)

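So I'm looking for a simple solution, and I know there is one; I just can't work it out at the moment. An example result would be file1.txt and file2.txt, each with the same number of lines, give or take one. Thank you all for your time.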

Use .readlines() to read all the lines into a list, then work out how many lines need to go into each file, and then start writing!

num_files = 2
with open('bigdata.txt') as in_file:
    lines = in_file.readlines()
    # Floor division: if len(lines) is not a multiple of num_files,
    # the leftover lines at the end are not written to any file.
    lines_per_file = len(lines) // num_files
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for i in range(n * lines_per_file, (n+1) * lines_per_file):
                out_file.write(lines[i])

And a full test:

$ cat bigdata.txt 
line1
line2
line3
line4
line5
line6
$ python -q
>>> num_files = 2
>>> with open('bigdata.txt') as in_file:
...     lines = in_file.readlines()
...     lines_per_file = len(lines) // num_files
...     for n in range(num_files):
...         with open('file{}.txt'.format(n+1), 'w') as out_file:
...             for i in range(n * lines_per_file, (n+1) * lines_per_file):
...                 out_file.write(lines[i])
... 
>>> 
$ more file*
::::::::::::::
file1.txt
::::::::::::::
line1
line2
line3
::::::::::::::
file2.txt
::::::::::::::
line4
line5
line6
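
Note that the floor division in the code above leaves out the trailing lines whenever the line count is not a multiple of num_files (7 lines split in two would give 3 + 3, with the last line written nowhere). The question asked for sizes that are equal "give or take one", so here is a minimal sketch of one way to get that. The helper names chunk_sizes and split_in_memory and the out_pattern parameter are made up for illustration and are not part of the original answer:

```python
def chunk_sizes(num_lines, num_files):
    """Return num_files sizes that sum to num_lines and differ by at most one."""
    base, extra = divmod(num_lines, num_files)
    # The first `extra` files each take one additional line.
    return [base + 1 if n < extra else base for n in range(num_files)]


def split_in_memory(in_path, num_files, out_pattern='file{}.txt'):
    """Read every line into a list, then write consecutive slices to each file."""
    with open(in_path) as in_file:
        lines = in_file.readlines()
    start = 0
    for n, size in enumerate(chunk_sizes(len(lines), num_files), 1):
        with open(out_pattern.format(n), 'w') as out_file:
            out_file.writelines(lines[start:start + size])
        start += size
```

For example, chunk_sizes(7, 2) gives [4, 3], so split_in_memory('bigdata.txt', 2) on a 7-line file would put 4 lines in file1.txt and 3 in file2.txt. Like the .readlines() version, this still holds the whole file in memory.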

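If you can't read bigdata.txt into memory, then the .readlines() solution won't cut it.

You'll have to write the lines out as you read them, which is no big deal.

As for working out the length in the first place, another question discusses several ways to do it; my favourite is Kyle's sum() method.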

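num_files = 2
# First pass: count the lines without keeping them in memory
with open('bigdata.txt') as in_file:
    num_lines = sum(1 for _ in in_file)
# Floor division: leftover lines (num_lines % num_files) are not written
lines_per_file = num_lines // num_files
# Second pass: copy lines_per_file lines into each output file in turn
with open('bigdata.txt') as in_file:
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for _ in range(lines_per_file):
                out_file.write(in_file.readline())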
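
The two-pass idea above can also be written so that no line is lost when the count doesn't divide evenly. The following is only a sketch under that assumption: split_streaming and its out_pattern parameter are hypothetical names, and itertools.islice is swapped in for the explicit readline() loop.

```python
from itertools import islice


def split_streaming(in_path, num_files, out_pattern='file{}.txt'):
    """Split a file into num_files parts without loading it into memory."""
    # First pass: count the lines.
    with open(in_path) as in_file:
        num_lines = sum(1 for _ in in_file)
    base, extra = divmod(num_lines, num_files)
    # Second pass: stream the lines into each output file in turn.
    with open(in_path) as in_file:
        for n in range(num_files):
            # The first `extra` files take one additional line each.
            size = base + 1 if n < extra else base
            with open(out_pattern.format(n + 1), 'w') as out_file:
                # islice pulls exactly `size` lines and leaves the rest unread.
                out_file.writelines(islice(in_file, size))
    return num_lines
```

Calling split_streaming('bigdata.txt', 3) on a 7-line file would write 3, 2 and 2 lines to file1.txt, file2.txt and file3.txt. The file is read twice, but only one line at a time is held in memory.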

