
Python: How do I split a .txt file into two or more files with the same number of lines in each?

(I believe I have been looking for hours on Stack Exchange and the internet, but couldn't find the right answer.)

What I'm trying to do here is to count the number of lines a file has, which I achieved with this code:

# Does not load the whole file into memory
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f, 1):
            pass
        print(i)

file_len('bigdata.txt')

Then I take the number of lines in the file and divide it by two/three/etc. to make two/three/etc. files with the same number of lines. For example, bigdata.txt = 1000000 lines; 1000000 / 2 = 500000, so I would end up with two files of 500000 lines each, one covering lines 1 to 500000 and the other lines 500001 to 1000000. I already have this code, which looks for a pattern in the original file (bigdata.txt), but I'm not looking for any pattern, I just want to split the thing into two halves or so. Here is the code for it:

# Does not load the whole file into memory
with open('bigdata.txt', 'r') as r:
    with open('fhalf', 'w') as f:
        for line in r:
            # Splits the file at the first occurrence of the pattern.
            # But as you may notice, the occurrence won't be included in either
            # of the two files, which is not a good thing since I need all the data.
            if line == 'pattern\n':
                break
            f.write(line)

    with open('shalf.txt', 'w') as f:
        for line in r:
            f.write(line)

So I'm looking for a simple solution and I know there is one, I just can't figure it out at the moment. A sample result would be: file1.txt and file2.txt, each with the same number of lines, give or take one. Thank you all for your time.

Read all the lines into a list with .readlines(), then calculate how many lines need to be distributed to each file, and then get writing!

num_files = 2
with open('bigdata.txt') as in_file:
    lines = in_file.readlines()
    lines_per_file = len(lines) // num_files
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for i in range(n * lines_per_file, (n+1) * lines_per_file):
                out_file.write(lines[i])

And a full test:

$ cat bigdata.txt 
line1
line2
line3
line4
line5
line6
$ python -q
>>> num_files = 2
>>> with open('bigdata.txt') as in_file:
...     lines = in_file.readlines()
...     lines_per_file = len(lines) // num_files
...     for n in range(num_files):
...         with open('file{}.txt'.format(n+1), 'w') as out_file:
...             for i in range(n * lines_per_file, (n+1) * lines_per_file):
...                 out_file.write(lines[i])
... 
>>> 
$ more file*
::::::::::::::
file1.txt
::::::::::::::
line1
line2
line3
::::::::::::::
file2.txt
::::::::::::::
line4
line5
line6

If you can't read bigdata.txt into memory then the .readlines() solution won't cut it.

You will have to write the lines as you read them, which is no big deal.

As for working out the length in the first place, this question discusses some methods, my favourite being Kyle's sum() method.

num_files = 2
num_lines = sum(1 for line in open('bigdata.txt'))
lines_per_file = num_lines // num_files
with open('bigdata.txt') as in_file:
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for _ in range(lines_per_file):
                out_file.write(in_file.readline())
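
One detail the snippet above glosses over: when num_lines isn't evenly divisible by num_files, the leftover lines are never written. A minimal sketch, assuming the same file naming as above, of one way to push that remainder into the last output file:

num_files = 2
num_lines = sum(1 for line in open('bigdata.txt'))
lines_per_file = num_lines // num_files
remainder = num_lines % num_files  # lines left over when the split isn't even

with open('bigdata.txt') as in_file:
    for n in range(num_files):
        # Give the last file the leftover lines so nothing is dropped.
        count = lines_per_file + (remainder if n == num_files - 1 else 0)
        with open('file{}.txt'.format(n + 1), 'w') as out_file:
            for _ in range(count):
                out_file.write(in_file.readline())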
