Split every 50,000 lines and write files in Python

I read a file that contains 75,151 lines and want to split it every 50,000 lines, so that I end up with 2 files: one with 50,000 lines and the other with 25,151 lines.

I wrote code like this (INSERT_NUMBER = 50000):

for index, data in enumerate(lines):
    if ((index % INSERT_NUMBER) == 0 and index != 0) or (index == len(lines) - 1):
        # make a new file from the lines collected so far ...

What is a better way to split a file every 50,000 lines and write each chunk to a new file?

Here's one way using itertools.groupby():

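from itertools import groupby

out_filename = '/tmp/f{}.txt'  # output files: /tmp/f0.txt, /tmp/f1.txt, ...
lines_per_file = 50000

with open('infile.txt') as infile:
    # Lines 0..49999 get key 0, lines 50000..99999 get key 1, and so on.
    for file_number, lines in groupby(enumerate(infile), key=lambda x: x[0] // lines_per_file):
        with open(out_filename.format(file_number), 'w') as outfile:
            # Drop the line numbers and write only the text of each line.
            outfile.writelines(line for line_number, line in lines)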

The trick is to group the lines into chunks by integer-dividing each line's number by lines_per_file: lines 0 to 49,999 all get key 0, lines 50,000 to 99,999 get key 1, and so on. The grouping key then doubles as the counter in the output file name.
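To see the grouping key in action, the sketch below runs the same groupby(enumerate(...), key=lambda x: x[0] // lines_per_file) call on a small in-memory list with lines_per_file = 3 and records which line numbers land in each output file. The helper group_lines is hypothetical and exists only for illustration.

```python
from itertools import groupby


def group_lines(lines, lines_per_file):
    """Hypothetical helper: return (file_number, [line_number, ...]) pairs
    using the same integer-division key as the answer."""
    result = []
    for file_number, group in groupby(enumerate(lines), key=lambda x: x[0] // lines_per_file):
        # Keep only the line numbers so the chunk boundaries are easy to read.
        result.append((file_number, [line_number for line_number, _ in group]))
    return result


# Seven dummy lines split into files of three lines each (the answer uses 50000).
sample = [f"line {i}\n" for i in range(7)]
print(group_lines(sample, 3))
# [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6])]
```

With the question's 75,151 lines and lines_per_file = 50000, the same key yields exactly two groups: file 0 with 50,000 lines and file 1 with the remaining 25,151 lines. The last, shorter chunk needs no special handling.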

Is it better than what you already have? The code is a bit harder to read, but it avoids the annoying edge cases you run into when grouping with modulo arithmetic, such as skipping index 0 and special-casing the last line.
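For comparison, here is a minimal sketch of the modulo approach from the question, rearranged so that it opens a new output file at the start of each chunk instead of writing the chunk at its end. That removes the "last line" check, but it still has to handle index 0 (there is no previous file to close yet) and close the final, possibly partial, file after the loop. The function name split_file_modulo and its parameters are hypothetical, not from the original post.

```python
def split_file_modulo(in_path, out_pattern, lines_per_file):
    """Hypothetical helper: write every lines_per_file lines of in_path to its own file.

    out_pattern must contain one {} placeholder for the file number (0, 1, 2, ...).
    Returns the number of files written.
    """
    outfile = None
    file_number = -1
    try:
        with open(in_path) as infile:
            for index, line in enumerate(infile):
                # Start a new output file at the first line of every chunk
                # (index 0, lines_per_file, 2 * lines_per_file, ...).
                if index % lines_per_file == 0:
                    # Edge case: at index 0 there is no previous file to close.
                    if outfile is not None:
                        outfile.close()
                    file_number += 1
                    outfile = open(out_pattern.format(file_number), 'w')
                outfile.write(line)
    finally:
        # Edge case: the last (possibly partial) file is still open after the loop.
        if outfile is not None:
            outfile.close()
    return file_number + 1
```

With the question's numbers, split_file_modulo('infile.txt', '/tmp/f{}.txt', 50000) should return 2 and produce /tmp/f0.txt with 50,000 lines and /tmp/f1.txt with 25,151 lines, the same file names as the groupby version. The two commented edge cases are exactly the bookkeeping that the groupby key makes unnecessary.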
