Python: How do I split a .txt file into two or more files with the same number of lines in each?
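(I believe I've spent a while searching Stack Exchange and the internet, but couldn't find the right answer.)
What I want to do here is count the number of lines in the file, which I've done with this code: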
# Does not load the whole file into memory
def file_len(fname):
    i = 0  # stays 0 if the file is empty
    with open(fname) as f:
        for i, l in enumerate(f, 1):
            pass
    return i

print(file_len('bigdata.txt'))
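If only the number is needed, a common alternative (not from the original post) is to read the file in binary chunks and count newline bytes instead of decoding every line. A minimal sketch, assuming "\n" or "\r\n" line endings and an arbitrary 1 MiB chunk size; count_lines is a hypothetical name:

```python
def count_lines(path, chunk_size=1 << 20):
    """Count lines by scanning fixed-size binary chunks for newline bytes."""
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A non-empty file whose last line has no trailing newline has one more line
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```

Lone "\r" line endings (old Mac style) would be counted differently than in text mode, which is why the assumption above matters.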
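Then I divide the line count by 2, 3, etc. (to get 2, 3, etc. files with the same number of lines). For example, if bigdata.txt has 1,000,000 lines, 1000000 / 2 = 500000, so there would be two files of 500,000 lines each: one with lines 1 to 500,000 and the other with lines 500,001 to 1,000,000. I already have the code below, which looks for a pattern in the original file (bigdata.txt), but I'm not looking for any pattern; I just want to split the file in half, or into more parts. Here is that code: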
# Does not load the whole file into memory
with open('bigdata.txt', 'r') as r:
    with open('fhalf', 'w') as f:
        for line in r:
            if line == 'pattern\n':  # Splits the file when the pattern occurs.
                # But, as you may notice, the matching line itself ends up in neither file,
                # which is not good since I need all the data.
                break
            f.write(line)
    with open('shalf.txt', 'w') as f:
        for line in r:
            f.write(line)
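The comment in that code points out that the matching line is lost because the loop breaks before writing it. A minimal sketch of one way to keep every line, assuming the matching line should start the second file; split_at_pattern and its parameters are hypothetical names:

```python
def split_at_pattern(src, first, second, pattern="pattern\n"):
    """Copy src into two files, switching at the first line equal to pattern.

    The matching line is written to the second file, so no data is lost.
    """
    with open(src) as r, open(first, "w") as f1, open(second, "w") as f2:
        out = f1
        for line in r:
            if out is f1 and line == pattern:
                out = f2  # switch once; later matches stay in the second file
            out.write(line)
```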
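So I'm looking for a simple solution; I know there is one, I just can't figure it out right now. The result would be, for example, file1.txt, file2.txt, each with the same number of lines, give or take one. Thanks everyone for your time.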
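The arithmetic in the question (1,000,000 lines into 2 files gives lines 1 to 500,000 and 500,001 to 1,000,000) generalizes with divmod(): every file gets the quotient, and the first "remainder" files get one extra line each, which is exactly the "give or take one" the question allows. A small sketch; line_ranges is a hypothetical helper name:

```python
def line_ranges(total_lines, num_files):
    """Return 1-based inclusive (first, last) line numbers for each output file."""
    base, extra = divmod(total_lines, num_files)
    ranges = []
    start = 1
    for n in range(num_files):
        size = base + (1 if n < extra else 0)  # the first `extra` files get one more line
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(line_ranges(1000000, 2))  # [(1, 500000), (500001, 1000000)]
print(line_ranges(7, 3))        # [(1, 3), (4, 5), (6, 7)]
```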
Use .readlines() to read all the lines into a list, then work out how many lines each file gets, and start writing!
num_files = 2
with open('bigdata.txt') as in_file:
    lines = in_file.readlines()  # the whole file is held in memory
    lines_per_file = len(lines) // num_files
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for i in range(n * lines_per_file, (n+1) * lines_per_file):
                out_file.write(lines[i])
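Because len(lines) // num_files rounds down, the code above silently drops the last len(lines) % num_files lines whenever the count doesn't divide evenly (for example, 7 lines into 2 files writes only 6). A minimal in-memory sketch that keeps every line by giving the first few files one extra line each; split_in_memory and its prefix parameter are hypothetical, and it still assumes the whole file fits in memory:

```python
def split_in_memory(path, num_files, prefix="file"):
    """Split path into num_files files whose line counts differ by at most one."""
    with open(path) as in_file:
        lines = in_file.readlines()
    base, extra = divmod(len(lines), num_files)
    start = 0
    for n in range(num_files):
        # The first `extra` files absorb the remainder, one line each
        end = start + base + (1 if n < extra else 0)
        with open("{}{}.txt".format(prefix, n + 1), "w") as out_file:
            out_file.writelines(lines[start:end])
        start = end
```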
And a full test:
$ cat bigdata.txt
line1
line2
line3
line4
line5
line6
$ python -q
>>> num_files = 2
>>> with open('bigdata.txt') as in_file:
...     lines = in_file.readlines()
...     lines_per_file = len(lines) // num_files
...     for n in range(num_files):
...         with open('file{}.txt'.format(n+1), 'w') as out_file:
...             for i in range(n * lines_per_file, (n+1) * lines_per_file):
...                 out_file.write(lines[i])
...
>>>
$ more file*
::::::::::::::
file1.txt
::::::::::::::
line1
line2
line3
::::::::::::::
file2.txt
::::::::::::::
line4
line5
line6
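Since the question stresses that all the data must be kept, it can be worth checking that the parts add back up to the original. A minimal sketch that streams both sides and compares them line by line; parts_match_original is a hypothetical helper, and it assumes the part files are passed in order:

```python
from contextlib import ExitStack
from itertools import chain, zip_longest


def parts_match_original(original, part_paths):
    """Return True if the part files, read in order, reproduce original line for line."""
    with ExitStack() as stack:
        orig = stack.enter_context(open(original))
        parts = [stack.enter_context(open(p)) for p in part_paths]
        # zip_longest pads the shorter side with None, so a missing or extra line fails
        pairs = zip_longest(orig, chain.from_iterable(parts))
        return all(a == b for a, b in pairs)
```

With the files from the test above, parts_match_original('bigdata.txt', ['file1.txt', 'file2.txt']) should return True.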
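If you can't read bigdata.txt into memory, the .readlines() solution won't cut it.
You'll have to write the lines out as you read them, which isn't a big deal.
As for determining the length first, this question discusses several methods; my favorite is Kyle's sum() method.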
num_files = 2
# Count lines one at a time, closing the file afterwards
with open('bigdata.txt') as f:
    num_lines = sum(1 for line in f)
lines_per_file = num_lines // num_files
with open('bigdata.txt') as in_file:
    for n in range(num_files):
        with open('file{}.txt'.format(n+1), 'w') as out_file:
            for _ in range(lines_per_file):
                out_file.write(in_file.readline())
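Like the .readlines() version, this rounds num_lines // num_files down, so whenever the count doesn't divide evenly the last num_lines % num_files lines are never written. A minimal streaming sketch that spreads the remainder so part sizes differ by at most one, assuming two passes over the file are acceptable (one to count, one to copy); split_file and its prefix parameter are hypothetical names:

```python
from itertools import islice


def split_file(path, num_files, prefix="file"):
    """Split path into num_files contiguous parts without loading it into memory.

    Returns the list of output file names.
    """
    # First pass: count lines (only one line is held in memory at a time)
    with open(path) as f:
        num_lines = sum(1 for _ in f)
    base, extra = divmod(num_lines, num_files)

    names = []
    # Second pass: copy consecutive slices of the input into each output file
    with open(path) as in_file:
        for n in range(num_files):
            size = base + (1 if n < extra else 0)
            name = "{}{}.txt".format(prefix, n + 1)
            with open(name, "w") as out_file:
                # islice stops after `size` lines without consuming the next one
                out_file.writelines(islice(in_file, size))
            names.append(name)
    return names
```

For example, split_file('bigdata.txt', 2) would write file1.txt and file2.txt, matching the names used above.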