Empty chunks when splitting a large file
I am trying to split a large file into 50 MB chunks and save each chunk to a separate file. After running some read/write operations, some of my chunks were smaller than 50 MB (43 MB, 17 MB and so on). I also wrote the same code in Java and it has the same problem. What is wrong? My code is shown below.

By the way, what can be done to speed up this code so that it splits the file into chunks faster?
try:
    f = open(self.__filename, 'rb')
except (OSError, IOError), e:
    raise FileSplitterException, str(e)

bname = (os.path.split(self.__filename))[1]
fsize = os.path.getsize(self.__filename)
self.__chunksize = int(float(fsize)/float(self.__numchunks))
chunksz = self.__chunksize
total_bytes = 0

for x in range(self.__numchunks):
    chunkfilename = bname + '-' + str(x+1) + self.__postfix
    if x == self.__numchunks - 1:
        chunksz = fsize - total_bytes
    try:
        print 'Writing file',chunkfilename
        data = f.read(chunksz)
        total_bytes += len(data)
        chunkf = file(chunkfilename, 'wb')
        chunkf.write(data)
        chunkf.close()
    except (OSError, IOError), e:
        print e
        continue
    except EOFError, e:
        print e
        break
The code in the question seems to be focussed on producing a set number of chunks rather than files of 50 MB in size: the chunk size is computed as the file size divided by the number of chunks, so it is only 50 MB if the chunk count happens to match.

This code produces 50 MB files:
import os

try:
    f = open('big.txt', 'rb')
except (OSError, IOError), e:
    raise FileSplitterException, str(e)

bname = (os.path.split('big.txt'))[1]
chunksz = 50 * 1000 * 1000 # metric MB - use 1024 * 1024 for binary MB (MiB)
counter = 0

while True:
    chunkfilename = bname + '-' + str(counter+1) + '.foo'
    try:
        print 'Writing file',chunkfilename
        data = f.read(chunksz)
        if not data:
            # We have reached the end of the file, end the script.
            break
        chunkf = file(chunkfilename, 'wb')
        chunkf.write(data)
        chunkf.close()
    except (OSError, IOError), e:
        print e
        continue
    except EOFError, e:
        print e
        break
    counter += 1
Some aspects of the code are considered poor style in modern Python (for example, not using a context manager to open files), but I haven't changed these in case the OP is on an old Python like 2.5.
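For readers on Python 3, the same fixed-size loop can be written with context managers. This is a minimal sketch rather than the answer's original code: the function name `split_file` and the `out_dir` parameter are assumptions introduced for illustration, while the `.foo` suffix and the `50 * 1000 * 1000` chunk size are taken from the code above.

```python
import os


def split_file(path, chunk_size=50 * 1000 * 1000, suffix=".foo", out_dir="."):
    """Split `path` into files of at most `chunk_size` bytes.

    Returns the list of chunk file paths in the order they were written.
    `split_file` and `out_dir` are hypothetical names used for this sketch.
    """
    base = os.path.basename(path)
    written = []
    with open(path, "rb") as src:
        counter = 1
        while True:
            data = src.read(chunk_size)
            if not data:
                # An empty read means the end of the input file.
                break
            chunk_name = os.path.join(
                out_dir, "{}-{}{}".format(base, counter, suffix))
            # The context manager closes the chunk file even if write() fails.
            with open(chunk_name, "wb") as dst:
                dst.write(data)
            written.append(chunk_name)
            counter += 1
    return written
```

Calling `split_file('big.txt')` would write `big.txt-1.foo`, `big.txt-2.foo` and so on into the current directory. Only the last file should come out smaller than `chunk_size`.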
Your question is unclear because you haven't included a Minimal, Complete, and Verifiable example, so I don't know exactly what's wrong with your code. However, after simulating my guess as to the missing parts, I was able to come up with something that I think does exactly what you want.
import os


class FileSplitterException(Exception):
    pass


class FileSplitter(object):
    def __init__(self, filename, chunksize):
        if not os.path.isfile(filename):
            raise FileSplitterException(
                "File: {!r} does not exist".format(filename))
        self._filename = filename
        self._postfix = 'chunk'
        self._chunksize = chunksize

    def split(self):
        bname = os.path.splitext(self._filename)[0]
        fsize = os.path.getsize(self._filename)
        # Number of full chunks, plus one more if there is a remainder.
        chunks, partial = divmod(fsize, self._chunksize)
        if partial:
            chunks += 1

        with open(self._filename, 'rb') as infile:
            for i in range(chunks):
                chunk_filename = '{}-{}.{}'.format(bname, i, self._postfix)
                with open(chunk_filename, 'wb') as outfile:
                    data = infile.read(self._chunksize)
                    if data:
                        outfile.write(data)
                    else:
                        # The file ended before the expected number of chunks.
                        raise FileSplitterException('unexpected EOF encountered')


if __name__ == '__main__':
    import glob

    filename = 'big_file.txt'
    chunksize = 1 * 1024 * 1024  # 1 MiB
    print('splitting {} into {:,} sized chunks'.format(filename, chunksize))
    fs = FileSplitter(filename, chunksize)
    fs.split()
    print('chunk files written:')
    bname = os.path.splitext(filename)[0]
    for chunkname in sorted(glob.glob(bname + '-*.' + fs._postfix)):
        fsize = os.path.getsize(chunkname)
        print('  {}: size: {:,}'.format(chunkname, fsize))
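As a quick sanity check on the sizes to expect, the `divmod` arithmetic in `split()` can be run on its own, without touching the disk. The sketch below is only illustrative: the helper names `expected_chunk_sizes` and `count_based_sizes` are hypothetical and not part of the class above, and the 170 MB file size is an assumed example value. The second helper mirrors the count-based calculation from the question's code for comparison.

```python
def expected_chunk_sizes(file_size, chunk_size):
    """Sizes produced when splitting by a fixed chunk size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full
    if remainder:
        # Only the final chunk may be smaller than chunk_size.
        sizes.append(remainder)
    return sizes


def count_based_sizes(file_size, num_chunks):
    """Sizes produced when splitting into a fixed number of chunks,
    as in the question: chunk size = file size / number of chunks."""
    base = int(float(file_size) / float(num_chunks))
    sizes = [base] * (num_chunks - 1)
    # The last chunk takes whatever is left over.
    sizes.append(file_size - base * (num_chunks - 1))
    return sizes


MB = 1000 * 1000
print(expected_chunk_sizes(170 * MB, 50 * MB))
# [50000000, 50000000, 50000000, 20000000]
print(count_based_sizes(170 * MB, 4))
# [42500000, 42500000, 42500000, 42500000]
```

With a fixed chunk size, a smaller final chunk is expected whenever the file size is not an exact multiple of the chunk size. With a fixed chunk count, every chunk can end up below 50 MB.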