
Empty chunks when splitting a large file

I am trying to split a large file into 50 MB chunks and save each chunk to a separate file. After running some read/write operations, some of my chunks were smaller than 50 MB (43 MB, 17 MB and so on). I wrote the same code in Java and it has the same problem. What is wrong? My code is below:

By the way, what can we do to make this code split the file into chunks faster?

try:
    f = open(self.__filename, 'rb')
except (OSError, IOError), e:
    raise FileSplitterException, str(e)

bname = (os.path.split(self.__filename))[1]

fsize = os.path.getsize(self.__filename)

self.__chunksize = int(float(fsize)/float(self.__numchunks))

chunksz = self.__chunksize
total_bytes = 0

for x in range(self.__numchunks):
    chunkfilename = bname + '-' + str(x+1) + self.__postfix

    if x == self.__numchunks - 1:
        chunksz = fsize - total_bytes

    try:
        print 'Writing file',chunkfilename
        data = f.read(chunksz)
        total_bytes += len(data)
        chunkf = file(chunkfilename, 'wb')
        chunkf.write(data)
        chunkf.close()
    except (OSError, IOError), e:
        print e
        continue
    except EOFError, e:
        print e
        break

The code in the question seems to be focused on producing a set number of chunks rather than files of 50 MB in size.
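
To illustrate, here is a minimal sketch of the size arithmetic in the question's loop. The helper name `chunk_sizes_by_count` and the 310 MB / 7 chunks figures are made up for illustration, since the OP's actual file size and `numchunks` are not given.

```python
def chunk_sizes_by_count(fsize, numchunks):
    """Return the chunk sizes produced by dividing fsize into numchunks parts.

    Mirrors the question's arithmetic: every chunk gets fsize / numchunks
    bytes (rounded down), and the last chunk takes whatever is left.
    """
    chunksz = int(float(fsize) / float(numchunks))
    sizes = [chunksz] * (numchunks - 1)
    sizes.append(fsize - chunksz * (numchunks - 1))
    return sizes


# Hypothetical input: a 310 MB file split into 7 chunks.
sizes = chunk_sizes_by_count(310 * 1000 * 1000, 7)
print(sizes[0], sizes[-1])  # 44285714 44285716
```

With these assumed numbers every chunk is about 44 MB, not 50 MB: the chunk size is whatever the file size divided by the chunk count happens to be.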

The following code produces 50 MB files:

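import os


# Defined here so the raise below does not fail with a NameError.
class FileSplitterException(Exception):
    pass

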
try:
    f = open('big.txt', 'rb')
except (OSError, IOError), e:
    raise FileSplitterException, str(e)

bname = (os.path.split('big.txt'))[1]

chunksz = 50 * 1000 * 1000 # metric MB - use 1024 * 1024 for binary MB (MiB)

counter = 0

while True:
    chunkfilename = bname + '-' + str(counter+1) + '.foo'

    try:
        data = f.read(chunksz)
        if not data:
            # We have reached the end of the file, end the script.
            break
        # Report the file only once we know there is data to write.
        print 'Writing file',chunkfilename
        chunkf = file(chunkfilename, 'wb')
        chunkf.write(data)
        chunkf.close()
    except (OSError, IOError), e:
        print e
        continue
    except EOFError, e:
        print e
        break
    counter += 1

Some aspects of the code are considered poor style in modern Python, for example not using a context manager to open files, but I haven't changed these in case the OP is on an old Python like 2.5.
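
For readers on Python 3, a rough equivalent using context managers might look like the sketch below. The function name `split_file` and the `out_dir` parameter are my own additions for illustration; the `.foo` suffix and the 50 MB default come from the code above.

```python
import os


def split_file(path, chunk_size=50 * 1000 * 1000, suffix='.foo', out_dir='.'):
    """Split `path` into numbered files of at most `chunk_size` bytes each.

    Returns the list of chunk file paths that were written.
    """
    base = os.path.basename(path)
    written = []
    with open(path, 'rb') as src:
        counter = 1
        while True:
            data = src.read(chunk_size)
            if not data:
                # End of the input file: nothing left to write.
                break
            chunk_name = os.path.join(
                out_dir, '{}-{}{}'.format(base, counter, suffix))
            with open(chunk_name, 'wb') as dst:
                dst.write(data)
            written.append(chunk_name)
            counter += 1
    return written
```

Calling `split_file('big.txt')` would write `big.txt-1.foo`, `big.txt-2.foo` and so on into the current directory. Only the last chunk can be smaller than `chunk_size`, and both files are closed even if a read or write raises.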

Your question is unclear because you haven't included a Minimal, Complete, and Verifiable example, so I don't know exactly what's wrong with your code. However, after filling in the missing parts with my best guess, I was able to come up with something that I think does exactly what you want.

import os

class FileSplitterException(Exception): pass

class FileSplitter(object):
    def __init__(self, filename, chunksize):
        if not os.path.isfile(filename):
            raise FileSplitterException(
                "File: {!r} does not exist".format(filename))
        self._filename = filename
        self._postfix = 'chunk'
        self._chunksize = chunksize

    def split(self):
        bname = os.path.splitext(self._filename)[0]
        fsize = os.path.getsize(self._filename)
        chunks, partial = divmod(fsize, self._chunksize)
        if partial:
            chunks += 1

        with open(self._filename, 'rb') as infile:
            for i in range(chunks):
                chunk_filename = os.path.join('{}-{}.{}'.format(
                                                bname, i, self._postfix))
                with open(chunk_filename, 'wb') as outfile:
                    data = infile.read(self._chunksize)
                    if data:
                        outfile.write(data)
                    else:
                        raise FileSplitterException('unexpected EOF encountered')

if __name__ == '__main__':
    import glob

    filename = 'big_file.txt'
    chunksize = 1 * 1024 * 1024  # 1 MiB (1,048,576 bytes)

    print('splitting {} into {:,} sized chunks'.format(filename, chunksize))

    fs = FileSplitter(filename, chunksize)
    fs.split()

    print('chunk files written:')
    bname = os.path.splitext(filename)[0]
    for chunkname in sorted(glob.glob(bname + '-*.' + fs._postfix)):
        fsize = os.path.getsize(chunkname)
        print('  {}: size: {:,}'.format(chunkname, fsize))
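
The chunk count used in `split()` above can be checked in isolation. This is just a small sketch of the same `divmod` arithmetic; the helper name `count_chunks` is made up for the example.

```python
def count_chunks(fsize, chunksize):
    """Number of chunk files needed to hold fsize bytes."""
    chunks, partial = divmod(fsize, chunksize)
    if partial:
        # A leftover smaller than chunksize still needs its own file.
        chunks += 1
    return chunks


MIB = 1024 * 1024
print(count_chunks(5 * MIB, MIB))      # 5 (exact multiple)
print(count_chunks(5 * MIB + 1, MIB))  # 6 (one extra byte needs a sixth file)
print(count_chunks(0, MIB))            # 0 (empty input, no chunks)
```

Only the final chunk can be smaller than `chunksize`, and only when the file size is not an exact multiple of it.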
