Python: My script will not allow me to create large files

I'm making a small Python script that creates random files in all shapes and sizes, but it will not let me create large files. I want to be able to create files of up to around 8 GB; I know this will take a long time, but I'm not concerned about that.

The problem is that Python 2.7 will not handle the large numbers I am throwing at it in order to create the random text that fills my files.

The aim of my code is to create files with random names and extensions, fill them with a random amount of junk text, and save them. It keeps repeating this until I close the command-line window.

import os
import string
import random


ext = ['.zip', '.exe', '.txt', '.pdf', '.msi', '.rar', '.jpg', '.png', '.html', '.iso']

min = raw_input("Enter a minimum file size eg: 112 (meaning 112 bytes): ")
minInt = int(min)

max = raw_input("Enter a maximum file size: ")
maxInt = int(max)

def name_generator(chars=string.ascii_letters + string.digits):
    return ''.join(random.choice(chars) for x in range(random.randint(1,10)))

def text_generator(chars=string.printable + string.whitespace):
    return ''.join(random.choice(chars) for x in range(random.randint(minInt,maxInt)))

def main():
    fileName = name_generator()
    extension = random.choice(ext)
    file = fileName + extension

    print 'Creating ==> ' + file
    fileHandle = open ( file, 'w' )
    fileHandle.write ( text_generator() )
    fileHandle.close()
    print file + ' ==> Was born!'

while 1:
    main()

Any help will be much appreciated!

Make it lazy, so that characters are produced only as they are written:

import string
import random
from itertools import islice

chars = string.printable + string.whitespace
# make infinite generator of random chars
random_chars = iter(lambda: random.choice(chars), '')
with open('output_file','w', buffering=102400) as fout:
    fout.writelines(islice(random_chars, 1000000)) # write 'n' many
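The lazy iterator can be plugged into the question's loop roughly as follows. This is a minimal Python 3 sketch (the question uses Python 2.7); the helper names `random_name`, `write_lazy_file` and `make_random_file` are hypothetical. It uses `string.printable` alone because that constant already contains every character of `string.whitespace`, and it opens the file with `encoding='ascii'` and `newline=''` (an assumption not in the answer above) so that each character becomes exactly one byte on disk, including on Windows.

```python
import os
import random
import string
from itertools import islice

# string.printable already includes all of string.whitespace
CHARS = string.printable
EXT = ['.zip', '.exe', '.txt', '.pdf', '.msi', '.rar', '.jpg', '.png', '.html', '.iso']


def random_name(chars=string.ascii_letters + string.digits):
    # 1 to 10 letters/digits, like the question's name_generator()
    return ''.join(random.choice(chars) for _ in range(random.randint(1, 10)))


def write_lazy_file(path, size, chars=CHARS):
    # Infinite iterator: random.choice never returns '', so iter() never stops
    random_chars = iter(lambda: random.choice(chars), '')
    # ASCII + newline='' keeps one character == one byte, even on Windows
    with open(path, 'w', buffering=102400, encoding='ascii', newline='') as fout:
        fout.writelines(islice(random_chars, size))  # stop after `size` chars


def make_random_file(min_size, max_size, directory='.'):
    # Random name, random extension, random size in [min_size, max_size]
    path = os.path.join(directory, random_name() + random.choice(EXT))
    write_lazy_file(path, random.randint(min_size, max_size))
    return path
```

The question's `while 1: main()` loop then becomes `while True: print(make_random_file(min_size, max_size))`. Memory use stays roughly constant regardless of file size, since only the 100 KB write buffer is held at any time.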

The problem is not that Python cannot handle large numbers. It can.

However, you are trying to hold the entire file contents in memory at once: in Python 2.7, range(n) builds a list of n integers and ''.join() then builds an n-character string, so an 8 GB file needs well over 8 GB of RAM. You might not have that much memory, and you don't want to do this anyway.

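The solution is to use a generator and write the data in chunks:

def text_generator(chars=string.printable + string.whitespace):
    # Yield one character at a time instead of building the whole string.
    # A plain counter is used because range() in Python 2.7 builds a full list.
    remaining = random.randint(minInt, maxInt)
    while remaining > 0:
        yield random.choice(chars)
        remaining -= 1

# fileHandle is the open file from main() in the question
for char in text_generator():
    fileHandle.write(char)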

This is still very inefficient, though: you want to write the data in blocks of, e.g., 10 KB rather than one byte at a time.
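Writing in blocks could look like the following minimal Python 3 sketch. The function name `write_in_chunks` and the 10 KB default are illustrative assumptions; `random.choices` requires Python 3.6 or later.

```python
import random
import string


def write_in_chunks(path, size, chars=string.printable, chunk_size=10 * 1024):
    # Only one chunk of at most `chunk_size` characters is in memory at a time
    remaining = size
    with open(path, 'w', encoding='ascii', newline='') as fout:
        while remaining > 0:
            n = min(chunk_size, remaining)
            # random.choices (Python 3.6+) draws n characters in one call
            fout.write(''.join(random.choices(chars, k=n)))
            remaining -= n
```

The last chunk is shortened with `min()`, so the file ends up exactly `size` bytes long even when `size` is not a multiple of `chunk_size`.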

A comment about performance: you could improve it by using os.urandom() to generate random bytes and str.translate() to map them onto the range of input characters:

import os
import string

def generate_text(size, chars=string.printable+string.whitespace):
    # make translation table from 0..255 to chars[0..len(chars)-1]
    all_chars = string.maketrans('', '')
    assert 0 < len(chars) <= len(all_chars)
    result_chars = ''.join(chars[b % len(chars)] for b in range(len(all_chars)))

    # generate `size` random bytes and translate them into given `chars`
    return os.urandom(size).translate(string.maketrans(all_chars, result_chars))

Example:

with open('output.txt', 'wb') as outfile: # use binary mode
    chunksize = 1 << 20  # 1MB
    N = 8 * (1 << 10)    # (N * chunksize) == 8GB
    for _ in xrange(N):
        outfile.write(generate_text(chunksize))
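In Python 3, `string.maketrans` no longer exists and `os.urandom()` returns `bytes`, so the same idea would use `bytes.maketrans` and `bytes.translate`. A sketch under the assumption that every character in `chars` is ASCII; `make_table`, `TABLE` and `write_random_file` are hypothetical names. Unlike the version above, the table is built once rather than on every call.

```python
import os
import string


def make_table(chars=string.printable):
    # Same mapping as result_chars above: byte b -> chars[b % len(chars)].
    # Assumes every character in `chars` is ASCII, i.e. one byte.
    assert 0 < len(chars) <= 256
    mapped = bytes(ord(chars[b % len(chars)]) for b in range(256))
    return bytes.maketrans(bytes(range(256)), mapped)


# Built once at import time instead of on every call
TABLE = make_table()


def generate_text(size, table=TABLE):
    # os.urandom returns `size` random bytes; translate() maps them in C
    return os.urandom(size).translate(table)


def write_random_file(path, total_size, chunk_size=1 << 20):
    # Write `total_size` bytes in chunks of at most `chunk_size` (1 MiB)
    with open(path, 'wb') as outfile:
        remaining = total_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            outfile.write(generate_text(n))
            remaining -= n
```

For example, `write_random_file('output.txt', 8 << 30)` writes 8 GiB, matching the loop above.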

Note: to avoid skewing the random distribution, bytes larger than k*len(chars)-1 returned by os.urandom() should be discarded, where k*len(chars) <= 256 < (k+1)*len(chars). Otherwise, because 256 is usually not a multiple of len(chars), the first 256 % len(chars) characters are each mapped from one extra byte value and come up more often than the rest.
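The discarding step can be done without a per-byte Python loop, because `bytes.translate` accepts a second argument listing bytes to delete before the mapping is applied. The following is a Python 3 sketch assuming ASCII `chars`; `make_unbiased_table` and `unbiased_text` are hypothetical names. For the 106-character alphabet `string.printable + string.whitespace` used above, k = 2, so bytes 212 to 255 are dropped (about 17% of the random bytes).

```python
import os
import string


def make_unbiased_table(chars):
    # Returns (table, rejected): table maps byte b -> chars[b % n];
    # rejected holds the bytes k*n .. 255 that must be discarded
    n = len(chars)
    assert 0 < n <= 256
    limit = (256 // n) * n  # k*n, with k*n <= 256 < (k+1)*n
    table = bytes(ord(chars[b % n]) for b in range(256))  # ASCII chars assumed
    rejected = bytes(range(limit, 256))
    return table, rejected


def unbiased_text(size, chars=string.printable):
    table, rejected = make_unbiased_table(chars)
    out = bytearray()
    while len(out) < size:
        # translate() first deletes the rejected bytes, then maps the rest,
        # so each pass yields at most the number of bytes still missing
        out += os.urandom(size - len(out)).translate(table, rejected)
    return bytes(out)
```

Each pass requests only the shortfall, so the result is exactly `size` bytes; with a large rejection fraction a few extra passes are needed, but each one still runs in C.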

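Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM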