简体   繁体   中英

Python 3.2 File I/O. Generating & Spliting a file based on preset size limit. How?

I'm working on some code that generates all permutations of a given set of characters. Although my code works in Python 2.7 it no longer does in Python 3.x due to many changes in Strings. I'd like to adapt and since I'm new to Python, I was hoping you could give me a little push. =)

My question is , as Python generates the word list of your choice, the output file size grows accordingly. I'd like you to show me how can I make this script check for a preset file size and if reached, open a new file and continue writing permutations.

Example:

numeric_lowercase.000001
numeric_lowercase.000002
numeric_lowercase.000003

Remember , I have looked at most examples on the site but they do not work with Python 3.2.

Here's my Python 3.2 working code so far:

import itertools
import subprocess
import os
from string import digits, ascii_lowercase, ascii_uppercase, punctuation

if os.name == 'nt':
    def clear_console():
        subprocess.call("cls", shell=True)
        return
else:
    def clear_console():
        subprocess.call("clear", shell=True)
        return

def generate_phone_numbers(area_code):
    f = open('phones.txt', 'w')
    for i in range(2010000, 9999999):
        f.write(area_code + str(i) + '\n')

def generate_wordlist(lst_chars, min_digit, max_digit, lst_name):
    f = open(lst_name, 'w')
    for curr_length in range(min_digit, max_digit + 1):
        for curr_digit in itertools.product(lst_chars, repeat=curr_length):
            f.write(''.join(curr_digit) + '\n')

print ('')
print ('  wgen - Menu')

choice = 0

while int(choice) not in range(1,6):
    clear_console()
    choice = input('''
  1. Phone numbers for a given area code.
  2. Numbers.
  3. Numbers + Lowercase.
  4. Numbers + Lowercase + Uppercase.
  5. Numbers + Lowercase + Uppercase + Punctuation.

  Enter Option: ''')

print ('')

choice = int(choice)

if choice == 1:
    area_code = input('''
  Please enter Area Code: ''')
    area_code = str(area_code)
    area_code = area_code.strip()
    if len(area_code) == 3:
        print ('')
        print ('  Generating phone numbers for area code ' + area_code + '.')
        print ('  Please wait...')
        generate_phone_numbers(area_code)

if choice == 2:
    min_digit = input('  What is the minimum size of the word? ')
    min_digit = int(min_digit)
    print ('')
    max_digit = input('  What is the maximum size of the word? ')
    max_digit = int(max_digit)
    chars = digits
    lst_name = 'numeric.txt'
    print ('')
    print ('  Generating numbers between ' + str(min_digit) + ' and ' + str(max_digit) + ' digits.')
    print ('  Please wait...')
    generate_wordlist(chars, min_digit, max_digit, lst_name)

if choice == 3:
    min_digit = input('  What is the minimum size of the word? ')
    min_digit = int(min_digit)
    print ('')
    max_digit = input('  What is the maximum size of the word? ')
    max_digit = int(max_digit)
    chars = digits + ascii_lowercase
    lst_name = 'numeric_lowercase.txt'
    print ('')
    print ('  Generating numbers & lowercase between ' + str(min_digit) + ' and ' + str(max_digit) + ' digits.')
    print ('  Please wait...')
    generate_wordlist(chars, min_digit, max_digit, lst_name)

if choice == 4:
    min_digit = input('  What is the minimum size of the word? ')
    min_digit = int(min_digit)
    print ('')
    max_digit = input('  What is the maximum size of the word? ')
    max_digit = int(max_digit)
    chars = digits + ascii_lowercase + ascii_uppercase
    lst_name = 'numeric_lowercase_uppercase.txt'
    print ('')
    print ('  Generating numbers, lowercase & uppercase between ' + str(min_digit) + ' and ' + str(max_digit) + ' digits.')
    print ('  Please wait...')
    generate_wordlist(chars, min_digit, max_digit, lst_name)

if choice == 5:
    min_digit = input('  What is the minimum size of the word? ')
    min_digit = int(min_digit)
    print ('')
    max_digit = input('  What is the maximum size of the word? ')
    max_digit = int(max_digit)
    chars = punctuation
    lst_name = 'numeric_lowercase_uppercase_punctuation.txt'
    print ('')
    print ('  Generating numbers, lowercase, uppercase & punctuation between ' + str(min_digit) + ' and ' + str(max_digit) + ' digits.')
    print ('  Please wait...')
    generate_wordlist(chars, min_digit, max_digit, lst_name)

I think the best solution would be to write a class that acts like a file, but provides the chunking capability. Your program just writes to this object as though it were a regular file.

The implementation below won't split strings (if you call f.write("this is a test") the entire message is guaranteed to go in one file) and it only starts a new one when the limit is exceeded, so files will be somewhat larger than the chunk size. This behavior is all in the write() method and could be changed if desired.

class chunkyfile(object):
    def __init__(self, filename, chunksize=1000000, mode="w", encoding=None, 
                 extension="", start=0, digits=6):
        self.filename  = filename
        self.chunksize = chunksize
        self.chunkno   = start
        self.file      = None
        self.mode      = mode
        self.encoding  = encoding
        self.digits    = digits
        self.extension = ("." * bool(extension) * (not extension.startswith(".")) +
                          extension)
        self.softspace = 0       # for use with print

    def _nextfile(self):
        self.file and self.file.close()
        self.file = open(self.filename + str(self.chunkno).rjust(self.digits, "0") + 
                         self.extension, mode=self.mode, encoding=self.encoding)
        self.chunkno += 1

    def write(self, text):
        self.file and self.file.tell() > self.chunksize and self.close()
        self.file or self._nextfile()
        self.file.write(text)

    # convenience method, equivalent to print(... file=f)
    # requires Python 3.x or from __future__ import print in Py2
    def print(*objects, sep=" ", end="\n", flush=False):
        print(*objects, sep=sep, end=end, flush=flush, file=self)

    def writelines(self, lines):
        # do it a line at a time in case we need to split
        for line in lines: self.write(line)

    def flush(self):
        self.file and self.file.flush()

    def close(self):
        self.file = self.file and self.file.close()

    # support "with" statement
    def __enter__(self):
        return self

    def __exit__(self, e, value, tb):
        self.close()

# now use the file
with chunkyfile(r"C:\test", 10, extension="txt", encoding="utf8") as f:
    f.write("FINALLY ROBOTIC BEINGS RULE THE WORLD")
    f.write("The humans are dead")
    f.write("The humans are dead")
    f.write("We used poisonous gasses")
    f.write("And we poisoned their asses")

With a little bit of up front configuration you can use the RotatingFileHandler built into the logging library in stdlib.

import logging
from logging.handlers import RotatingFileHandler

log = logging.getLogger('myprog.stufflogger')
log.propagate = False #ensure that we don't mess with other logging

#configure RotatingFileHandler
handler = RotatingFileHandler('base_file_name.txt', maxBytes=1024*1024*20)
handler.setFormatter(logging.Formatter('%(message)s')
handler.terminator = '' # default is new line

log.addHandler(handler)


# you can now use any of the log methods to add the values

log.info('stuff')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM