
Python: read from a file every time, or store the values in a list?

I am writing a Python script that generates primes by brute force. I have a file of more than 5 MB containing prime numbers, and as the script runs it appends every new prime it finds, so the file keeps growing. Each time the script starts, the file is read into a list, which is then looped over to decide whether the next candidate is prime. Any new prime is also appended to this list.

My question is: is it better to load this file into memory every time the script runs, or should I read the file one line at a time in a for loop, test that prime against the number being checked, and then read the next line?

The former holds a large list in memory but is very fast. The latter would be slower, because it has to read from the file on every loop iteration, but I don't think it would use nearly as much memory.
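For illustration, the line-by-line variant could look roughly like the sketch below. It is not the code from the question: the function name `is_prime_streaming` is hypothetical, and it assumes the primes file is sorted in ascending order, holds one prime per line, and already contains every prime up to the square root of the number being checked.

```python
import math

def is_prime_streaming(num, primes_path):
    """Trial division against primes read lazily from a file.

    Assumes (hypothetically) one prime per line, sorted ascending,
    covering every prime up to sqrt(num).
    """
    limit = math.isqrt(num)
    with open(primes_path) as f:
        for line in f:              # the file object yields one line at a time
            p = int(line)           # int() tolerates the trailing newline
            if p > limit:
                break               # no divisor can exist beyond sqrt(num)
            if num % p == 0:
                return False
    return num > 1
```

Only one line is held in memory at a time, and the loop stops as soon as a prime exceeds the square root, so most checks read only the beginning of the file.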

Here is my code. It takes a configuration file as an argument; that file contains the number at which to start looking for primes and the name of the file to read primes from and write them to:

import sys, math, time

def is_prime(num,primes):
    square = math.floor(math.sqrt(num))
    print('using all prime numbers up to %d' % square)
    for p in primes:
        if p <= square:
            print (p, end='\r')
            if (num % p) == 0:
                return False
        else:
            return True
    return True

def main(argv):
    if len(sys.argv) == 2:
        try:
            # Read the config file once: line 1 is the start number,
            # line 2 is the name of the primes file.
            try:
                with open(sys.argv[1], 'r') as c:
                    low = c.readlines()
            except IOError:
                sys.exit('Error: File %s does not exist in the current directory...\nUsage: generate_primes.py <prime_file>' % sys.argv[1])

            num_to_check = int(low[0].strip('\n'))
            file_name = low[1].strip('\n')
            print(num_to_check)
            print(file_name)

            if num_to_check % 2 == 0:
                num_to_check += 1

            f = open(file_name, 'a+')
            f.seek(0)
            primes = f.readlines()

            print('Processing Primes...')
            # Convert each line to an int; int() ignores the trailing newline.
            primes = [int(line) for line in primes]

            if primes[-1] > num_to_check:
                num_to_check = primes[-1]
                print('Last saved prime is bigger than config value.\nDefaulting to largest saved prime... %d' % primes[-1])

            time.sleep(2)

            new_primes = 0

            while True:
                print('Checking: %s ' % str(num_to_check), end='')
                if is_prime(num_to_check,primes):
                    print('Prime')
                    f.write('%s\n' % str(num_to_check))
                    primes.append(num_to_check)
                    new_primes += 1
                else:
                    print('Composite')
                num_to_check += 2

        except KeyboardInterrupt:
            config_name = time.strftime('%Y%m%d-%H%M%S')
            print('Keyboard Interrupt: \n creating config file %s ... ' % config_name)
            c = open(config_name,'w')
            c.write('%d\n%s' % (num_to_check,file_name))
            c.close()
            f.close()
            print('Done\nPrimes Found: %d\nExiting...' % new_primes)
            sys.exit()


if __name__ == '__main__':
    main(sys.argv[1:])

Note: the primes file must not contain 1, otherwise every number will come up composite (every number is divisible by 1).

My one concern about reading only from the file is how to get the largest stored prime, i.e. how to read the last line of the file.
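One possible way to get the last line without reading the whole file is to seek to the end and read backwards in small chunks. This is only a sketch: `last_line` and its `chunk_size` parameter are hypothetical names, and it assumes the file holds one ASCII number per line.

```python
import os

def last_line(path, chunk_size=4096):
    """Return the last non-empty line of a file by reading from the end."""
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()              # file size in bytes
        tail = b''
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            # Once a newline precedes the final line, that line is complete.
            if b'\n' in tail.rstrip():
                break
    stripped = tail.strip()
    if not stripped:
        return ''                   # empty file or only whitespace
    return stripped.split(b'\n')[-1].decode('ascii').strip()
```

With this, the largest stored prime would be `int(last_line(file_name))`, and the cost stays roughly constant no matter how large the file grows.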

Optimizations for speed and memory are often at odds. Some programs use massive amounts of memory but are blazing fast (Chrome strives for this), others target the reverse, and many seek a balance between the two. What to focus on should depend mostly on the problem, the use case, and, if you are really thorough, the data.

If the script is to be run over and over again, where latency and slow speeds would add up fast, you may want to optimize for speed. If the script takes more than a second or so to run and a user has to stare at the screen until it completes, you may want to focus on speed. If your task is time sensitive, perhaps because things need to happen in real time and you cannot afford to fall behind, you may want to focus on speed.

If the script is to be run only occasionally, in a predominantly time-insensitive environment, preferably in the background somewhere, and especially on limited or lower-end hardware, you may want to focus on memory.

Getting more specific to your problem, I agree entirely with Kristjan's comment: 5MB is not that much. Looking at the task manager on my laptop right now, I have two Wikipedia tabs open that I haven't touched in a long time, and they are using 33x that. One Facebook tab, similar story, is using 280x that. RubyMine (IDE) is using 244x that, and Activity Monitor (the task manager) itself is using 33x that. Not much sits under the 20MB mark besides small system processes that should really be grouped together for less clutter, and some programs I thought I had closed a week ago. If the rest of your application keeps a relatively low memory footprint and you aren't targeting weak or embedded hardware, people would likely complain about slow speeds sooner than about a ~5MB footprint in RAM, especially if you clean it up when you're done (more applicable to lower-level languages, but perhaps del could help here).
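If you want a rough number for your own data instead of a comparison with other programs, you can measure it. The sketch below is an illustration under assumptions: `small_primes` and `list_footprint` are hypothetical helpers, the sizes reported by `sys.getsizeof` are specific to CPython, and small integers are shared objects, so the list figure is an approximation. It also shows the standard-library `array` type, which stores the numbers inline and is typically much more compact than a list of int objects.

```python
import sys
from array import array

def small_primes(limit):
    """Sieve of Eratosthenes, used here only to produce sample data."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b'\x00\x00'                      # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if sieve[n]:
            # Clear every multiple of n starting at n*n.
            sieve[n * n::n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n, flag in enumerate(sieve) if flag]

def list_footprint(values):
    """Approximate bytes used by a list plus the int objects it refers to."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

primes = small_primes(100_000)
compact = array('Q', primes)      # 'Q': unsigned 64-bit integers stored inline

print('list :', list_footprint(primes), 'bytes')
print('array:', sys.getsizeof(compact), 'bytes')
```

Note that the in-memory list will generally be larger than the text file it was read from, so it is worth measuring before deciding that memory is or is not a concern.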

Really though, only you know the constraints of the problem you are working with. Well, that might not be true, but I certainly don't know them. You are going to have to make the call about what is important to you in your implementation, and that likely involves a compromise somewhere. Benchmarking both implementations to quantify the speed increase may help you justify one decision over the other, and in a tie, ease of implementation can certainly be considered as well.
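A benchmark along those lines can be put together with the standard `timeit` module. The following is a minimal sketch, not a rigorous measurement: the function names, the `repeats` parameter, and the use of a temporary file are all assumptions made for the example, and the two checkers are simplified versions of the in-memory and line-by-line approaches.

```python
import math
import os
import tempfile
import timeit

def check_in_memory(num, primes):
    """Trial division against a list of primes already held in memory."""
    limit = math.isqrt(num)
    for p in primes:
        if p > limit:
            break
        if num % p == 0:
            return False
    return num > 1

def check_streaming(num, path):
    """Trial division against primes read line by line from a file."""
    limit = math.isqrt(num)
    with open(path) as f:
        for line in f:
            p = int(line)
            if p > limit:
                break
            if num % p == 0:
                return False
    return num > 1

def benchmark(num, primes, repeats=200):
    """Return (seconds in memory, seconds streaming) for `repeats` checks."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write(''.join('%d\n' % p for p in primes))
        path = tmp.name
    try:
        t_mem = timeit.timeit(lambda: check_in_memory(num, primes), number=repeats)
        t_file = timeit.timeit(lambda: check_streaming(num, path), number=repeats)
    finally:
        os.remove(path)             # clean up the temporary primes file
    return t_mem, t_file
```

Calling `benchmark` with a candidate number and a list of primes covering its square root returns the two timings; running it against your real primes file and typical candidates would give a more representative picture than any small sample.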

