
What is using all the RAM in this Python script?

I have a very simple Python script, shown below. In my use case it just counts the number of distinct strings of length 2 in a text file of DNA.

#!/usr/bin/python
#Count the number of distinct kmers in a file
import sys
def kmer_count(dna, k):
    total_kmers = len(dna) - k + 1
    # assemble dict of kmer counts
    kmer2count = {}
    for x in range(len(dna)+1-k):
        kmer = dna[x:x+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return(len(kmer2count))


workfile = "test.fa"
f = open(workfile, 'r')
dna = f.readline()
print "Number of bytes to represent input", sys.getsizeof(dna)
print "Number of items in dict", kmer_count(dna, 2)

This prints

Number of bytes to represent input 10000037
Number of items in dict 71

And yet when I look at the memory usage using

/usr/bin/time --format="Size:%MK  Cpu:%P  Elapsed:%e" ./kmer.py

I get

Size:332776K  Cpu:100%  Elapsed:2.57

What is using all the RAM?

You used range in your for loop. In Python 2, range builds a full list of all those index numbers up front — here roughly ten million of them. On a 64-bit build that is roughly 80 MB for the list itself plus about 24 bytes for each of the ten million int objects, i.e. around 320 MB, which accounts for almost all of the 332776K that /usr/bin/time reports.

In Python 2, loop over xrange instead: xrange generates the numbers lazily, as the for loop asks for them, so no list is ever materialised.
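A minimal sketch of the corrected script, assuming Python 2 (which the print statements in the question imply); the only change to the original function is the loop construct:

#!/usr/bin/python
# Count the number of distinct kmers in a file,
# iterating over indices lazily instead of building a list of them.
import sys

def kmer_count(dna, k):
    kmer2count = {}
    # xrange yields one index at a time, so no 10-million-element list is built
    for x in xrange(len(dna) - k + 1):
        kmer = dna[x:x+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return len(kmer2count)

workfile = "test.fa"
f = open(workfile, 'r')
dna = f.readline()
f.close()
print "Number of bytes to represent input", sys.getsizeof(dna)
print "Number of items in dict", kmer_count(dna, 2)

With this change the peak memory should be dominated by the input string itself (about 10 MB here), since only one index and one two-character slice exist at any moment. In Python 3, range is already lazy, so the original loop would not have this problem (though the print statements would need parentheses).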
