What is using all the RAM in this Python script?
I have a very simple Python script, shown below. In my use it just counts the number of distinct strings of length 2 in a text file of DNA.
#!/usr/bin/python
# Count the number of distinct kmers in a file
import sys

def kmer_count(dna, k):
    total_kmers = len(dna) - k + 1
    # assemble dict of kmer counts
    kmer2count = {}
    for x in range(len(dna)+1-k):
        kmer = dna[x:x+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return(len(kmer2count))

workfile = "test.fa"
f = open(workfile, 'r')
dna = f.readline()
print "Number of bytes to represent input", sys.getsizeof(dna)
print "Number of items in dict", kmer_count(dna, 2)
This prints:
Number of bytes to represent input 10000037
Number of items in dict 71
And yet when I look at the memory usage using
/usr/bin/time --format="Size:%MK Cpu:%P Elapsed:%e" ./kmer.py
I get
Size:332776K Cpu:100% Elapsed:2.57
What is using all the RAM?
You used range in your for loop, which in Python 2 constructs a list containing all the numbers before the loop starts. Your input is about 10 million characters long, so that list holds about 10 million integers. This is bound to be very big.
In Python 2, loop over xrange instead: xrange lazily creates the numbers for the for loop as they are needed.
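To make the difference concrete, here is a minimal sketch written for Python 3, where range is already lazy in the same way as Python 2's xrange. Wrapping it in list(...) reproduces what Python 2's range built. The sample string and the value of n are illustrative assumptions, not taken from the question.

```python
import sys

def kmer_count(dna, k):
    """Return the number of distinct k-mers in dna."""
    kmer2count = {}
    # Python 3's range is lazy, like xrange in Python 2:
    # indices are produced one at a time, not stored in a list.
    for x in range(len(dna) + 1 - k):
        kmer = dna[x:x + k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return len(kmer2count)

n = 1000000  # illustrative size, smaller than the input in the question

lazy = range(n)         # small, fixed-size object
eager = list(range(n))  # one list slot per index, as Python 2's range did

# getsizeof counts only the list's own pointer array, not the
# integer objects it refers to, so the real cost of the list is higher.
print("lazy range:", sys.getsizeof(lazy), "bytes")
print("eager list:", sys.getsizeof(eager), "bytes")
print("distinct 2-mers in ACGTAC:", kmer_count("ACGTAC", 2))
```

The size of the lazy object does not grow with n, while the list grows linearly, and the integer objects it points to take additional memory on top of that.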