Python Dictionary eating up huge amount of RAM
I built a Python dictionary that stores words as keys, mapped to a list of the files they appear in. Here is a code snippet:
```python
import os
import sys
import time

if len(sys.argv) < 2:
    search_query = input("Enter the search query")
else:
    search_query = sys.argv[1]

# path to the directory where files are stored; store the file names in a list named directory_name
directory_name = os.listdir("./test_input")

# create a list list_of_files holding the entire path of each file, so that they can be opened later
list_of_files = []

# appending the files to list_of_files
for files in directory_name:
    list_of_files.append("./test_input" + "/" + files)

# empty dictionary
search_dictionary = {}

# iterate over the files in list_of_files one by one
for files in list_of_files:
    # open the file
    open_file = open(files, "r")
    # store the basename of the file as file_name
    file_name = os.path.basename(files)
    for line in open_file:
        for word in line.split():
            # if the word is not yet in the dictionary, add the word with its file_name
            if word not in search_dictionary:
                search_dictionary[word] = [file_name]
            else:
                # if this filename is already recorded for the word, ignore it
                if file_name in search_dictionary[word]:
                    continue
                # if the same word is found in a different file, append that filename
                search_dictionary[word].append(file_name)

def search(search_dictionary, search_query):
    if search_query in search_dictionary:
        print('found ' + search_query)
        print(search_dictionary[search_query])
    else:
        print('not found ' + search_query)

search(search_dictionary, search_query)

input_word = ""
while input_word != 'quit':
    input_word = input('enter a word to search ')
    start1 = time.time()
    search(search_dictionary, input_word)
    end1 = time.time()
    print(end1 - start1)
```
But if the files in the directory add up to around 500 MB, this eats up all the RAM and swap space. How can I manage the memory usage?
If you have a large number of files, the problem may be that you are not closing them. A more common pattern is to use the file as a context manager, like this:
```python
with open(files, 'r') as open_file:
    file_name = os.path.basename(files)
    for line in open_file:
        for word in line.split():
            if word not in search_dictionary:
                search_dictionary[word] = [file_name]
            else:
                if file_name in search_dictionary[word]:
                    continue
                search_dictionary[word].append(file_name)
```
Using this syntax means you never have to worry about closing the files. If you don't want to do that, you should still call open_file.close() after you finish iterating over the lines. This is the only issue I can see in your code that might cause such high memory usage (although opening huge files with no newlines could also do it).
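For anyone unfamiliar with context managers: the with block above is roughly equivalent to an explicit try/finally close. A minimal standalone sketch (using a temporary file here, since the original ./test_input paths aren't available):

```python
import os
import tempfile

# create a throwaway file to read (a stand-in for the files in ./test_input)
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("hello world\n")

# `with open(path) as f:` behaves roughly like this:
f = open(path, "r")
try:
    words = f.read().split()
finally:
    f.close()  # runs even if the body raises, so the handle is never leaked

print(words)     # ['hello', 'world']
print(f.closed)  # True
os.remove(path)
```

Without the close (or the with block), each iteration of the loop over list_of_files leaves another file handle open until the interpreter exits.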
This won't help with memory usage, but there is a data type that can greatly simplify your code: collections.defaultdict. Your code could be written like this (I've also included a couple of things the os module can do for you):
```python
import os
import time
from collections import defaultdict

directory_name = "./test_input"

list_of_files = []
for files in os.listdir(directory_name):
    list_of_files.append(os.path.join(directory_name, files))

search_dictionary = defaultdict(set)

start = time.time()
for files in list_of_files:
    with open(files) as open_file:
        file_name = os.path.basename(files)
        for line in open_file:
            for word in line.split():
                search_dictionary[word].add(file_name)
```
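The two properties being relied on here: a defaultdict(set) creates an empty set the first time a missing key is accessed, and sets silently ignore duplicate additions, which together replace the if/else bookkeeping from the original code. A minimal sketch with made-up filenames:

```python
from collections import defaultdict

index = defaultdict(set)

# missing keys are created automatically, so no `if word not in index` check is needed
index["hello"].add("a.txt")
index["hello"].add("a.txt")   # duplicate: the set ignores it
index["hello"].add("b.txt")
index["world"].add("b.txt")

print(sorted(index["hello"]))  # ['a.txt', 'b.txt']
print(index["world"])          # {'b.txt'}
```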