Python: filter a large text file by quantile
Suppose I'm processing a very large text file and have the following pseudocode:
```python
# calc_xxValue and get_quantile_value are assumed to be defined elsewhere
xx_valueList = []
lines = []
for line in file:  # first loop: compute every value and keep every line
    xx_value = calc_xxValue(line)
    xx_valueList.append(xx_value)
    lines.append(line)

# get_quantile_value returns the cutoff value at a given quantile percentage
cut_offvalue = get_quantile_value(xx_valueList, precent=0.05)
for line in lines:
    if calc_xxValue(line) > cut_offvalue:
        # do something here
        pass
```
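The pseudocode leaves `get_quantile_value` undefined. A minimal sketch, assuming the nearest-rank definition of a quantile (one of several common definitions), could look like this; the keyword is spelled `precent` only so that the calls in the pseudocode work unchanged.

```python
import math


def get_quantile_value(values, precent):
    """Hypothetical helper: nearest-rank quantile of `values`.

    Returns the smallest data value v such that at least a fraction
    `precent` of the data is <= v. `precent` is a fraction in (0, 1],
    e.g. 0.05 for 5%. The keyword keeps the spelling used in the calls.
    """
    if not values:
        raise ValueError("need at least one value")
    if not 0 < precent <= 1:
        raise ValueError("precent must be in (0, 1]")
    ordered = sorted(values)  # full sorted copy: O(n) extra memory
    rank = math.ceil(precent * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```

For example, `get_quantile_value(list(range(1, 101)), precent=0.05)` returns `5`. Note that the sort makes a full copy of the values, which adds to the memory problem described below.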
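Note that the file is very large and may come from a pipe, so I don't want to read it twice.
We have to read the whole file before we can get the cutoff value used to filter it.
The approach above works, but it uses far too much memory. Is there an algorithmic optimization that would make this more efficient and reduce memory consumption?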
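Because an exact 5% cutoff depends on every value, an exact filter cannot decide on the first line until the whole input has been seen. One way to read a pipe only once and still keep RAM low is to spill the raw lines to a temporary file on disk and keep only the computed values in a compact `array('d')` (8 bytes per value). The sketch below is one possible approach, not a standard recipe: it assumes `calc_value` returns a float, uses the nearest-rank quantile, and finds the cutoff with `heapq.nsmallest`, whose working memory is proportional to the rank (here 5% of the values) rather than a full sorted copy. The name `filter_by_quantile` is hypothetical.

```python
import heapq
import math
import tempfile
from array import array


def filter_by_quantile(lines, calc_value, percent=0.05):
    """Hypothetical helper: yield lines whose value is strictly above the
    nearest-rank `percent` quantile, reading `lines` only once.

    Assumes every line except possibly the last ends with "\n",
    as lines read from a text file do.
    """
    values = array("d")  # compact storage: 8 bytes per float value
    # newline="\n": lines are written untranslated and split only on "\n"
    with tempfile.TemporaryFile("w+", encoding="utf-8",
                                errors="surrogateescape",
                                newline="\n") as spill:
        for line in lines:
            values.append(calc_value(line))
            spill.write(line)  # the raw line goes to disk, not to a list
        if not values:
            return
        rank = max(1, math.ceil(percent * len(values)))
        # k-th smallest value via a bounded heap instead of a full sort
        cutoff = heapq.nsmallest(rank, values)[-1]
        spill.seek(0)  # second pass over the spilled copy, not the pipe
        for line, value in zip(spill, values):
            if value > cutoff:
                yield line
```

Usage, assuming one number per line on standard input: `for kept in filter_by_quantile(sys.stdin, float): sys.stdout.write(kept)`. The values array still grows linearly with the input; going below that requires giving up exactness.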
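A single-pass variant that keeps only the computed values (not the lines) and re-estimates the cutoff every 100 lines:
```python
xx_value_list = []
cut_offvalue = 0  # placeholder until the first estimate after 100 lines
with open(file, 'r') as f:
    for line in f:
        xx_value = calc_xxValue(line)
        xx_value_list.append(xx_value)  # only values are stored, not lines
        # re-estimate the cutoff from the values seen so far
        if len(xx_value_list) % 100 == 0:
            cut_offvalue = get_quantile_value(xx_value_list, precent=0.05)
        if xx_value < cut_offvalue:
            # do something here
            pass
```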
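That loop still appends every value to `xx_value_list`, so its memory grows with the input, and if `get_quantile_value` sorts the list, each re-estimate gets more expensive as the list grows. If an approximate cutoff is acceptable, one option (a different technique, swapped in here) is to estimate the quantile from a fixed-size uniform random sample maintained by reservoir sampling (Algorithm R). This sketch assumes float values, keeps lines above the estimate as in the first pseudocode, and holds back the lines read before the first estimate; `stream_filter_approx`, `sample_size`, and `refresh_every` are hypothetical names.

```python
import heapq
import math
import random


def stream_filter_approx(lines, calc_value, percent=0.05,
                         sample_size=10_000, refresh_every=100, seed=None):
    """Hypothetical helper: single pass, bounded memory. Yields lines whose
    value exceeds an *estimated* `percent` quantile (approximate)."""
    rng = random.Random(seed)
    sample = []    # reservoir: uniform random sample of values seen so far
    cutoff = None  # no estimate until `refresh_every` lines have been read
    held = []      # (value, line) pairs read before the first estimate

    def estimate():
        # nearest-rank quantile of the current sample
        rank = max(1, math.ceil(percent * len(sample)))
        return heapq.nsmallest(rank, sample)[-1]

    for seen, line in enumerate(lines, start=1):
        value = calc_value(line)
        # Algorithm R: once full, item i replaces a random slot with prob k/i
        if len(sample) < sample_size:
            sample.append(value)
        else:
            j = rng.randrange(seen)  # uniform in [0, seen)
            if j < sample_size:
                sample[j] = value
        if seen % refresh_every == 0:
            cutoff = estimate()
        if cutoff is None:
            held.append((value, line))
            continue
        # release the held-back lines once a first estimate exists
        for held_value, held_line in held:
            if held_value > cutoff:
                yield held_line
        held.clear()
        if value > cutoff:
            yield line

    # stream shorter than `refresh_every`: estimate from everything seen
    if held:
        cutoff = estimate()
        for held_value, held_line in held:
            if held_value > cutoff:
                yield held_line
```

Memory is bounded by `sample_size` values plus at most `refresh_every` held-back lines. The trade-off is accuracy: the kept fraction is only approximately 95%, and early lines are judged against an estimate built from a smaller sample.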