I have some code that parses an Apache log file (`start_search` and `end_search` are date strings in the format found in the Apache log):

    import csv
    import re
    from itertools import dropwhile, takewhile

    with open("/var/log/apache2/access.log", 'r') as log:
        # Skip lines until start_search appears, then stop at end_search.
        s_log = dropwhile(lambda L: start_search not in L, log)
        e_log = takewhile(lambda L: end_search not in L, s_log)
        query = [line for line in e_log if re.search(r'GET /(.+veggies|.+fruits)', line)]

    query_dict = csv.DictReader(query, fieldnames=('ip', 'na-1', 'na-2', 'time', 'zone', 'url', 'refer', 'client'), quotechar='"', delimiter=" ")

    veggies = [x for x in query_dict if re.search('veggies', x['url'])]
    fruits = [x for x in query_dict if re.search('fruits', x['url'])]
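
The list produced by the second list comprehension is always empty; that is, if I swap the order of the last two lines:

    fruits = [x for x in query_dict if re.search('fruits', x['url'])]
    veggies = [x for x in query_dict if re.search('veggies', x['url'])]

the second list is still the empty one.

Why? (And how can I populate both the `fruits` and `veggies` lists?)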
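
You can only loop over an iterator once; `query_dict` is an iterator, and once it has been scanned for `veggies` it is exhausted and cannot be iterated again to look for `fruits`.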
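
The exhaustion can be seen in isolation with a tiny reader. This is only an illustrative sketch: the sample rows and the field names `name` and `value` are made up and are not part of the original code.

```python
import csv

# Hypothetical sample rows standing in for the filtered log lines.
lines = ["a 1", "b 2", "c 3"]
reader = csv.DictReader(lines, fieldnames=("name", "value"), delimiter=" ")

first_pass = [row["name"] for row in reader]
second_pass = [row["name"] for row in reader]  # reader is already exhausted

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []
```

The second comprehension does not raise an error; it simply finds nothing left to read, which is why the problem is easy to miss.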

Don't use list comprehensions here. Loop over `query_dict` once, and check each entry for both `veggies` and `fruits`:

    veggies = []
    fruits = []
    for x in query_dict:
        if re.search('veggies', x['url']):
            veggies.append(x)
        if re.search('fruits', x['url']):
            fruits.append(x)
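
To try the single-pass loop without a real log file, something like the following should work. The three log lines are invented sample data in Apache's combined format; only the field names and the loop come from the code above.

```python
import csv
import re

# Made-up log lines standing in for `query`; not taken from a real log.
query = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /shop/veggies HTTP/1.1" 200 512 "-" "curl/8.0"',
    '127.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /shop/fruits HTTP/1.1" 200 734 "-" "curl/8.0"',
    '127.0.0.1 - - [10/Oct/2023:13:55:44 +0000] "GET /shop/veggies HTTP/1.1" 200 512 "-" "curl/8.0"',
]
fieldnames = ('ip', 'na-1', 'na-2', 'time', 'zone', 'url', 'refer', 'client')
query_dict = csv.DictReader(query, fieldnames=fieldnames, quotechar='"', delimiter=" ")

veggies, fruits = [], []
for x in query_dict:  # the only pass over the iterator
    if re.search('veggies', x['url']):
        veggies.append(x)
    if re.search('fruits', x['url']):
        fruits.append(x)

print(len(veggies), len(fruits))  # 2 1
```

Because both tests run inside the same loop, each row is parsed once and nothing has to be buffered.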

The alternatives are:

Re-create the `csv.DictReader()` object for the `fruits` list:

    query_dict = csv.DictReader(query, fieldnames=('ip', 'na-1', 'na-2', 'time', 'zone', 'url', 'refer', 'client'), quotechar='"', delimiter=" ")
    veggies = [x for x in query_dict if re.search('veggies', x['url'])]
    query_dict = csv.DictReader(query, fieldnames=('ip', 'na-1', 'na-2', 'time', 'zone', 'url', 'refer', 'client'), quotechar='"', delimiter=" ")
    fruits = [x for x in query_dict if re.search('fruits', x['url'])]

This does double the work, though: you parse and iterate over the whole dataset twice.

Use `itertools.tee()` to "clone" the iterator:

    from itertools import tee

    veggies_query_dict, fruits_query_dict = tee(query_dict)
    veggies = [x for x in veggies_query_dict if re.search('veggies', x['url'])]
    fruits = [x for x in fruits_query_dict if re.search('fruits', x['url'])]

This ends up caching all of `query_dict` in the `tee` buffer, requiring twice the memory for the same task, until the `fruits` comprehension drains the buffer again.
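
The `tee()` behaviour can be checked on a small scale. In this sketch `rows` is a hypothetical stand-in for `query_dict` (a one-shot iterator of dicts), and plain substring tests replace `re.search` for brevity.

```python
from itertools import tee

# Hypothetical stand-in for query_dict: a one-shot iterator of dicts.
rows = iter([
    {"url": "GET /shop/veggies HTTP/1.1"},
    {"url": "GET /shop/fruits HTTP/1.1"},
    {"url": "GET /shop/veggies HTTP/1.1"},
])

veggies_rows, fruits_rows = tee(rows)

# This pass consumes every row; tee buffers them all for fruits_rows.
veggies = [x for x in veggies_rows if "veggies" in x["url"]]
# This pass reads from the buffer, releasing each row as it goes.
fruits = [x for x in fruits_rows if "fruits" in x["url"]]

print(len(veggies), len(fruits))  # 2 1
```

Since the first clone runs to completion before the second one starts, the whole dataset sits in the buffer at once, which is the memory cost described above.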