
Why can't I split this Python list?

I have some code that parses an Apache log file (start_search and end_search are date strings in the format found in the Apache log):

import csv
import re
from itertools import takewhile, dropwhile

with open("/var/log/apache2/access.log", 'r') as log:
    # Skip lines until start_search appears, then keep lines until end_search appears
    s_log = dropwhile(lambda L: start_search not in L, log)
    e_log = takewhile(lambda L: end_search not in L, s_log)
    query = [line for line in e_log if re.search(r'GET /(.+veggies|.+fruits)', line)]

    query_dict = csv.DictReader(query, fieldnames=('ip','na-1','na-2','time', 'zone', 'url', 'refer', 'client'), quotechar='"', delimiter=" ")

    veggies = [ x for x in query_dict if re.search('veggies',x['url']) ]
    fruits = [ x for x in query_dict if re.search('fruits',x['url']) ]

The second list comprehension always comes out empty; that is, if I swap the order of the last two lines:

    fruits = [ x for x in query_dict if re.search('fruits',x['url']) ]
    veggies = [ x for x in query_dict if re.search('veggies',x['url']) ]

the second list (now veggies) is still the empty one.

Why? (And how can I populate both the fruits and veggies lists?)

You can only loop over an iterator once. query_dict is an iterator, and once it has been scanned for veggies it cannot be iterated again to look for fruits.
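
To see the exhaustion in isolation, here is a minimal sketch. The two sample lines and the shortened fieldnames tuple are made up for illustration and are not taken from a real log; the only point is that a second pass over the same csv.DictReader yields nothing.

```python
import csv

# Made-up sample lines standing in for the filtered log entries.
lines = [
    '1.2.3.4 - - "GET /shop/veggies HTTP/1.1"',
    '5.6.7.8 - - "GET /shop/fruits HTTP/1.1"',
]

# Shortened, hypothetical field list (the question uses eight names).
reader = csv.DictReader(lines, fieldnames=("ip", "na-1", "na-2", "url"),
                        quotechar='"', delimiter=" ")

first_pass = [row["url"] for row in reader]   # consumes every row
second_pass = [row["url"] for row in reader]  # the reader is already exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```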

Don't use list comprehensions here. Loop over query_dict once, checking each entry for both veggies and fruits:

veggies = []
fruits = []

for x in query_dict:
    if re.search('veggies',x['url']):
        veggies.append(x)
    if re.search('fruits',x['url']):
        fruits.append(x)
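
For anyone who wants to try the single-pass loop without an Apache log at hand, the following is a self-contained sketch. The helper name split_by_category and the three sample lines are hypothetical; the field names are the ones from the question. Because the two checks are independent `if` statements rather than `if`/`elif`, a URL that mentions both words ends up in both lists.

```python
import csv
import re

FIELDNAMES = ("ip", "na-1", "na-2", "time", "zone", "url", "refer", "client")

def split_by_category(lines):
    """Return (veggies, fruits) after one pass over the parsed rows."""
    reader = csv.DictReader(lines, fieldnames=FIELDNAMES,
                            quotechar='"', delimiter=" ")
    veggies, fruits = [], []
    for row in reader:
        if re.search("veggies", row["url"]):
            veggies.append(row)
        if re.search("fruits", row["url"]):
            fruits.append(row)
    return veggies, fruits

# Made-up, shortened log lines with exactly eight space-separated fields.
sample = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /shop/veggies HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /shop/fruits HTTP/1.1" 200 734',
    '10.0.0.3 - - [01/Jan/2024:10:00:09 +0000] "GET /shop/fruits-and-veggies HTTP/1.1" 200 98',
]

veggies, fruits = split_by_category(sample)
print(len(veggies), len(fruits))  # 2 2
```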

The alternatives are:

  • Re-create the csv.DictReader() object for the fruits list:

     query_dict = csv.DictReader(query,fieldnames=('ip','na-1','na-2','time', 'zone', 'url', 'refer', 'client'),quotechar='"',delimiter=" ")
     veggies = [ x for x in query_dict if re.search('veggies',x['url']) ]

     query_dict = csv.DictReader(query,fieldnames=('ip','na-1','na-2','time', 'zone', 'url', 'refer', 'client'),quotechar='"',delimiter=" ")
     fruits = [ x for x in query_dict if re.search('fruits',x['url']) ]

    This does double the work: you loop over the entire dataset twice.

  • Use itertools.tee() to "clone" the iterator:

     from itertools import tee

     veggies_query_dict, fruits_query_dict = tee(query_dict)
     veggies = [ x for x in veggies_query_dict if re.search('veggies',x['url']) ]
     fruits = [ x for x in fruits_query_dict if re.search('fruits',x['url']) ]

    This ends up caching all of query_dict in the tee buffer, so the same task needs twice the memory until the fruits comprehension drains the buffer again.
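
The tee() variant can be tried on its own with the sketch below. The sample lines, the shortened field list, and the make_reader helper are hypothetical and only stand in for the question's query list and reader. The second half shows a related option that is not one of the alternatives listed above: storing the rows in a plain list, which can be iterated as often as needed.

```python
import csv
import re
from itertools import tee

FIELDNAMES = ("ip", "na-1", "na-2", "url")  # shortened, hypothetical field list

# Made-up sample lines standing in for the filtered `query` list.
query = [
    '10.0.0.1 - - "GET /shop/veggies/carrot HTTP/1.1"',
    '10.0.0.2 - - "GET /shop/fruits/apple HTTP/1.1"',
    '10.0.0.3 - - "GET /shop/veggies/leek HTTP/1.1"',
]

def make_reader():
    """Build a fresh DictReader over the sample lines."""
    return csv.DictReader(query, fieldnames=FIELDNAMES,
                          quotechar='"', delimiter=" ")

# tee() hands out two independent iterators backed by one reader.
veggies_rows, fruits_rows = tee(make_reader())
veggies = [x for x in veggies_rows if re.search("veggies", x["url"])]
fruits = [x for x in fruits_rows if re.search("fruits", x["url"])]

# A list of rows can be iterated any number of times.
rows = list(make_reader())
veggies_from_list = [x for x in rows if re.search("veggies", x["url"])]
fruits_from_list = [x for x in rows if re.search("fruits", x["url"])]

print(len(veggies), len(fruits))  # 2 1
```

The itertools documentation notes that when one iterator consumes most or all of the data before the other starts, list() is faster than tee(), which is the situation here since the veggies comprehension runs to completion first.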


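Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.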

 