
How can I append null to a list when a particular parameter is absent from some lines while iterating through a log file using urlparse?

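I want to parse the URLs in my log file, but some of them are missing parameters, and I get a missing-parameter error when I iterate over the log lines. I need to add a blank or null value to the parsed list in those cases so that I can convert it to a DataFrame.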

My data file (the log file):

"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=290"
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=23988"
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183"
"GET /pixel.gif?e=ad_load&creative_id=33548"

The output I want is:

   E         | Creative ID | In View Time
   heartbeat | 33548       | 290
   heartbeat | 33548       | 23988
   ad_load   | 33548       | null

My code:

parselist = []
for eachline in log.readlines():
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})',eachline)
    url = eachline
    parsed = urlparse.urlparse(url)
    parselist.append(ip_regex)
    parselist.append(date)
    parselist.append(urlparse.parse_qs(parsed.query)['e'])
    parselist.append(urlparse.parse_qs(parsed.query)['account_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['impression_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])

I get an error because the in_view_time parameter is missing from some lines (in the sample above, the ad_load line):

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-405c1bfb329e> in <module>()
     12     parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
     13     parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
---> 14     parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])

KeyError: 'in_view_time'
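
For context, here is a minimal sketch of what parse_qs returns, which is why the lookup fails. It uses Python 3's urllib.parse (urlparse in the question is the Python 2 module name), and it strips the surrounding quotes and the "GET " prefix by hand before parsing. That stripping step is an assumption added for illustration and is not part of the original code.

```python
from urllib.parse import urlparse, parse_qs

line = '"GET /pixel.gif?e=ad_load&creative_id=33548"\n'
# Assumption: drop the newline and quotes, then the "GET " method prefix.
url = line.strip().strip('"').split(' ', 1)[1]

params = parse_qs(urlparse(url).query)
print(params)                    # each value is a list of strings
print('in_view_time' in params)  # absent parameters are simply not in the dict
```

Because the result is an ordinary dict, indexing it with a key that is not in the query string raises KeyError, just like any other dict lookup.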

You can use try/except:

import re
import urlparse  # Python 2 module; in Python 3 use urllib.parse

parselist = []
for eachline in log.readlines():  # log: the already-open log file
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    parsed = urlparse.urlparse(eachline)
    query = urlparse.parse_qs(parsed.query)  # parse the query string once
    parselist.append(ip_regex)
    parselist.append(date)
    # Catch only KeyError so that unrelated errors are not silently hidden
    try:
        parselist.append(query['e'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(query['account_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(query['impression_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(query['campaign_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(query['creative_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(query['in_view_time'])
    except KeyError:
        parselist.append('Null')

Or, more compactly:

import re
import urlparse  # Python 2 module; in Python 3 use urllib.parse

parselist = []
for eachline in log.readlines():  # log: the already-open log file
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    parsed = urlparse.urlparse(eachline)
    query = urlparse.parse_qs(parsed.query)  # parse the query string once
    parselist.append(ip_regex)
    parselist.append(date)

    for key in ['e', 'account_id', 'impression_id', 'campaign_id', 'creative_id', 'in_view_time']:
        try:
            parselist.append(query[key])
        except KeyError:  # parameter absent from this line
            parselist.append('Null')

As a suggestion, you could append None instead of 'Null'.
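
A rough sketch of that suggestion, assuming Python 3's urllib.parse and already-cleaned URLs (no quotes or "GET " prefix). The helper name row_from_url and the KEYS list are hypothetical, chosen only for this example. Instead of try/except it uses dict.get with a fallback, and it builds one row per line rather than one flat list.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical column list for illustration; extend it with the other parameters.
KEYS = ['e', 'creative_id', 'in_view_time']

def row_from_url(url, keys=KEYS):
    """Return one value per key, using None where the parameter is absent."""
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list; [None] is the fallback for missing keys.
    return [params.get(key, [None])[0] for key in keys]

rows = [row_from_url(u) for u in (
    '/pixel.gif?e=heartbeat&creative_id=33548&in_view_time=290',
    '/pixel.gif?e=ad_load&creative_id=33548',
)]
print(rows)
```

Rows of equal length like these should convert directly with something like pandas.DataFrame(rows, columns=KEYS), where None marks the missing values.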

  1. Why are you building a list at all (losing the keys and storing only the values)?
  2. If you are only interested in the values, you can simply write:
for v in urlparse.parse_qs(parsed.query).values():
    parselist.append(v)
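
Note that values() only yields the parameters that are present, so rows built that way can differ in length. One possible way to keep the keys instead is sketched below, assuming Python 3's urllib.parse and already-cleaned URLs; the names records, columns and table are illustrative only.

```python
from urllib.parse import urlparse, parse_qs

urls = [
    '/pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183',
    '/pixel.gif?e=ad_load&creative_id=33548',
]

records = []
for url in urls:
    params = parse_qs(urlparse(url).query)
    # Keep the parameter names and take the first value of each list.
    records.append({key: values[0] for key, values in params.items()})

# Collect every column seen across all lines, then fill the gaps with None.
columns = sorted({key for record in records for key in record})
table = [[record.get(col) for col in columns] for record in records]
print(columns)
print(table)
```

A list of dicts such as records can also be handed to pandas.DataFrame as-is, which leaves missing keys as NaN.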

