How can I append null to a list when a particular parameter is absent from some lines while iterating through a log file using urlparse?
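I want to parse my file by URL, but some URLs are missing parameters, and I hit a missing-parameter error while iterating over the log lines. I need to add a blank or null value to the parsed list so that it can be converted into a DataFrame.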
My data file (a log file):
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=290"
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=23988"
"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183"
"GET /pixel.gif?e=ad_load&creative_id=33548"
Desired output:
E         | Creative ID | IN VIEW TIME
heartbeat | 33548       | 290
heartbeat | 33548       | 23988
ad_load   | 33548       | null
My code:
parselist = []
for eachline in log.readlines():
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})',eachline)
    url = eachline
    parsed = urlparse.urlparse(url)
    parselist.append(ip_regex)
    parselist.append(date)
    parselist.append(urlparse.parse_qs(parsed.query)['e'])
    parselist.append(urlparse.parse_qs(parsed.query)['account_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['impression_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
    parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])
I get an error because the ad_load line is missing the in_view_time parameter:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-6-405c1bfb329e> in <module>()
12 parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
13 parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
---> 14 parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])
KeyError: 'in_view_time'
You can use try and except:
import re
import urlparse  # Python 2 module; in Python 3 use urllib.parse instead

parselist = []
for eachline in log.readlines():  # log is the already-opened log file
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    url = eachline
    parsed = urlparse.urlparse(url)
    parselist.append(ip_regex)
    parselist.append(date)
    # Each lookup raises KeyError when the parameter is absent from the line.
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['e'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['account_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['impression_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['campaign_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['creative_id'])
    except KeyError:
        parselist.append('Null')
    try:
        parselist.append(urlparse.parse_qs(parsed.query)['in_view_time'])
    except KeyError:
        parselist.append('Null')
Or, more compactly:
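import re
import urlparse  # Python 2 module; in Python 3 use urllib.parse instead

parselist = []
for eachline in log.readlines():  # log is the already-opened log file
    ip_regex = re.findall(r'(\d{18})', eachline)
    date = re.findall(r'([0-9]{4}\-[0-9]{2}\-[0-9]{2})', eachline)
    url = eachline
    parsed = urlparse.urlparse(url)
    parselist.append(ip_regex)
    parselist.append(date)
    # Parse the query string once per line, then look up each expected key.
    params = urlparse.parse_qs(parsed.query)
    for key in ['e', 'account_id', 'impression_id', 'campaign_id', 'creative_id', 'in_view_time']:
        try:
            parselist.append(params[key])
        except KeyError:  # parameter absent on this line
            parselist.append('Null')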
As a suggestion, you can append None instead of 'Null'.
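The same idea can be written without try/except by using dict.get with a default. The following is only a sketch under some assumptions: it uses Python 3's urllib.parse (the successor of the Python 2 urlparse module used above), the helper name parse_line and the KEYS list are made up for illustration, and it strips the surrounding quotes and the leading GET before parsing so that they do not end up inside the parameter values. It also builds one row per log line rather than one flat list, which is usually easier to turn into a table.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical column list; extend it with account_id, campaign_id, etc. as needed.
KEYS = ["e", "creative_id", "in_view_time"]

def parse_line(line, keys=KEYS):
    """Return one row per log line, with None for absent parameters."""
    # Remove the newline and the surrounding double quotes.
    request = line.strip().strip('"')
    # Drop the HTTP method ("GET") and keep the path plus query string.
    url = request.split(None, 1)[-1]
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; take the first one,
    # or None when the key is missing from this line.
    return [params.get(key, [None])[0] for key in keys]

log_lines = [
    '"GET /pixel.gif?e=heartbeat&creative_id=33548&in_view_time=19183"\n',
    '"GET /pixel.gif?e=ad_load&creative_id=33548"\n',
]
rows = [parse_line(line) for line in log_lines]
for row in rows:
    print(row)
```

If pandas is available, a list of rows like this should be usable as pandas.DataFrame(rows, columns=KEYS), with the None entries showing up as missing values.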
for v in urlparse.parse_qs(parsed.query).values():
    parselist.append(v)
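Note that looping over .values() appends only the parameters that are actually present on a line. It avoids the KeyError, but it does not by itself insert a placeholder for a missing parameter, so lines can contribute different numbers of entries. A small sketch, again assuming Python 3's urllib.parse and using query strings taken from the sample log:

```python
from urllib.parse import parse_qs

# Query strings from two of the sample log lines.
full = parse_qs("e=heartbeat&creative_id=33548&in_view_time=19183")
short = parse_qs("e=ad_load&creative_id=33548")

full_values = list(full.values())
short_values = list(short.values())

# The second line has no in_view_time, so it yields one value fewer.
print(len(full_values), len(short_values))
print("in_view_time" in short)
```

If every row has to have the same columns, looking up a fixed list of keys with a default value is probably the safer choice.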