
Convert a log.txt file to a JSON file

I have to convert a log file into a JSON file in order to train an unsupervised model. The log file has the following format:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I want to get the file in the following format, with one entry per block:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

and then create a JSON file from it.

Use re.split, splitting on the leading pair of IP addresses that starts each entry.

For example:

import re

s = """40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"""
val = re.split(r"(\d+\.\d+\.\d+\.\d+, \d+\.\d+\.\d+\.\d+)", s)[1:]
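# val alternates between an IP pair and the remainder of that entry
for v, w in zip(val[::2], val[1::2]):
    print((v, w))  # print as a tuple so Python 3 output matches the output shown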

Output:

('40.77.167.191, 172.16.30.15', ' - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')
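The split above only prints the pieces; it does not yet produce JSON. A minimal sketch of that last step follows, reusing the same re.split pattern. The function name log_to_records, the FIELDS pattern, the key names (ips, raw, time, request, status, size) and the shortened sample entries are all assumptions made for illustration and are not part of the original answer.

```python
import json
import re

# Same delimiter as in the answer: two comma-separated IPv4 addresses.
IP_PAIR = re.compile(r"(\d+\.\d+\.\d+\.\d+, \d+\.\d+\.\d+\.\d+)")

# Hypothetical field pattern: timestamp, request line, status code, size.
FIELDS = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+)'
)


def log_to_records(text):
    """Split raw log text into one dict per entry."""
    parts = IP_PAIR.split(text)[1:]  # drop whatever precedes the first match
    records = []
    for ips, rest in zip(parts[::2], parts[1::2]):
        record = {"ips": ips, "raw": rest.strip()}
        match = FIELDS.search(rest)
        if match:  # keep the raw text even when the fields do not parse
            record.update(match.groupdict())
        records.append(record)
    return records


# Shortened stand-ins for the entries in the question.
sample = (
    '40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] '
    '"GET /a.html HTTP/1.1" 403 162 <0.000> <-> "-" "bingbot" '
    '66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] '
    '"GET /b.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Googlebot"'
)

records = log_to_records(sample)
json_text = json.dumps(records, indent=2)
print(json_text)
```

To write a file instead of printing, json.dump(records, f, indent=2) on a file opened for writing should do the same job. All values are kept as strings here; converting status and size to integers is left to whatever the model needs.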
