
Convert a log.txt file to a JSON file

I have to convert a log file into a JSON file to train an unsupervised model. The log file is in this format:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I want to get the file in this format, with the entries separated from each other:

40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Then I want to create a JSON file from it.

You can use re.split, splitting on the pair of IP addresses that starts each entry. For example:

import re

s = """40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"""
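# The capturing group keeps each IP pair in the result; [1:] drops the empty string before the first match.
val = re.split(r"(\d+\.\d+\.\d+\.\d+, \d+\.\d+\.\d+\.\d+)", s)[1:]
# val alternates between an IP pair and the rest of its entry, so pair them up.
for v, w in zip(val[::2], val[1::2]):
    print((v, w))  # print the pair as a tuple (Python 3)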

Output:

('40.77.167.191, 172.16.30.15', ' - - [08/May/2018:03:29:15 +0530] "GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" 403 162 <0.000> <-> "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:17 +0530] "GET /schneider-dc-control-relays-ca4kn31-t008000721.html HTTP/1.1" 200 14443 <0.445> <0.445> "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ')
('66.249.79.25, 172.16.30.15', ' - - [08/May/2018:03:29:19 +0530] "GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> "https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')
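The split above separates the entries but does not yet produce JSON. A minimal sketch of that last step follows, assuming every entry has the same layout as the samples in the question. The field names (ip_1, ip_2, time_1, time_2 and so on) are hypothetical labels, since the question does not say what each column means, and the pattern is fitted only to the sample lines.

```python
import json
import re

# Assumption: every entry follows the layout of the sample lines.
# All group names are hypothetical labels, not taken from the question.
LOG_PATTERN = re.compile(
    r'(?P<ip_1>\d+\.\d+\.\d+\.\d+), (?P<ip_2>\d+\.\d+\.\d+\.\d+) '
    r'- - \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'<(?P<time_1>[^>]*)> <(?P<time_2>[^>]*)> '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log(text):
    """Return one dict per log entry found in text."""
    records = []
    for match in LOG_PATTERN.finditer(text):
        record = match.groupdict()
        # Status and size are always digits in the samples, so store them as ints.
        record["status"] = int(record["status"])
        record["size"] = int(record["size"])
        records.append(record)
    return records


# Two of the sample entries from the question.
sample = (
    '40.77.167.191, 172.16.30.15 - - [08/May/2018:03:29:15 +0530] '
    '"GET /speedwav-full-chrome-side-beading-for-tata-indigo-cs-46901.html HTTP/1.1" '
    '403 162 <0.000> <-> "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"\n'
    '66.249.79.25, 172.16.30.15 - - [08/May/2018:03:29:19 +0530] '
    '"GET /ajax/pdp/recentlyviewed/1184932 HTTP/1.1" 200 2 <0.089> <0.089> '
    '"https://www.tolexo.com/orient-18w-eternal-surface-panel-square-led-light-18w01-t14ori0043.html" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"\n'
)

records = parse_log(sample)
json_text = json.dumps(records, indent=2)
print(json_text)
```

To work on a real file, read its contents into a string, pass that to parse_log, and write the result with json.dump to a file opened for writing. Because finditer scans the whole text, this works whether the entries are on separate lines or run together as in the string s above. Entries that do not match the pattern are silently skipped, so it is worth comparing the number of records against the number of lines in the log.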
