I want to read a big log file (6 GB) in chunks: read 100 MB, then sleep for a few seconds. I also want to avoid loading the whole file into memory; I want to read it the way head -n x does in bash. The file consists of blocks. Each block contains many lines, and consecutive blocks are separated by 3 blank lines, for example:
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET /mobile/ HTTP/1.1
host: www.my-host.com:8082
accept: */*
accept-language: en-gb
connection: keep-alive
accept-encoding: gzip, deflate
user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 8_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12D508
x-sub-imsi: 418876678
x-sub-msisdn: 333123654
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET / HTTP/1.1
content-type: application/x-www-form-urlencoded
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H)
host: www.my-host.net
connection: Keep-Alive
accept-encoding: gzip
x-sub-imsi: 418252632
x-sub-msisdn: 333367627836
HTTP/1.1 302 Found
Location: http://www.my-host.net/welcome/main.html
Set-Cookie: oam.Flash.RENDERMAP.TOKEN=-jdrkoipfe; Path=/
[18/05/2015:00:00:00 +0300]%PARSER_ERROR[elapsedTime]
GET / HTTP/1.1
content-type: application/x-www-form-urlencoded
user-agent: Dalvik/1.6.0 (Linux; U; Android 4.4.2; AirPhoneS6 Build/KOT49H)
host: www.my-host.net
connection: Keep-Alive
accept-encoding: gzip
x-sub-imsi: 41887237832
x-sub-msisdn: 333878778
I want to export the user-agent platform, the platform version, and the msisdn to CSV. I am going to generate 2 files, ios.csv and android.csv, and each file will contain unique msisdn values. Each row will look like: user-agent, version, msisdn. Example: Android, 4.2.2, 333878778
So I have to go through the file block by block, check the user-agent line of each block, and then its msisdn. I tried to do it in bash, but since bash is not flexible enough for this, I decided to do it in Python.
You can use the fileinput module from the standard library. It provides an iterator over lines, so it does not load the whole file into memory unless you make it do that (for example, by collecting every line in a list).
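import fileinput
import time

# fileinput.input() returns a lazy iterator, so only one line is held at a time
for line in fileinput.input('my_log_file.txt'):
    # do your computation on `line` here
    # sleeping after every line would be far too slow for a 6 GB file,
    # so pause only every 1,000,000 lines (pick a number that suits you)
    if fileinput.lineno() % 1000000 == 0:
        time.sleep(5)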
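Since the question asks for block-by-block processing, the line iterator can be wrapped in a small generator that groups lines into blocks. This is only a sketch: the name iter_blocks is made up for illustration, and it assumes that any run of blank lines (not strictly three) ends a block.

```python
from typing import Iterable, Iterator, List


def iter_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one block at a time as a list of lines without line endings.

    Assumption: blocks are separated by one or more blank lines.
    Only the current block is kept in memory.
    """
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip():
            # Non-blank line: it belongs to the current block.
            block.append(line)
        elif block:
            # Blank line after some content: the block is complete.
            yield block
            block = []
    if block:
        # The last block may not be followed by blank lines.
        yield block
```

It accepts any iterable of lines, so it works with fileinput.input('my_log_file.txt') as well as with a file object returned by open().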
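import time

def readFile(inputFile):
    buff = 100 * 1024 * 1024  # 100 megabytes per read
    with open(inputFile, 'rb') as file_object:
        while True:
            block = file_object.read(buff)
            if not block:  # empty bytes object means end of file
                break
            # doSomeThing is your own processing function;
            # note that a chunk can end in the middle of a line
            doSomeThing(block)
            time.sleep(3)  # pause before reading the next chunk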
# time python readfile.py
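For the CSV part of the question, here is a minimal sketch of what the per-block processing might look like. The names parse_block and write_unique are hypothetical, and the two regular expressions are assumptions based only on the two user-agent strings in the sample (for example, an iPad user agent would not match). Each block is expected as a list of lines.

```python
import csv
import re
from typing import Iterable, Optional, TextIO, Tuple

# Assumed patterns, derived from the sample user-agent lines only.
IOS_RE = re.compile(r"iPhone OS (\d+(?:_\d+)*)")
ANDROID_RE = re.compile(r"Android (\d+(?:\.\d+)*)")

Row = Tuple[str, str, str]  # (platform, version, msisdn)


def parse_block(block: Iterable[str]) -> Optional[Row]:
    """Return (platform, version, msisdn) for one block, or None."""
    user_agent = msisdn = None
    for line in block:
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key == "user-agent":
            user_agent = value.strip()
        elif key == "x-sub-msisdn":
            msisdn = value.strip()
    if not user_agent or not msisdn:
        return None
    match = ANDROID_RE.search(user_agent)
    if match:
        return ("Android", match.group(1), msisdn)
    match = IOS_RE.search(user_agent)
    if match:
        # iOS versions appear as 8_2 in the header; write them as 8.2.
        return ("iOS", match.group(1).replace("_", "."), msisdn)
    return None


def write_unique(rows: Iterable[Row], ios_out: TextIO, android_out: TextIO) -> None:
    """Write each (platform, msisdn) pair once to the matching CSV stream."""
    writers = {"iOS": csv.writer(ios_out), "Android": csv.writer(android_out)}
    seen = set()  # only the unique msisdn values stay in memory
    for platform, version, msisdn in rows:
        if (platform, msisdn) in seen:
            continue
        seen.add((platform, msisdn))
        writers[platform].writerow([platform, version, msisdn])
```

The output streams would typically be opened with open('ios.csv', 'w', newline='') and open('android.csv', 'w', newline=''). Blocks for which parse_block returns None should be skipped before the rows are passed to write_unique.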