I am trying to search through some log files to find a line matching a pattern such as this:
'A-Topeka-Firesale\:\s\*132\*\d{2,5}\*[23]\d{9}\#'
and once that line is matched, to go backward in the file and find a preceding line, like this:
2016-12-30 11:02:12 DEBUG[ispatcher-18269] ab.talk.this.api.Api - http://hostname:19991/trapeze?session_id=176764&manager_event=old&apostrophe=2341231231234&_operation=doc 3da48a90-0f4f-4eb3-a241-94a1f05b891b requesting:
and I need to match "http://hostname:19991/trapeze?", "manager_event=old" and "requesting:" for the second line (which is usually 3-5 lines above the first match) to count as a match.
So far I have tried variations of this:
    for each, line in enumerate(f):
        first_match = re.search(b'A-Topeka-Firesale\:\s\*132\*\d{2,5}\*[23]\d{9}\#', line)
        if first_match:
            for i in range(each, -1, -1):
                if re.match("|".join(['http://hostname:19991/trapeze', 'manager_event=old', 'requesting:']), str(f[i])):
                    break
and this:
    for each, line in enumerate(f):
        first_match = re.search(b'A-Topeka-Firesale\:\s\*132\*\d{2,5}\*[23]\d{9}\#', line)
        if first_match:
            for i in range(each, -1, -1):
                if all(re.match(regex_str, str(f[i])) for regex_str in ['http://hostname:19991/trapeze', 'manager_event=old', 'requesting: ']):
                    break
But these match the wrong lines (e.g. lines that start with blank spaces and contain only one of the patterns, such as trapeze). What am I doing wrong, and how can I do it better?
Sample input:
2016-01-30 00:00:27 DEBUG[-dispatcher-411] ab.talk.this.api.Api - http://hostname:19991/trapeze?manager_id=40178&manager_event=old&apostrophe=2341231231234&_operation=doc dgfgdffb-8123-4f05-ac15-7ac841afad14 requesting:
HEADERS:
this-is-a-header: 200*01231231234
A-Topeka-Firesale: *132*200*01231231234#
Host: hostname:19991
Accept: */*
User-Agent: AHC/2.0
Timeout-Access: <function1>
CONTENT:
2015-03-12 00:00:28 DEBUG[-dispatcher-747] ab.talk.this.api.Api - http://hostname:19991/trapeze?manager_id=84942&manager_event=old&apostrophe=2341231231235&_operation=ogle abcdf8237-393f-4c4b-bc46-e184cbf08d9a requesting:
HEADERS:
this-is-a-header: 100
A-Topeka-Firesale: *132*100#
Host: hostname:19991
Accept: */*
User-Agent: AHC/2.0
Timeout-Access: <function1>
CONTENT:
Very unclear what it is you really want, but after some guessing - could this be what you want?
2016-12-30 11:02:12 DEBUG[ispatcher-18269] ab.talk.this.api.Api - http://hostname:19991/trapeze?session_id=176764&manager_event=old&apostrophe=2341231231234&_operation=doc 3da48a90-0f4f-4eb3-a241-94a1f05b891b requesting:
bla bla bla
bla bla bla
bla bla bla
A-Topeka-Firesale: *132*12345*2123456789#
In the text above you want to match the last line. (You've only given a regex, so I made one up matching the criteria.) Finding that line will lead you to the first line, matching http://hostname:19991/trapeze?, manager_event=old and requesting: in that order, but not directly after one another.
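As an aside, the lookup described above can also be sketched line by line in Python, the language used in the question. This is only an illustration under assumptions: find_preceding_requests and max_lookback are hypothetical names, the window of 10 lines is a guess based on the "3-5 lines above" remark, and the log is assumed to be already read into a list of text lines. It uses re.search rather than re.match, because re.match only matches at the very start of a string.

```python
import re

# The header line that triggers the lookup (regex taken from the question).
TARGET = re.compile(r"A-Topeka-Firesale:\s\*132\*\d{2,5}\*[23]\d{9}#")

# The three required parts, in order, anywhere on the same line.
REQUEST = re.compile(
    r"http://hostname:19991/trapeze\?.*?manager_event=old.*?requesting:"
)


def find_preceding_requests(lines, max_lookback=10):
    """For each TARGET line, return the nearest earlier REQUEST line.

    max_lookback is an assumed limit on how far back to search.
    """
    results = []
    for index, line in enumerate(lines):
        if not TARGET.search(line):
            continue
        start = max(index - max_lookback, 0)
        # Walk backward from the line just above the header line.
        for candidate in reversed(lines[start:index]):
            if REQUEST.search(candidate):
                results.append(candidate.rstrip("\n"))
                break
    return results
```

Called with something like find_preceding_requests(open(path).read().splitlines()), where path is a placeholder, it returns one request line per matching header that has a request line within the window.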
If I'm guessing correctly, this regex
(http://hostname:19991/trapeze.*?manager_event=old.*?requesting:).*?A-Topeka-Firesale\:\s\*132\*\d{2,5}\*[23]\d{9}\#
should (could) do it for you. It captures the first line (from the URL through requesting:), which is what I understand you're after. (You haven't specified whether you're after something specific in it, like session_id or whatever, but that could be "targeted" directly of course.)
Check it out here at regex101.
Note that the example uses the extended flag (x) to allow splitting the regex up to (somewhat) improve readability, and the single-line flag (s) to have . match line feeds.
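For reference, here is a minimal sketch of how the regex above might be used from Python. It assumes the whole log has been read into one string; find_request is a hypothetical helper name. In Python's re module the extended flag is re.VERBOSE and the single-line flag is re.DOTALL.

```python
import re

# The regex from this answer, laid out with re.VERBOSE (the x flag) so it can
# be split over several lines; re.DOTALL (the s flag) lets . match newlines.
PATTERN = re.compile(
    r"""
    (http://hostname:19991/trapeze .*? manager_event=old .*? requesting:)
    .*?
    A-Topeka-Firesale:\s\*132\*\d{2,5}\*[23]\d{9}\#
    """,
    re.VERBOSE | re.DOTALL,
)


def find_request(text):
    """Return the captured request text, or None if nothing matches.

    find_request is a hypothetical helper name used only for this sketch.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "2016-12-30 11:02:12 DEBUG[ispatcher-18269] ab.talk.this.api.Api - "
        "http://hostname:19991/trapeze?session_id=176764&manager_event=old"
        "&apostrophe=2341231231234&_operation=doc "
        "3da48a90-0f4f-4eb3-a241-94a1f05b891b requesting:\n"
        "bla bla bla\n"
        "bla bla bla\n"
        "bla bla bla\n"
        "A-Topeka-Firesale: *132*12345*2123456789#\n"
    )
    print(find_request(sample))
```

One caveat: with several records in one string, the lazy .*? can still run from an earlier request line to a later matching header, so the result is worth checking against real logs.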