I have a log file and need to create a hash key for each URL in the record. Each line from the record has been placed into an array and I am looping through the array assigning hash keys.
I need to get from this:
"2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"
to this:
"/logschecks/scripts/setup1.php"
I have tried using match
, scan
and split
but they have both failed to get me where I need to go.
My method currently looks like:
def pathHistogram (rowsInFile)
i = 0
urlHash = Hash.new
while i <= rowsInFile.length - 1
urlKey = rowsInFile[i].scan(/<"GET ">/).last.first
if urlHash.has_key?(urlKey) == true
#get the number of stars already in there and add one.
urlHash[urlKey] = urlHash[urlKey] + '*'
i = i + 1
else
urlHash[urlKey] = '*'
i = i + 1
end
end
end
I know that just scanning the "GET " won't complete the job but I was trying to baby-step through it. The match
and split
versions that I tried were fairly epic-fails, but I was likely using them incorrectly and they are long gone.
Running this script gives me an undefined method error on "first", though I have gotten other errors when I vary the way this is handled.
I should also say I am not married to using scan
. If another method would work better, I would be more than happy to switch.
Any help would be greatly appreciated.
You state in a comment to the other answer the pattern is basically "GET ... HTTP
, where you are interested in the ...
part. That can be extracted very easily:
line = '2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"'
line[/"GET (.*?) HTTP/, 1]
# => "/logschecks/scripts/setup1.php"
Assuming each of your input lines contains /logschecks/...
:
x = "2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: \"GET /logschecks/scripts/setup1.php HTTP/1.1\", host: \"www.example.com\""
x[%r(/logscheck[/\w\.]+)] # => "/logschecks/scripts/setup1.php"
Scanning HTTP logs isn't hard, but how you go about it will vary depending on the format. In the sample you're giving it's easier than a standard log because you have some landmarks you can look for:
Search for request: "
using something like:
/request: "\\S+ (\\S+)/i
That pattern will skip over GET
, POST
, HEAD
or whatever method was used for the request.
log_line[/request: "\\S+ (\\S+)/i, 1] # => "/logschecks/scripts/setup1.php"
You might want to know that if you're mining your logs. In that case...
Search for request: "[GET|POST|HEAD|...]
using something like:
/request: "(\\S+) (\\S+)/i
You'd use it like:
method, url = log_line.match(/request: "(\\S+) (\\S+)/i).captures # => ["GET", "/logschecks/scripts/setup1.php"] method # => "GET" url # => "/logschecks/scripts/setup1.php"
You can also grab whatever is inside the double-quotes , then split it to get at the parts:
/request: "([^"]+)"/i
For instance:
log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"] method, url, http_ver = log_line[/request: "([^"]+)"/i, 1].split # => ["GET", "/logschecks/scripts/setup1.php", "HTTP/1.1"] method # => "GET" url # => "/logschecks/scripts/setup1.php" http_ver # => "HTTP/1.1"
Or use a bit more complex pattern , using some of the modern extensions and reduce the code:
log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"] /request: "(?<method>\\S+) (?<url>\\S+) (?<http_ver>\\S+)"/i =~ log_line method # => "GET" url # => "/logschecks/scripts/setup1.php" http_ver # => "HTTP/1.1"
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.