I have a log with entries in the following format:
1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4
1533528768 4 2 Thu Jan 5 19:17:45 2017 534040012 3
...
How do I fetch only the timestamp component (e.g. Wed Jan 4 11:17:12 2017) using regular expressions?
The final product will eventually be implemented in Python, but this part has to run in an automated regression suite written in bash/perl.
If the format is fixed in terms of space delimiters, you can simply split the line, take the slice that holds the date fields, and load it into a datetime object via datetime.strptime():
In [1]: from datetime import datetime
In [2]: s = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
In [3]: date_string = ' '.join(s.split()[3:8])
In [4]: datetime.strptime(date_string, "%a %b %d %H:%M:%S %Y")
Out[4]: datetime.datetime(2017, 1, 4, 11, 17, 12)
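The slice above assumes every line has the same layout. A minimal sketch of a defensive variant follows. It assumes each entry has exactly ten whitespace-separated fields, as in the two sample lines. The helper name parse_line and the constant TIMESTAMP_FORMAT are hypothetical and not part of the original answer.

```python
from datetime import datetime
from typing import Optional

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_line(line: str) -> Optional[datetime]:
    """Return the embedded timestamp, or None if the line is malformed."""
    fields = line.split()
    # Assumed layout: 3 leading fields, 5 timestamp fields, 2 trailing fields.
    if len(fields) != 10:
        return None
    try:
        return datetime.strptime(" ".join(fields[3:8]), TIMESTAMP_FORMAT)
    except ValueError:
        # Right field count, but fields 3..7 are not a valid timestamp.
        return None


print(parse_line("1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"))
print(parse_line("not a log entry"))
```

Returning None instead of raising lets a caller skip truncated or corrupted lines without wrapping every call in its own try/except.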
The regex to match the timestamp is:
'[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
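With grep it can be used like this (if your log file is called log.txt). Note that POSIX extended regular expressions (grep -E) do not define \d, so [0-9] is used instead:
$ grep -oE '[a-zA-Z]{3} +[a-zA-Z]{3} +[0-9]{1,2} +[0-9]{2}:[0-9]{2}:[0-9]{2} +[0-9]{4}' log.txt
# Wed Jan 4 11:17:12 2017
# Thu Jan 5 19:17:45 2017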
In Python you can use it like so:
import re
log_entry = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
# raw string, so the backslashes reach the regex engine unchanged
pattern = r'[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
compiled = re.compile(pattern)
match = compiled.search(log_entry)
match.group(0)
# 'Wed Jan 4 11:17:12 2017'
You can use this to get an actual datetime object from the string (expanding on above code):
from datetime import datetime
import re
log_entry = "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"
pattern = r'[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}'
compiled = re.compile(pattern)
match = compiled.search(log_entry)
log_time_str = match.group(0)
datetime.strptime(log_time_str, "%a %b %d %H:%M:%S %Y")
# datetime.datetime(2017, 1, 4, 11, 17, 12)
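One caveat with the snippet above: search() returns None when a line has no timestamp, so calling match.group(0) on such a line raises AttributeError. A minimal sketch of a guarded version follows. The function name find_timestamp and the constant TIMESTAMP_RE are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Same pattern as above, compiled once and reused for every line.
TIMESTAMP_RE = re.compile(
    r"[a-zA-Z]{3} +[a-zA-Z]{3} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}"
)


def find_timestamp(log_entry: str) -> Optional[datetime]:
    """Return the first timestamp in log_entry, or None if there is none."""
    match = TIMESTAMP_RE.search(log_entry)
    if match is None:
        return None
    # The regex only checks the shape; strptime can still raise ValueError
    # for text such as "Xyz Abc 99 ..." that matches the pattern.
    return datetime.strptime(match.group(0), "%a %b %d %H:%M:%S %Y")


print(find_timestamp("1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4"))
print(find_timestamp("no timestamp on this line"))
```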
Grep is most often used in this scenario if you are working with syslog, but as the post is also tagged with Python, this example uses regular expressions with re:
import re
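Define the pattern to match (a raw string, so the backslashes reach the regex engine unchanged; \d{1,2} also accepts two-digit days):
pat = r"\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}"
Then use re.findall to return all non-overlapping matches of the pattern in txt (a string holding the log contents):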
re.findall(pat,txt)
Output:
['Wed Jan 4 11:17:12 2017', 'Thu Jan 5 19:17:45 2017']
If you want to then use datetime :
import datetime
dates = re.findall(pat,txt)
datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y")
Output:
datetime.datetime(2017, 1, 4, 11, 17, 12)
You can then utilise these datetime objects:
dateObject = datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y").date()
timeObject = datetime.datetime.strptime(dates[0], "%a %b %d %H:%M:%S %Y").time()
print('The date is {} and time is {}'.format(dateObject,timeObject))
Output:
The date is 2017-01-04 and time is 11:17:12
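As one illustration of what the parsed objects allow, here is a minimal sketch that computes the time elapsed between the two sample entries. The variable names txt_sample, stamp_pat and stamps are hypothetical, and the pattern is a raw-string variant of the one in this answer.

```python
import re
from datetime import datetime

# The two sample log lines from the question.
txt_sample = """1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4
1533528768 4 2 Thu Jan 5 19:17:45 2017 534040012 3"""

stamp_pat = r"\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4}"

# Parse every match once and keep the datetime objects.
stamps = [
    datetime.strptime(s, "%a %b %d %H:%M:%S %Y")
    for s in re.findall(stamp_pat, txt_sample)
]

# Subtracting two datetimes gives a timedelta.
elapsed = stamps[1] - stamps[0]
print(elapsed)                  # 1 day, 8:00:33
print(elapsed.total_seconds())  # 115233.0
```

Parsing each string once and reusing the resulting object also avoids calling strptime separately for the date and the time.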
Two approaches: with and without regular expressions.
1) Using the re.findall() function:
import re

with open('test.log', 'r') as fh:
    # \s+ tolerates one or more spaces before the day of the month
    lines = re.findall(r'\b[A-Za-z]{3}\s[A-Za-z]{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\b', fh.read())
print(lines)
2) Using the str.split() and str.join() functions:
with open('test.log', 'r') as fh:
    # fields 3..7 of each whitespace-split line form the timestamp
    lines = [' '.join(d.split()[3:8]) for d in fh]
print(lines)
The output in both cases will be as below:
['Wed Jan 4 11:17:12 2017', 'Thu Jan 5 19:17:45 2017']
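Both approaches above build the complete result list in memory, and the first reads the whole file at once. For large logs, a generator that handles one line at a time may be preferable. The following is a minimal sketch under that assumption. The name iter_timestamps is hypothetical, and the sample list stands in for an open file handle.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator

TIMESTAMP_RE = re.compile(
    r"\b[A-Za-z]{3}\s+[A-Za-z]{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\b"
)


def iter_timestamps(lines: Iterable[str]) -> Iterator[datetime]:
    """Yield one datetime for each line that contains a timestamp."""
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match:
            yield datetime.strptime(match.group(0), "%a %b %d %H:%M:%S %Y")


# Any iterable of lines works; an open file handle iterates the same way.
sample = [
    "1483528632 3 1 Wed Jan 4 11:17:12 2017 501040002 4\n",
    "1533528768 4 2 Thu Jan 5 19:17:45 2017 534040012 3\n",
]
for ts in iter_timestamps(sample):
    print(ts.isoformat())
```

With a real log, the same function can be fed the handle from open('test.log') directly, so only one line is held in memory at a time.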
grep -E '\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}\b' dates
If you only want to list the dates, then instead of grep, perhaps:
sed -nre 's/^.*([A-Za-z]{3}\s+[A-Za-z]{3}\s+[0-9]+\s+[0-9]+:[0-9]+:[0-9]+\s+[0-9]{4}).*$/\1/p' filename