I have the output of a function called ABC as a string, like below:
19/09/09 10:34:37 INFO tool.ImportTool: --incremental append
19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:00.0
19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
How can I get the --last-value in Python? The value 2019-07-27 00:00:00.0 will be dynamic.
Note: the output has around 100 lines; I have shown only the last 4 here.
Expected: --last-value = 2019-07-27 00:00:00.0
The date is dynamic and depends on the output.
You can use a string slice or a regular expression to get this date from the input.
String slice:
text = """19/09/09 10:34:37 INFO tool.ImportTool: --incremental append
19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:00.0
19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')"""
keyword = "--last-value"
# Start of the value: keyword index + length of keyword + 1 (space)
idx = text.index(keyword) + len(keyword) + 1
# The value runs up to the end of that line
last_value = text[idx:text.index("\n", idx)]
Regular expression:
import re
# '.' does not match a newline, so the group stops at the end of the line
last_value = re.search(r"--last-value (.+)", text).group(1)
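Note that re.search() returns None when the flag is missing, so calling .group(1) directly would raise an AttributeError. A minimal sketch of a guard, assuming a hypothetical helper name extract_last_value and a shortened sample string (neither is from the original output):

```python
import re

def extract_last_value(text):
    """Return the value after the first '--last-value' flag, or None if absent."""
    # [ \t]+ allows one or more spaces/tabs but never crosses a line break
    match = re.search(r"--last-value[ \t]+(.+)", text)
    return match.group(1).strip() if match else None

# Shortened sample of the log output (assumed shape)
sample = (
    "19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN\n"
    "19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:00.0\n"
    "19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')\n"
)

found = extract_last_value(sample)
missing = extract_last_value("no flag in this text")
print(found)
print(missing)
```

The caller can then check for None instead of wrapping the call in a try/except.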
Regexes are your best friend!
If '--last-value' occurs more than once in your output, use re.findall() instead of re.search() to get all the values, as in the following code:
import re
text = """19/09/09 10:34:37 INFO tool.ImportTool: --incremental append
19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:01.0
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2029-07-27 00:00:02.0
19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')"""
sep = '--last-value '
regex = "%s(.+)\n" % sep
string_dates = re.findall(regex, text)
print(string_dates) # ['2019-07-27 00:00:01.0', '2029-07-27 00:00:02.0']
It can be useful to convert the strings in the string_dates list to datetime objects according to your format:
import re
from datetime import datetime as dt
date_format = '%Y-%m-%d %H:%M:%S.%f'
datetime_values = [dt.strptime(res, date_format) for res in string_dates]
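strptime() raises a ValueError for any string that does not fit the format, so one bad line would abort the whole list comprehension. A possible sketch that skips such values and then picks the most recent one; the helper name parse_dates and the sample list are assumptions made for illustration:

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

def parse_dates(strings, date_format=DATE_FORMAT):
    """Parse each string with date_format, silently skipping ones that do not fit."""
    parsed = []
    for s in strings:
        try:
            parsed.append(datetime.strptime(s, date_format))
        except ValueError:
            # Not in the expected format (or not a valid date): ignore it
            continue
    return parsed

# Hypothetical input: two values like the ones above plus one malformed entry
values = parse_dates(["2019-07-27 00:00:01.0", "2029-07-27 00:00:02.0", "not a date"])
latest = max(values)  # datetime objects compare chronologically
print(latest)
```

Whether skipping bad values is acceptable depends on your job; raising or logging may be the safer choice.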
I have written a primitive regex. You can use it to get the relevant lines from your log.
Code:
import re
data = """19/09/09 10:34:37 INFO tool.ImportTool: --incremental append
19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:00.0
19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-08-08 04:02:99.2
"""
last_values = re.findall(r"--last-value [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]", data)
print(last_values)
Output:
$ python3 test.py
['--last-value 2019-07-27 00:00:00.0', '--last-value 2019-08-08 04:02:99.2']
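The matches above still contain the '--last-value ' prefix. If only the timestamp is wanted, one option is to put the timestamp part in a capture group, since re.findall() then returns just the group. A small sketch reusing the same sample data (the variable names pattern, timestamps and most_recent are illustrative):

```python
import re

data = """19/09/09 10:34:37 INFO tool.ImportTool: --incremental append
19/09/09 10:34:37 INFO tool.ImportTool: --check-column DTIN
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-07-27 00:00:00.0
19/09/09 10:34:37 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
19/09/09 10:34:37 INFO tool.ImportTool: --last-value 2019-08-08 04:02:99.2
"""

# The parentheses capture only the timestamp; \d+ allows more than one fractional digit
pattern = r"--last-value (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
timestamps = re.findall(pattern, data)

# Last occurrence in the log, or None when the flag never appears
most_recent = timestamps[-1] if timestamps else None
print(timestamps)
print(most_recent)
```

The pattern only checks the shape of the value, not its validity: 04:02:99.2 matches even though 99 is not a valid number of seconds, so a later datetime conversion would reject it.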