How to extract a specific word from a string?
I have a file containing multiple lines, and I want to extract the first three words of each line.
str = []
str = [
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"
Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 - "{\x0A \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime-ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-ACTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A }\x0A ]\x0A}"]
I want to extract the date, i.e. Feb 17 07:10:07, from each line and put it into an array.
I tried a for loop, but it raises an error:
IndexError: list index out of range
The code I tried:
for i in splitdata:
    abc = splitdata[logcount]
    aa = abc.split()
    if(aa[0] == "Feb"):
        aaa = "".join([aa[0],' ',aa[1],' ',aa[2]])
        logtime.append(aaa)
        logcount += 2
    else:
        pass
print logtime
If your log is saved in a file named log.log, you can get the dates like this:
with open('log.log') as f:
    log_time = []
    for line in f:
        log_time.append(line[:15])
print(log_time)
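The fixed-width slice above relies on every line starting with a 15-character syslog-style timestamp. As a minimal sketch under that same assumption, the prefix can instead be checked with a regular expression so that blank or unrelated lines are skipped. The names `TIMESTAMP_RE` and `extract_timestamps` and the pattern itself are illustrative, not part of the original answer:

```python
import re

# Assumed format: capitalized three-letter month, day of month, then HH:MM:SS.
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")

def extract_timestamps(lines):
    """Return the leading timestamp of every line that starts with one."""
    timestamps = []
    for line in lines:
        match = TIMESTAMP_RE.match(line)
        if match:  # lines without a timestamp prefix are skipped
            timestamps.append(match.group(0))
    return timestamps

# Shortened sample lines; the payload after the status code is left out.
sample = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 -",
    "",
    "not a log line",
]
print(extract_timestamps(sample))  # ['Feb 17 07:10:07']
```

Because a file object is iterable line by line, `extract_timestamps(f)` should also work inside the `with open('log.log') as f:` block.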
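You can avoid this kind of error simply by checking the length of the split string. There is also a lot of room for improvement in the code.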
In [1]: sample_text = """Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A
...: \x22identifier\x22: {\x0A \x22company_code\x22: \x22TSC\x22,\x0A \x22product_type\x22: \x22airtime
...: -ctg\x22,\x0A \x22host_type\x22: \x22android\x22\x0A },\x0A \x22id\x22: {\x0A \x22type\x22: \
...: x22guest\x22,\x0A \x22group\x22: \x22guest\x22,\x0A \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242a
...: c110003\x22,\x0A \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A },\x0A \x22stats\x22: [\x0A
...: {\x0A \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A \x22software_id\x22: \x22A-A
...: CTG\x22,\x0A \x22action_id\x22: \x22open_app\x22,\x0A \x22values\x22: {\x0A
...: \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A \x22language\x22: \x22en\x22\x0A }\x0A
...: }\x0A ]\x0A}"""
In [2]: def get_time_from_log(log_text):
...:     log_text_split = log_text.split(" ")
...:     if len(log_text_split) < 3:
...:         pass
...:     elif log_text_split[0] == "Feb":
...:         return " ".join(log_text_split[0:3])
...:
In [3]: get_time_from_log(sample_text)
Out[3]: 'Feb 17 07:10:07'
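The IndexError in the question most likely comes from indexing with `logcount`: assuming it starts at 0, it grows by 2 for each matching line while the loop runs once per element, so the index runs past the end of the list. A blank line would also make `aa[0]` fail. Here is a minimal sketch of the loop with the length check applied and no manual index. `splitdata` and `logtime` are the names from the question, `first_three_words` is a hypothetical helper, and the month check is dropped for brevity:

```python
def first_three_words(lines):
    """Collect the first three whitespace-separated words of each line."""
    result = []
    for line in lines:       # iterate over the lines directly, no counter needed
        words = line.split()
        if len(words) >= 3:  # guard against blank or too-short lines
            result.append(" ".join(words[:3]))
    return result

# Shortened sample lines; the payload after the status code is left out.
splitdata = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 -",
    "",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 -",
]
logtime = first_three_words(splitdata)
print(logtime)  # ['Feb 17 07:10:07', 'Feb 17 07:10:07']
```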