
How to extract a specific word from a string?

I have a file with many lines and want to extract the first three words of each line.

str = []

str = [
Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"

Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"]

I want to extract the date, i.e. Feb 17 07:10:07, from every line and put it into an array.

I tried using a for loop, but it gives an error:

IndexError: list index out of range

The code I tried:

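for i in splitdata:
    # logcount grows by 2 per match while the loop runs once per element,
    # so this index can run past the end and raise IndexError
    abc = splitdata[logcount]
    aa = abc.split()
    if(aa[0] == "Feb"):  # a blank line gives aa == [], so aa[0] also raises IndexError
        aaa = "".join([aa[0],' ',aa[1],' ',aa[2]])
        logtime.append(aaa)
        logcount += 2
    else:
        pass
print logtime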
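A minimal sketch of the same loop without the manual counter, assuming splitdata is a list of log lines (for example from splitting the file text on newlines), so blank entries can appear between records. The sample lines below are shortened, hypothetical stand-ins for the real log.

```python
# Hypothetical, shortened stand-ins for the log lines in the question;
# the empty string mimics the blank line between records.
splitdata = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200",
    "",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: 192.168.28.12 - 200",
]

logtime = []
for line in splitdata:  # iterate over the lines themselves; no index to run past the end
    aa = line.split()
    # A blank line gives an empty list, so check the length before reading aa[0]
    if len(aa) >= 3 and aa[0] == "Feb":
        logtime.append(" ".join(aa[:3]))  # first three fields, e.g. "Feb 17 07:10:07"

print(logtime)
```

Because "for line in splitdata" already yields each element, logcount is not needed, and the len(aa) >= 3 guard covers blank and short lines.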

If your log is saved in a file named log.log, you can get the dates like this:

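with open('log.log') as f:
    log_time = []
    for line in f:
        # A syslog timestamp such as "Feb 17 07:10:07" is a fixed 15 characters
        if line.strip():  # skip the blank lines between entries
            log_time.append(line[:15])
print(log_time)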

You can avoid this kind of error simply by checking the len of the split string. There is also a lot of room to improve your code:

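  • Put the logic in a function so it can be reused
  • Check the len of a list before accessing it by index
  • You don't need parentheses around an if condition in Python
  • Use list comprehensions where they make the code clearer
  • The way you join the list elements shows there is still a lot of Python to learn. Good luck!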
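To illustrate the length-check and list-comprehension points, the whole extraction can be written as a single comprehension. This is a sketch assuming the lines are already in a list; lines is a hypothetical name and the sample data is shortened.

```python
# Hypothetical, shortened log lines; "" stands for a blank line between records.
lines = [
    "Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: ...",
    "",
    "Feb 17 07:10:07 afg-prod-web1 journal: afg-prod-web1 statistics: ...",
]

# Split each line once, keep only lines with at least three fields,
# and join the first three fields back into "Mon DD HH:MM:SS".
log_time = [
    " ".join(fields[:3])
    for fields in (line.split() for line in lines)
    if len(fields) >= 3
]
print(log_time)
```

Unlike the question's code, this does not test for "Feb"; adding and fields[0] == "Feb" to the condition restricts it to that month.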
In [1]: sample_text = """Feb 17 07:10:07 afg-prod-web2 journal: afg-prod-web2 statistics: 192.168.28.12 - 200 - "{\x0A    \x22identifier\x22: {\x0A        \x22company_code\x22: \x22TSC\x22,\x0A        \x22product_type\x22: \x22airtime-ctg\x22,\x0A        \x22host_type\x22: \x22android\x22\x0A    },\x0A    \x22id\x22: {\x0A        \x22type\x22: \x22guest\x22,\x0A        \x22group\x22: \x22guest\x22,\x0A        \x22uuid\x22: \x22fd2dfcdc-ade2-11e6-8404-0242ac110003\x22,\x0A        \x22device_id\x22: \x222f504f5ed3c64934\x22\x0A    },\x0A    \x22stats\x22: [\x0A        {\x0A            \x22timestamp\x22: \x222017-02-16T23:29:57+0000\x22,\x0A            \x22software_id\x22: \x22A-ACTG\x22,\x0A            \x22action_id\x22: \x22open_app\x22,\x0A            \x22values\x22: {\x0A                \x22device_id\x22: \x222f504f5ed3c64934\x22,\x0A                \x22language\x22: \x22en\x22\x0A            }\x0A        }\x0A    ]\x0A}"""

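In [2]: def get_time_from_log(log_text):
   ...:     # split() with no argument collapses runs of whitespace, so a
   ...:     # space-padded day such as "Feb  7" still gives three fields
   ...:     log_text_split = log_text.split()
   ...:     if len(log_text_split) < 3:
   ...:         return None  # too few fields, e.g. a blank line
   ...:     elif log_text_split[0] == "Feb":
   ...:         return " ".join(log_text_split[0:3])
   ...: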

In [3]: get_time_from_log(sample_text)
Out[3]: 'Feb 17 07:10:07'
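The function above only recognises lines that start with "Feb". As a swapped-in technique (not part of the original answer), a regular expression can match a syslog-style "Mon DD HH:MM:SS" timestamp for any month at the start of a line. This sketch assumes English three-letter month abbreviations and a 24-hour time; get_log_times and the "Mar" sample line are hypothetical.

```python
import re

# Matches e.g. "Feb 17 07:10:07" or "Mar  5 23:01:59" at the start of a line.
# Assumption: English month abbreviations, day padded with spaces or unpadded.
TIMESTAMP_RE = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} \d{2}:\d{2}:\d{2}"
)

def get_log_times(lines):
    """Return the leading timestamp of every line that has one."""
    times = []
    for line in lines:
        match = TIMESTAMP_RE.match(line)  # match() only checks the start of the string
        if match:  # blank or unrelated lines are skipped
            times.append(match.group(0))
    return times

# Hypothetical sample lines; the "Mar" entry is invented to show another month.
sample = [
    "Feb 17 07:10:07 afg-prod-web2 journal: ...",
    "",
    "Mar  5 23:01:59 afg-prod-web1 journal: ...",
]
print(get_log_times(sample))
```

Since iterating over an open file yields its lines, the same function can be called on the file object from the log.log example.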

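Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.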

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM