Pattern Matching in Python - Extract and Store strings from file

I have the following log file:

*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00

I would like to store the MODULE NAME value, extracted from this line:

*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00

i.e. just this:

mod_name_action (TNS V1-V3)

I have to do that using Python. I am trying something like:

log_i=open(logname,"r")
    for line_of_log in log_i:
       #search the MODULE
       module = "MODULE NAME:("
       str_found_at = line_of_log.find(module)
       if str_found_at != -1: 
          regex = r"MODULE NAME:([a-zA-Z]+)"
          MODULE = re.findall(regex, line_of_log)
          print "MODULE_A==>", MODULE  

    log_i.close()

But it doesn't work. Can someone help me?

Using regex.

Demo:

import re

s = """*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00"""

res = []
for line in s.splitlines():
    m = re.search(r"(?<=MODULE NAME:\()(.*?)(?=\)\))", line)
    if m:
        res.append(m.group()+")")
print(res)

Output:

['mod_name_action (TNS V1-V3)']
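If the other fields are needed as well, the same idea can be extended to capture every `KEY:(value)` pair in one pass. The following is only a sketch: it assumes each field line has the layout `*** KEY:(value) timestamp` seen in the sample, and the names `FIELD_RE` and `parse_fields` are made up for illustration.

```python
import re

LOG_TEXT = """\
*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
"""

# Assumed layout: "*** KEY:(value) timestamp".
# The greedy (.*) backtracks to the last ") " before the timestamp,
# so parentheses nested inside the value are kept.
FIELD_RE = re.compile(r"\*\*\* ([A-Z ]+):\((.*)\) \S+$")

def parse_fields(text):
    """Return a dict mapping each field name to its value."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:  # the header line has no KEY:(...) part and is skipped
            fields[m.group(1)] = m.group(2)
    return fields

fields = parse_fields(LOG_TEXT)
print(fields["MODULE NAME"])  # mod_name_action (TNS V1-V3)
```

Storing the result in a dict keeps the module name available under `fields["MODULE NAME"]` for later use instead of only printing it.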

You can do this without regex. I'll put your log data into a list of lines (retaining the newlines) using the .splitlines method so we can loop over it as if it were a file.

We can use in to find lines containing "MODULE NAME:", and then we just need to search for the first '(' and the last ')' on that line so we can slice out the substring containing the name.

log_i = '''\
*** 2018-09-14T12:36:39.560671+02:00 (DB_NAME)
*** SESSION ID:(12345) 2018-09-14T12:36:39.560750+02:00
*** CLIENT ID:() 2018-09-14T12:36:39.560774+02:00
*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00
*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00
*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00
*** CLIENT DRIVER:() 2018-09-14T12:36:39.560875+02:00
*** CONTAINER ID:(1) 2018-09-14T12:36:39.560926+02:00
'''.splitlines(True)

for line_of_log in log_i:
    #search for the MODULE NAME line
    if "MODULE NAME:" in line_of_log:
        # Find the location of the first '('
        start = line_of_log.index('(')
        # Find the location of the last ')'
        end = line_of_log.rindex(')')
        modname = line_of_log[start+1:end]
        print("MODULE_A==>", modname)

Output:

MODULE_A==> mod_name_action (TNS V1-V3)

If there is only one "MODULE NAME:" line in the log (or you only want to print the first one if there are several), then you should put a break after the print so that you don't waste time checking all the remaining lines in the file.
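The early exit described above can be sketched as a small function that returns as soon as the first match is found. The function name `first_module_name` and the `MARKER` constant are hypothetical, and `io.StringIO` stands in for a real file opened with `open(logname)`.

```python
import io

MARKER = "MODULE NAME:("

def first_module_name(lines):
    """Return the first MODULE NAME value found, or None if there is none."""
    for line in lines:
        pos = line.find(MARKER)
        if pos != -1:
            # Start right after the marker rather than at the first '('
            # on the line, in case other text precedes the field.
            start = pos + len(MARKER)
            # rindex raises ValueError if the line has no ')' at all.
            end = line.rindex(")")
            return line[start:end]  # returning here stops the loop early
    return None

sample = io.StringIO(
    "*** SERVICE NAME:(DB_NAME) 2018-09-14T12:36:39.560798+02:00\n"
    "*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00\n"
    "*** ACTION NAME:() 2018-09-14T12:36:39.560848+02:00\n"
)
print(first_module_name(sample))  # mod_name_action (TNS V1-V3)
```

Because a file object is iterated lazily, returning from inside the loop means the lines after the match are never read.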

It doesn't work because your regex pattern is incorrect: characters such as '_', '-', spaces and parentheses aren't matched by the pattern '[a-zA-Z]+'. Also, unescaped parentheses in a pattern define a capture group instead of matching literal parentheses, so if you want to strip the enclosing parentheses from the result you have to include them in your pattern, escaped with the '\' character. Finally, instead of using

 str_found_at = line_of_log.find(module)

you can test directly whether a substring occurs in a string in Python with the in operator. I would recommend the following code:

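import re

# the with block closes the file automatically
with open(logname, "r") as log_i:
    for line_of_log in log_i:
        # search for the MODULE NAME line
        if "MODULE NAME:(" in line_of_log:
            # the greedy .+ runs to the last ')' on the line, so nested parentheses are kept
            MODULE = re.findall(r"MODULE NAME:\((.+)\)", line_of_log)
            print("MODULE_A==>", MODULE[0])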
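To see the difference between the pattern from the question and the corrected one, both can be run against the sample line. This is a small sketch; the variable names are chosen for illustration, and `re.search` is used at the end as one possible way to avoid an IndexError when a line does not match.

```python
import re

line = "*** MODULE NAME:(mod_name_action (TNS V1-V3)) 2018-09-14T12:36:39.560822+02:00"

original = r"MODULE NAME:([a-zA-Z]+)"  # expects a letter right after the colon
fixed = r"MODULE NAME:\((.+)\)"        # escaped parentheses match the literal ( and )

print(re.findall(original, line))  # []
print(re.findall(fixed, line))     # ['mod_name_action (TNS V1-V3)']

# re.search returns None when nothing matches, which is easy to check
m = re.search(fixed, line)
module_name = m.group(1) if m else None
print(module_name)  # mod_name_action (TNS V1-V3)
```

The original pattern returns an empty list because the character after "MODULE NAME:" in the log is a literal '(', which '[a-zA-Z]+' cannot match.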
