
Getting strings in between two keywords from a file in Python

I am trying to extract the strings between two keywords from a large text file that follows the pattern below, searching it line by line, then print them and store them in another text file.

'Event_WheelMonitorReleased' (253) 'Event_WheelMonitorPressed' (252) 'Event_WheelMonitorPressed' (252) 'Event_WheelMonitorPressed' (252)

Here I would like to extract only the strings inside EVENT(...).

For example, I need X_0_Gui_Menu_610_Menu_Status_System.

I tried the following code:

import os
import re

def get_navigated_pages():
    os.chdir('log_file')
    log_file = open('messages','r')
    data = log_file.read()
    navigated_pages = re.findall(r'EVENT(X(.*?)) ',data,re.DOTALL|re.MULTILINE)
    with open('navigated_page_file', 'w') as navigated_page_file:
        navigated_page_file.write(navigated_pages)

I expected the output in the text file to be something like this:

X_0_Gui_Menu_650_Menu_Status_Version 
X_0_Gui_Menu_610_Menu_Status_System 
X_0_Gui_Menu_670_Menu_Status_Media

As mentioned above, I only want the strings that start with X_0; strings that start with any other keyword should be ignored.

Try escaping your outermost pair of parentheses.

navigated_pages = re.findall(r'EVENT\(X(.*?)\) ',data,re.DOTALL|re.MULTILINE)

This appears to make it match properly, at least for my little sample input:

>>> s = "EVENT(X_HELLO) ... EVENT(X_HOW_ARE_YOU_DOING_TODAY)... EVENT(this one shouldn't appear because it doesn't start with X)"
>>> re.findall(r"EVENT\(X(.*?)\)", s)
['_HELLO', '_HOW_ARE_YOU_DOING_TODAY']

If you want the leading X too, move the opening parenthesis of the capture group one character to the left. Don't worry, I'm pretty sure the *? will still have the proper precedence.

>>> re.findall(r"EVENT\((X.*?)\)", s)
['X_HELLO', 'X_HOW_ARE_YOU_DOING_TODAY']
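Putting this together with the original goal of writing the names to a file, here is a rough sketch rather than a definitive fix. The names EVENT_RE, extract_navigated_pages and write_navigated_pages are my own, and the second sample line is made up for illustration. It tightens the group to X_[^)]* on the assumption that event names never contain a closing parenthesis, and it writes one name per line because file.write() needs a string, not the list that re.findall returns.

```python
import re

# Escaped parentheses match the literal "EVENT(" and ")".
# The capture group keeps only names that start with "X_".
# Assumption: event names never contain ")".
EVENT_RE = re.compile(r"EVENT\((X_[^)]*)\)")


def extract_navigated_pages(lines):
    """Return the X_-prefixed EVENT names found in an iterable of log lines."""
    pages = []
    for line in lines:
        pages.extend(EVENT_RE.findall(line))
    return pages


def write_navigated_pages(pages, out_path):
    """Write one name per line; write() would fail if handed the list itself."""
    with open(out_path, "w") as out_file:
        out_file.writelines(page + "\n" for page in pages)


# First line is the sample from this thread; the second is a hypothetical
# line that should be ignored because its name does not start with "X_".
sample = [
    "Jan 01 08:11:13 AMIRA-134500021 user.notice gui-monitor[770]: "
    "ACTION:401b0836:8:EVENT(X_0_Gui_Menu_610_Menu_Status_System) "
    "'Event_WheelMonitorReleased' (253)",
    "ACTION:401b0836:8:EVENT(global_Global_Reactions) "
    "'Event_WheelMonitorPressed' (252)",
]
print(extract_navigated_pages(sample))
```

Since an open file is itself an iterable of lines, extract_navigated_pages can be handed the file object directly, which avoids reading a large log into memory in one go.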

You might get away with using split:

s = "Jan 01 08:11:13 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:EVENT(X_0_Gui_Menu_610_Menu_Status_System) 'Event_WheelMonitorReleased' (253)"
print(s.split("EVENT(")[1].rsplit(") ",1)[0])

X_0_Gui_Menu_610_Menu_Status_System

with open('message','r') as log_file:
    for line in log_file:
        print(line.split("EVENT(")[1].rsplit(") ",1)[0])

X_0_Gui_Menu_610_Menu_Status_System
X_0_Gui_Menu_610_Menu_Status_System
global_ExportActive_Popup
global_FileOverwrite_Confirm_Popup
global_Global_Reactions

To get only the X_ lines:

with open('message','r') as log_file:
    for line in log_file:
        chk = line.split("EVENT(")[1].rsplit(") ",1)[0]
        if chk.startswith("X_"):
            print(chk)

X_0_Gui_Menu_610_Menu_Status_System
X_0_Gui_Menu_610_Menu_Status_System

If you are confident X_ only appears in the lines you want:

with open('message','r') as log_file:
    for line in log_file:
        if "X_" in line:
            chk = line.split("EVENT(")[1].rsplit(") ",1)[0]
            print(chk)
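One caveat with the split approach: indexing [1] raises IndexError on any line that has no "EVENT(" in it. If the log might contain such lines, a variant using str.partition could look like the sketch below. The helper name event_name is hypothetical, and it cuts at the first ")" after "EVENT(" instead of the last ") " on the line, which assumes event names never contain a closing parenthesis.

```python
def event_name(line):
    """Return the text inside EVENT(...), or None if the line has no complete EVENT(...)."""
    # partition() returns an empty separator when "EVENT(" is not found,
    # so there is no index to go out of range.
    _, marker, rest = line.partition("EVENT(")
    if not marker:
        return None
    name, closing, _ = rest.partition(")")
    return name if closing else None


s = "Jan 01 08:11:13 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:EVENT(X_0_Gui_Menu_610_Menu_Status_System) 'Event_WheelMonitorReleased' (253)"

# The second entry is a made-up line without an EVENT marker; it is skipped.
for line in [s, "a line without the marker"]:
    name = event_name(line)
    if name is not None and name.startswith("X_"):
        print(name)
```

The same loop works over an open file object in place of the list.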
