
Python regex to match multiple strings across multiple lines

I have spent three days trying to write a regular expression that matches multiple strings across multiple lines. I have a file with the following text:

[pid 20242] 23:13:36 futex(0x7f8087eb18, FUTEX_WAKE, 1 <unfinished ...>
[pid   621] 23:13:36 futex(0x7f80855410, FUTEX_WAIT, 1041, NULL <unfinished ...>
[pid 20242] 23:13:36 <... futex resumed> ) = 0
[pid   621] 23:13:36 <... futex resumed> ) = -1 EAGAIN (Try again)
[pid 20242] 23:13:36 munmap(0x7f80200000, 8192 <unfinished ...>
--> [pid   621] 23:13:36 openat(AT_FDCWD, "/proc/self/task", O_RDONLY|O_DIRECTORY|O_CLOEXEC <unfinished ...>
[pid 20242] 23:13:36 <... munmap resumed> ) = 0
[pid   621] 23:13:36 <... openat resumed> ) = 13
[pid 20242] 23:13:36 madvise(0x7f76a7b000, 20480, MADV_DONTNEED <unfinished ...>
 --> [pid   621] 23:13:36 hammad(13, FUTEX_WAKE, 1, 24 )
[pid 20242] 23:13:36 madvise(0x7f76a7b000, 20480, MADV_DONTNEED <unfinished ...>
[pid   621] 23:13:36 <... futex resumed> ) = 0
[pid 20242] 23:13:36 futex(0x7f80855410, FUTEX_WAKE, 1 <unfinished ...>
[pid   621] 23:13:36 futex(0x7f8087eb18, FUTEX_WAKE, 1 <unfinished ...>
[pid 20242] 23:13:36 <... futex resumed> ) = 0
[pid   621] 23:13:36 <... futex resumed> ) = 0
[pid 20242] 23:13:36 futex(0x7f8087eb18, FUTEX_WAKE, 1 <unfinished ...>
[pid   621] 23:13:36 futex(0x7f80855410, FUTEX_WAIT, 1041, NULL <unfinished ...>
[pid 20242] 23:13:36 <... futex resumed> ) = 0

I added "-->" above to mark the strings I am looking for. I need to find out whether there is a pattern where "openat(AT_FDCWD, "/proc/self/task", O_RDONLY|O_DIRECTORY|O_CLOEXEC " is followed by hammad(13, FUTEX_WAKE, 1, 24 ). There can be several lines between them; the important thing is that the "openat" call is followed by the "hammad" call.

I have a lot of files with different texts but I want to use the same pattern for matching. Here is my code:

text = open('textfile.txt').read()


if re.findall(r"[a-zA-Z\s.-]*openat(AT_FDCWD, "/proc/self/task", 
 O_RDONLY|O_DIRECTORY|O_CLOEXEC <unfinished ...>([a-zA-Z0-9|\s|.])*hammad(13, FUTEX_WAKE, 1, 24 )", text):

    print 'found a match!'
else:
    print 'no match' 

Can anyone help me fix my code? Thanks.

You can use:

/^.*openat(?:[\s\S]*)^.*hammad.*/gm

The key elements are:

  1. The m flag for multiline;
  2. Knowing that .* matches any horizontal character but does not match a newline (unless you use the s flag);
  3. [\s\S]* matches any character including a newline.
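A minimal sketch of those three points in Python 3, using a made-up three-line sample (the sample string and variable names are assumptions for illustration, not part of the question's log):

```python
import re

# Hypothetical three-line sample, not taken from the question's log.
sample = "a openat(1)\nb other\nc hammad(2)\n"

# '.' does not match a newline by default, so a pattern that relies on
# '.*' alone cannot get from the openat line to the hammad line.
dot_only = re.findall(r"openat.*hammad", sample)

# [\s\S]* does cross newlines, and re.M lets ^ match at every line start.
spanning = re.findall(r"^.*openat[\s\S]*^.*hammad.*", sample, re.M)

# re.S (DOTALL) is the alternative: it makes '.' itself match newlines.
dotall = re.findall(r"openat.*hammad", sample, re.S)

print(dot_only)
print(spanning)
print(dotall)
```

Only the first call comes back empty; the other two return a single match that runs from the openat line to the hammad line.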


Python demo:

>>> re.findall(r"^.*openat(?:[\s\S]*)^.*hammad.*", txt, re.M)
['--> [pid   621] 23:13:36 openat(AT_FDCWD, "/proc/self/task", O_RDONLY|O_DIRECTORY|O_CLOEXEC <unfinished ...>\n[pid 20242] 23:13:36 <... munmap resumed> ) = 0\n[pid   621] 23:13:36 <... openat resumed> ) = 13\n[pid 20242] 23:13:36 madvise(0x7f76a7b000, 20480, MADV_DONTNEED <unfinished ...>\n --> [pid   621] 23:13:36 hammad(13, FUTEX_WAKE, 1, 24 )']

If you want to add literal qualifying characters after hammad just remember to escape any regex meta characters.
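One way to handle that escaping is to let re.escape do it. The following is a sketch in Python 3, assuming the literal text from the question is what should follow; the variable names are illustrative, and the lazy [\s\S]*? is a choice made here so the match stops at the first hammad line rather than the last:

```python
import re

# Literal text to require on the later line (copied from the question).
literal = "hammad(13, FUTEX_WAKE, 1, 24 )"

# re.escape backslash-escapes regex metacharacters such as ( and ),
# so the literal is matched character for character.
pattern = re.compile(r"^.*openat[\s\S]*?^.*" + re.escape(literal), re.M)

# A few lines from the question's log, without the "-->" markers.
txt = (
    '[pid   621] 23:13:36 openat(AT_FDCWD, "/proc/self/task", '
    "O_RDONLY|O_DIRECTORY|O_CLOEXEC <unfinished ...>\n"
    "[pid 20242] 23:13:36 <... munmap resumed> ) = 0\n"
    "[pid   621] 23:13:36 <... openat resumed> ) = 13\n"
    "[pid   621] 23:13:36 hammad(13, FUTEX_WAKE, 1, 24 )\n"
)

found = pattern.search(txt) is not None
print("found a match!" if found else "no match")
```

Since only a yes/no answer is needed, re.search is enough here; re.findall would also work but builds the full list of matches.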

Ultimately, an answer like @dawg had above is clearly much simpler in terms of the regex.

With that said, as a general Python and Regex learning opportunity, there are a few things in your example that stood out as causing the error.

  1. You have double quotes " in your pattern, but " is also the character that tells Python where the string starts and ends. So you can either escape them as \" or use a different delimiter for the string in Python. Either a single quote ' or triple quotes """ would work in this case.
  2. You are using several regex metacharacters in your pattern. Things like [ ( | have special meaning in regular expressions. You need to escape those as well by prefixing them with \, so | becomes \| if you want it to be used literally instead of with its special regex meaning.

@dawg implicitly fixed all these things by writing a more concise regex without these problems, but I think that is the root of the issue you posted.
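For comparison, here is a sketch of how the original pattern might look with both fixes applied, written for Python 3. The helper name has_openat_then_hammad is made up for this example, and the [\s\S]*? between the two literals is an assumption about what "any lines in between" should mean:

```python
import re

# Single-quoted raw strings, so the double quotes inside need no escaping.
# ( ) and | are escaped with a backslash so they are matched literally.
pattern = re.compile(
    r'openat\(AT_FDCWD, "/proc/self/task", O_RDONLY\|O_DIRECTORY\|O_CLOEXEC'
    r'[\s\S]*?'  # anything in between, newlines included, as little as possible
    r'hammad\(13, FUTEX_WAKE, 1, 24 \)'
)


def has_openat_then_hammad(text):
    """Return True if the openat call is followed later by the hammad call."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    # Same file name as in the question.
    with open("textfile.txt") as f:
        text = f.read()
    print("found a match!" if has_openat_then_hammad(text) else "no match")
```

Compiling the pattern once makes it easy to reuse the same check across many files.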
