
Find a pattern across multiple lines in Python

I spent two days trying to build a regular expression that finds two words/numbers occurring sequentially on different lines. I have a file that contains text like:

  1  [pid 29743] 18:58:19 prctl(PR_CAPBSET_DROP, 0x9, 0, 0, 0 <unfinished ...>
  2  [pid 29746] 18:58:19 <... mprotect resumed> ) = 0
  3  [pid 29743] 18:58:19 <... prctl resumed> ) = 0
  4  [pid   615] 18:58:19 <... ioctl resumed> , 0xffffffffffb4f054) = 0
  5  [pid 29743] 18:58:19 prctl(PR_CAPBSET_READ, 0xa, 0, 0, 0 <unfinished ...>
  6  [pid   615] 18:58:19 ioctl(13, 0x40047703 <unfinished ...>
  7  [pid 29743] 18:58:19 <... prctl resumed> ) = 1
  8  [pid 29746] 18:58:19 mprotect(0xfffffffff4ae2000, 4096, PROT_NONE <unfinished ...>
  9  [pid 29743] 18:58:19 prctl(PR_CAPBSET_DROP, 0xa, 0, 0, 0 <unfinished ...>
  10 [pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19062e0) = 0
  11 [pid 29743] 18:58:19 <... prctl resumed> ) = 0
  12 [pid 29746] 18:58:19 <... mprotect resumed> ) = 0
  13 [pid 29743] 18:58:19 prctl(PR_CAPBSET_READ, 0xb, 0, 0, 0 <unfinished ...>
  14 [pid 29746] 18:58:19 ioctl(13, 0x40047703, 
  <unfinished ...>
  15 [pid 29743] 18:58:19 <... prctl resumed> ) = 1
  16 [pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19064b0) = 0

I am looking for two values, 0x7fd19062e0 and 0x7fd19064b0, that appear sequentially in the text. They appear on lines 10 and 16. I want to build a regular expression that tells me whether or not they appear sequentially. Here is my code:

    import re

    file = open("text.txt", "a+")
    text = ""
    for line in file:
        text += line
    if re.findall(r"^.*0x7fd19062e0.*0x7fd19064b0", text, re.M):
        print('found a match!')
    else:
        print('no match')

re.M only modifies the behavior of the ^ and $ anchors, so that they match at the start and end of every line. For the "dot matches newline" option, you need re.S. Also, if you just want to know whether there is a match, don't use re.findall(); re.search() is enough:

    import re

    with open("text.txt") as file:  # why append mode?
        text = file.read()          # no need to build the string line by line
    if re.search(r"\b0x7fd19062e0\b.*\b0x7fd19064b0\b", text, re.S):
        print('found a match!')
    else:
        print('no match')

Note that I added word boundary anchors to ensure that only entire hex numbers are matched (otherwise, submatches of longer numbers would be possible). This may or may not be relevant in your case, but it's probably good practice.
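The difference between the two flags, and the effect of the word boundaries, can be checked with a small self-contained sketch. The sample string is shortened from the log in the question and is only illustrative; the variable names are made up for this example.

```python
import re

# Two lines shortened from the log in the question (illustrative only).
sample = (
    "[pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19062e0) = 0\n"
    "[pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19064b0) = 0\n"
)

pattern = r"\b0x7fd19062e0\b.*\b0x7fd19064b0\b"

# With re.M the dot still stops at "\n", so the pattern cannot span two lines.
multiline_match = re.search(pattern, sample, re.M)

# With re.S the dot also matches "\n", so the pattern can span lines.
dotall_match = re.search(pattern, sample, re.S)

# Without \b, a longer number that merely contains the value also matches.
loose_match = re.search(r"0x7fd19062e0", "0x7fd19062e0ff")
strict_match = re.search(r"\b0x7fd19062e0\b", "0x7fd19062e0ff")

print(multiline_match is not None)  # False
print(dotall_match is not None)     # True
print(loose_match is not None)      # True
print(strict_match is not None)     # False
```

Because `.*` sits between the two values, the pattern only matches when the first value comes before the second; swapping them in the text gives no match.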

No need for RE:

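    with open('text.txt') as f:
        enumerated_lines = list(enumerate(f))  # (line index, line text) pairs
    # keep only the lines that contain the value
    lines_with_pattern = [l for l in enumerated_lines if '0x7fd19062e0' in l[1]]
    # pair each matching line with the next matching line
    pairs = zip(lines_with_pattern, lines_with_pattern[1:])
    # keep the pairs whose line indices are consecutive
    result = [p for p in pairs if p[0][0] + 1 == p[1][0]]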
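If the goal is only to check that the first value shows up somewhere before the second, a plain scan over the lines may be enough. The following is a minimal sketch of that idea, not the answerer's code: `find_in_order` is a hypothetical helper name, the sample lines are shortened from the question, and the substring tests would also accept a longer number that contains one of the values.

```python
def find_in_order(lines, first, second):
    """Return 1-based line numbers (first_line, second_line) if `second`
    occurs after `first`, either later on the same line or on a later line.
    Return None otherwise."""
    first_line = None
    for number, line in enumerate(lines, start=1):
        if first_line is None:
            pos = line.find(first)
            if pos != -1:
                first_line = number
                # The second value may follow on the same line.
                if second in line[pos + len(first):]:
                    return (number, number)
        elif second in line:
            return (first_line, number)
    return None


# Sample lines shortened from the log in the question (illustrative only).
sample = [
    "[pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19062e0) = 0\n",
    "[pid 29743] 18:58:19 <... prctl resumed> ) = 0\n",
    "[pid   615] 18:58:19 <... ioctl resumed> , 0x7fd19064b0) = 0\n",
]

print(find_in_order(sample, "0x7fd19062e0", "0x7fd19064b0"))  # (1, 3)
print(find_in_order(sample, "0x7fd19064b0", "0x7fd19062e0"))  # None
```

An open file object can be passed as `lines` as well, since iterating over a file yields its lines one at a time without loading the whole file into memory.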
