
Capture multiple lines using regex in Python

I would like to write a regex that captures the "my_code" line and only the two lines that are indented under it:

//abs[matches(@class,"her")] 
  //abs[matches(@class,"him")]

I was using my_code\n\s\s(.+) on the following text:

my_code
  //abs[matches(@class,"her")] 
  //abs[matches(@class,"him")]
xxxx   //time
xxxxx   //h1

\s matches any whitespace character, including spaces, tabs, and newlines, so \s\s does not guarantee that the next line is indented.
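
As a small illustration of why this matters, the sketch below runs the original pattern on a made-up input in which nothing is indented under my_code. The input string and the variable names are assumptions chosen for the demonstration, not part of the question.

```python
import re

# Hypothetical input: two empty lines follow my_code, so nothing is indented under it.
text = "my_code\n\n\nxxxx   //time\n"

# Original pattern: each \s is free to consume a newline.
loose = re.search(r"my_code\n\s\s(.+)", text)

# Stricter variant: only spaces or tabs count as indentation.
strict = re.search(r"my_code\n[ \t]{2}(.+)", text)

print(loose.group(1))  # the unindented "xxxx   //time" line is captured
print(strict)          # None, because the line after my_code is not indented
```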

To make sure the following lines are indented, match a newline followed by 1 or more spaces or tabs, using the character class [\t ]+, once for each of the two lines.

^my_code\r?\n[\t ]+.+\r?\n[ \t]+.+
  • ^ Start of the string (or start of a line when re.MULTILINE is used)
  • my_code\r?\n Match my_code literally, followed by a newline with an optional carriage return
  • [\t ]+ Match 1+ spaces or tabs
  • .+ Match any char except a newline, 1+ times
  • \r?\n[ \t]+.+ Again match a newline, 1+ spaces or tabs, and 1+ chars other than a newline
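
The two-line pattern can be tried in Python against the sample text from the question. This is a minimal sketch: the re.MULTILINE flag is an assumption added here so that ^ also matches when my_code is not at the very start of the string, and the variable names are illustrative.

```python
import re

# Sample text from the question.
text = (
    'my_code\n'
    '  //abs[matches(@class,"her")] \n'
    '  //abs[matches(@class,"him")]\n'
    'xxxx   //time\n'
    'xxxxx   //h1\n'
)

# re.MULTILINE lets ^ match at the start of every line, not only the string.
pattern = re.compile(r"^my_code\r?\n[\t ]+.+\r?\n[ \t]+.+", re.MULTILINE)

match = pattern.search(text)
if match:
    # Prints my_code and the two indented lines, nothing after them.
    print(match.group())
```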

To match one or more indented lines, repeat a non-capturing group with the + quantifier:

^my_code(?:\r?\n[\t ]+.+)+
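
A short sketch of the repeated-group pattern in Python follows. The third indented line in the sample is hypothetical, added only to show that the group repeats as often as needed; re.MULTILINE and the variable names are likewise assumptions.

```python
import re

# The "them" line is hypothetical, added to show that the group repeats.
sample = (
    'my_code\n'
    '  //abs[matches(@class,"her")]\n'
    '  //abs[matches(@class,"him")]\n'
    '  //abs[matches(@class,"them")]\n'
    'xxxx   //time\n'
)

block_re = re.compile(r"^my_code(?:\r?\n[\t ]+.+)+", re.MULTILINE)
found = block_re.search(sample)

# Drop the my_code line and strip the indentation from the remaining lines.
indented = [line.strip() for line in found.group().splitlines()[1:]] if found else []
print(indented)
```

The match stops at the first line that does not start with a space or tab, so the xxxx line is left out.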

I managed to get it working like this:

import re

test_str = """ 
    my_code
      //abs[matches(@class,"her")] 
      //abs[matches(@class,"him")]
    xxxx   //time
    xxxxx   //h1
"""
# Raw string so that \s and \n reach the regex engine unchanged
pattern = re.compile(r'my_code\n\s+[^\n]+\n\s+[^\n]+')
res = pattern.search(test_str)
print(res.group())

[^\n]+ matches one or more characters other than a newline. This produces the following output:

my_code
      //abs[matches(@class,"her")] 
      //abs[matches(@class,"him")]
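
As a side note, [^\n]+ and .+ behave the same way in Python unless the re.DOTALL flag is set, because . excludes the newline by default. A quick check, using a made-up two-line string:

```python
import re

# Hypothetical two-line string for the comparison.
line_pair = "first line\nsecond line"

dot_default = re.match(r".+", line_pair).group()
negated = re.match(r"[^\n]+", line_pair).group()
dot_all = re.match(r".+", line_pair, re.DOTALL).group()

print(dot_default)  # stops at the newline
print(negated)      # stops at the newline as well
print(dot_all)      # with re.DOTALL, . also matches the newline
```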
