Capture multiple lines using reg ex in python

Question

I would like to write a regex that captures the "my_code" line and only the two lines that are indented under it:

  //abs[matches(@class,"her")] 
  //abs[matches(@class,"him")]

I was using my_code\n\s\s(.+) on this input:

my_code
  //abs[matches(@class,"her")] 
  //abs[matches(@class,"him")]
xxxx   //time
xxxxx   //h1

Answer 1

\s matches any whitespace character, including a newline, so on its own it cannot guarantee that the next line is indented.
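
A quick check in Python illustrates the point. This is only a sketch: the input string is made up for illustration (an unindented line preceded by a blank line) and is not taken from the question.

```python
import re

# \s matches any whitespace character, including the newline itself.
assert re.fullmatch(r"\s", "\n") is not None
assert re.fullmatch(r"\s", " ") is not None

# Hypothetical input: the line after my_code is NOT indented, but a blank
# line sits in between.
text = "my_code\n\nxxxx   //time\n"

# \s+ consumes the blank line's newline, so this pattern still matches.
loose = re.search(r"my_code\n\s+.+", text)

# [ \t]+ only accepts spaces or tabs, so the unindented line is rejected.
strict = re.search(r"my_code\n[ \t]+.+", text)

print(loose is not None)  # True
print(strict is None)     # True
```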

To make sure the lines are indented, you could match, twice, a newline followed by 1 or more spaces or tabs, using the character class [\t ]+.

^my_code\r?\n[\t ]+.+\r?\n[ \t]+.+

^ Start of string (or start of a line when the multiline flag is set)
my_code\r?\n Match my_code literally, followed by a newline
[\t ]+ Match 1+ spaces or tabs
.+ Match any char except a newline 1+ times
\r?\n[ \t]+.+ Again match a newline, 1+ spaces or tabs, and 1+ chars other than a newline
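
As a minimal sketch of how this pattern could be used from Python, assuming the sample text from the question is held in a string and that re.MULTILINE is wanted so that ^ can also match at the start of a line:

```python
import re

# Sample text taken from the question.
text = (
    'my_code\n'
    '  //abs[matches(@class,"her")] \n'
    '  //abs[matches(@class,"him")]\n'
    'xxxx   //time\n'
    'xxxxx   //h1\n'
)

# re.MULTILINE lets ^ match at the start of any line, not only at the
# start of the whole string.
pattern = re.compile(r"^my_code\r?\n[\t ]+.+\r?\n[ \t]+.+", re.MULTILINE)

match = pattern.search(text)
if match:
    # Prints my_code and the two indented lines, nothing else.
    print(match.group())
```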

To match the indented part 1 or more times, you could repeat a non-capturing group with the + quantifier:

^my_code(?:\r?\n[\t ]+.+)+
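
A hedged sketch of the repeated-group version in Python follows. The third indented line in the input is hypothetical, added only to show that the group repeats; splitting the match into a list of stripped lines is likewise an illustrative choice, not part of the answer.

```python
import re

# Question sample plus one hypothetical extra indented line ("them").
text = (
    'my_code\n'
    '  //abs[matches(@class,"her")] \n'
    '  //abs[matches(@class,"him")]\n'
    '  //abs[matches(@class,"them")]\n'
    'xxxx   //time\n'
    'xxxxx   //h1\n'
)

# The non-capturing group matches one indented line; + repeats it for as
# many consecutive indented lines as there are.
pattern = re.compile(r"^my_code(?:\r?\n[\t ]+.+)+", re.MULTILINE)

match = pattern.search(text)

# Drop the my_code line itself and trim the surrounding whitespace.
indented = [line.strip() for line in match.group().splitlines()[1:]] if match else []
print(indented)
```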

Answer 2

I managed to get it working like this:

import re

test_str = """ 
    my_code
      //abs[matches(@class,"her")] 
      //abs[matches(@class,"him")]
    xxxx   //time
    xxxxx   //h1
"""
# Raw string so that \n and \s reach the regex engine unchanged.
pattern = re.compile(r'my_code\n\s+[^\n]+\n\s+[^\n]+')
res = pattern.search(test_str)
print(res.group())

[^\n]+ matches 1 or more characters other than a newline. This produces the following output:

my_code
      //abs[matches(@class,"her")] 
      //abs[matches(@class,"him")]

2 answers

solution1
0 ACCEPTED 2019-10-10 14:04:32

solution2
0 2019-10-10 14:53:28
