How to match several lines with regex

Question

Given a unicode object with the following text:

a
b
c
d
e

aaaa
bbbb
cccc
dddd
eeee

I'd like to get the second group of lines, in other words, every line after the blank one. This is the code I've used:

import re

text = ...  # the previous text
exp = u'a\nb\nc\nd\ne\n{2}(.*\n){5}'
matches = re.findall(exp, text, re.U)

As it stands, this retrieves only the last line. What can I do to get all of the last five?

Answer 1

You're repeating the capturing group itself, so each repetition overwrites the previous capture and only the last one is kept.

If you do this

exp = ur'a\nb\nc\nd\ne\n{2}((?:.*\n){5})'

you get the five lines together.
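
To make the difference concrete, here is a minimal sketch written for Python 3, where the `ur` prefix no longer exists and a plain raw string is used instead. The sample `text` is an assumption: it mirrors the text from the question and ends with a trailing newline so that the fifth line can match `.*\n`.

```python
import re

# Sample text (assumed): the question's input, ending with a newline.
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# Repeating the capturing group: only the last repetition is kept.
repeated = re.findall(r'a\nb\nc\nd\ne\n{2}(.*\n){5}', text)

# Repeating a non-capturing group inside a single capturing group:
# the capture spans all five lines.
wrapped = re.findall(r'a\nb\nc\nd\ne\n{2}((?:.*\n){5})', text)

# The single capture can then be split into individual lines.
lines = wrapped[0].splitlines()

print(repeated)  # ['eeee\n']
print(lines)     # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
```

Splitting the single capture with `splitlines()` is often simpler than writing one group per line.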

You can't get to the individual matches unless you spell out the groups manually:

exp = ur'a\nb\nc\nd\ne\n{2}(.*\n)(.*\n)(.*\n)(.*\n)(.*\n)'
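
If typing five groups by hand feels clumsy, the pattern can also be assembled by string repetition. This is only a sketch for Python 3; the sample `text` and the names `prefix` and `pattern` are assumptions made for illustration.

```python
import re

# Sample text (assumed): the question's input, ending with a newline.
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee\n"

# Build five separate capturing groups instead of one repeated group.
prefix = r'a\nb\nc\nd\ne\n{2}'
pattern = prefix + r'(.*\n)' * 5

match = re.search(pattern, text)
# Each group holds one line, including its trailing newline.
groups = match.groups() if match else ()

print(groups)  # ('aaaa\n', 'bbbb\n', 'cccc\n', 'dddd\n', 'eeee\n')
```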

Answer 2

Why not just:

text[text.index('\n\n') + 2:].splitlines()
# ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
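
One caveat: `str.index` raises `ValueError` when the text contains no blank line. A slightly more defensive variant, sketched here with a hypothetical helper name `lines_after_blank`, uses `str.partition` and returns an empty list in that case.

```python
def lines_after_blank(text):
    """Return the lines that follow the first blank line, or [] if there is none."""
    # partition splits at the first '\n\n'; sep is '' when it is not found.
    head, sep, tail = text.partition('\n\n')
    return tail.splitlines() if sep else []


# Sample text (assumed): the question's input.
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee"

print(lines_after_blank(text))       # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
print(lines_after_blank("a\nb\nc"))  # []
```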

Answer 3

If the lines in the first part, which you don't want, are limited in length, why not search only for words with at least X letters, for example:

^[a-z]{2,}

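This matches every word of at least 2 characters at the start of a line. Pass re.M (re.MULTILINE) so that ^ matches at the start of every line rather than only at the start of the text.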
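
Applied to the text from the question, this approach might look like the following Python 3 sketch. The sample `text` is an assumption, and the approach only works because the unwanted lines happen to be a single letter long.

```python
import re

# Sample text (assumed): the question's input.
text = "a\nb\nc\nd\ne\n\naaaa\nbbbb\ncccc\ndddd\neeee"

# re.M makes ^ match at the start of every line, not just the start of the text.
long_words = re.findall(r'^[a-z]{2,}', text, re.M)

print(long_words)  # ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee']
```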

You can control the number of repetitions with:

{3} Exactly 3 occurrences;
{6,} At least 6 occurrences;
{2,5} 2 to 5 occurrences.
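
A quick way to check these quantifiers is to test a few words with `re.fullmatch`. The helper `matches_whole` and the sample words below are made up for illustration.

```python
import re


def matches_whole(pattern, word):
    """Return True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None


# {3}: exactly 3 occurrences
exactly_three = [w for w in ("ab", "abc", "abcd") if matches_whole(r'[a-z]{3}', w)]
# {6,}: at least 6 occurrences
at_least_six = [w for w in ("abcde", "abcdef", "abcdefg") if matches_whole(r'[a-z]{6,}', w)]
# {2,5}: between 2 and 5 occurrences
two_to_five = [w for w in ("a", "ab", "abcde", "abcdef") if matches_whole(r'[a-z]{2,5}', w)]

print(exactly_three)  # ['abc']
print(at_least_six)   # ['abcdef', 'abcdefg']
print(two_to_five)    # ['ab', 'abcde']
```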

3 answers

solution1
4 ACCEPTED 2013-07-16 13:33:19

solution2
2 2013-07-16 13:37:02

solution3
0 2013-07-16 13:40:33
