Groups match with regex

Question

I have something like this :

Some info_a
Useless info
sub_info_a
Useless info

Some info_b
Useless info
sub_info_b_1
sub_info_b_2
Useless info

Some info_c
Useless info
sub_info_c
Useless info

I want to create groups so that I can have something like this :

(info_a, sub_info_a), (info_b, sub_info_b_1, sub_info_b_2), (info_c, sub_info_c)

I tried :

^Some (info_\w+) .*$\n.*$\n(?:^(sub_info_\w+) .*$\n)+

But it only captures the last sub_info of info_b:

(info_a, sub_info_a), (info_b, sub_info_b_2), (info_c, sub_info_c)

I also tried :

^Some (info_\w+) .*$\n.*$\n|^(sub_info_\w+) .*$\n

This one gave me :

('info_a', ''), ('', 'sub_info_a'), ('info_b', ''), ('', 'sub_info_b_1'), ('', 'sub_info_b_2'), ('info_c', ''), ('', 'sub_info_c')

Which is not really what I want. Note that sub_info can appear more than once or twice.

Answer 1

^Some (info_\w+).*\n.*\n((?:^sub_info_\w+.*\n)+)

The capture group should be around the quantified non-capture group. When you quantify a capture group, it just captures the last occurrence. So you need to put a group around that to capture all the repetitions.
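
To see the difference, here is a minimal sketch. It assumes Python's re module, which the tuple output in the question suggests; the variable names are illustrative only.

```python
import re

# Two consecutive sub_info lines, as in the info_b block of the question.
block = "sub_info_b_1\nsub_info_b_2\n"

# Capture group INSIDE the quantified group: each repetition overwrites
# the previous capture, so only the last one survives.
inner = re.match(r"(?:(sub_info_\w+)\n)+", block)
print(inner.group(1))        # sub_info_b_2

# Capture group AROUND the quantified group: the whole run is captured.
outer = re.match(r"((?:sub_info_\w+\n)+)", block)
print(repr(outer.group(1)))  # 'sub_info_b_1\nsub_info_b_2\n'
```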

Note that this will not put each repetition into separate groups in the result -- there's always a 1-to-1 correspondence between capture groups and .group(n) items in the result. You need to split up the second capture group when processing the results of the regexp.
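
A sketch of that post-processing step, again assuming Python and the simplified sample input from the question. The re.MULTILINE flag is needed so that ^ matches at the start of every line. The second pass with re.findall is one possible way to split the captured block; splitting on newlines would work too if the lines carry no trailing text.

```python
import re

text = """Some info_a
Useless info
sub_info_a
Useless info

Some info_b
Useless info
sub_info_b_1
sub_info_b_2
Useless info

Some info_c
Useless info
sub_info_c
Useless info
"""

# Group 1: the info name. Group 2: the whole run of sub_info lines.
pattern = re.compile(
    r"^Some (info_\w+).*\n.*\n((?:^sub_info_\w+.*\n)+)",
    re.MULTILINE,
)

groups = []
for m in pattern.finditer(text):
    # Second pass: pull each sub_info name out of the captured block.
    subs = re.findall(r"^(sub_info_\w+)", m.group(2), re.MULTILINE)
    groups.append((m.group(1), *subs))

print(groups)
# [('info_a', 'sub_info_a'), ('info_b', 'sub_info_b_1', 'sub_info_b_2'), ('info_c', 'sub_info_c')]
```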

I've also removed the space before .*, and there's no need for both \n and $.

