Groups match with regex

I have text like this:

Some info_a
Useless info
sub_info_a
Useless info

Some info_b
Useless info
sub_info_b_1
sub_info_b_2
Useless info

Some info_c
Useless info
sub_info_c
Useless info

I want to create groups so that I get something like this:

(info_a, sub_info_a), (info_b, sub_info_b_1, sub_info_b_2), (info_c, sub_info_c)

I tried :

^Some (info_\w+) .*$\n.*$\n(?:^(sub_info_\w+) .*$\n)+

But it only captures the last sub_info_b:

(info_a, sub_info_a), (info_b, sub_info_b_2), (info_c, sub_info_c)

I also tried :

^Some (info_\w+) .*$\n.*$\n|^(sub_info_\w+) .*$\n

This one gave me :

('info_a', ''), ('', 'sub_info_a'), ('info_b', ''), ('', 'sub_info_b_1'), ('', 'sub_info_b_2'), ('info_c', ''), ('', 'sub_info_c')

This is not really what I want. Note that sub_info can appear any number of times, not just once or twice.

^Some (info_\w+).*\n.*\n((?:^sub_info_\w+.*\n)+)

The capture group should be around the quantified non-capture group. When you quantify a capture group, it just captures the last occurrence. So you need to put a group around that to capture all the repetitions.
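A small sketch of that difference, assuming Python's `re` module (the tuple output in the question suggests `re.findall`). The `block` string and the variable names are made up for illustration:

```python
import re

# Two consecutive sub_info lines, as in the info_b section of the question.
block = "sub_info_b_1\nsub_info_b_2\n"

# Quantified capture group: each repetition overwrites group 1,
# so only the last occurrence survives.
last_only = re.match(r"(?:(sub_info_\w+)\n)+", block)
print(last_only.group(1))        # sub_info_b_2

# Capture group wrapped around the quantified non-capture group:
# group 1 holds the whole run of repetitions as one string.
whole_run = re.match(r"((?:sub_info_\w+\n)+)", block)
print(repr(whole_run.group(1)))  # 'sub_info_b_1\nsub_info_b_2\n'
```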

Note that this will not put each repetition into separate groups in the result -- there's always a 1-to-1 correspondence between capture groups and .group(n) items in the result. You need to split up the second capture group when processing the results of the regexp.
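One way to do that split, sketched in Python under the assumption that the input looks like the sample in the question. The names `TEXT`, `PATTERN`, `SUB` and `extract_groups` are illustrative, and `re.MULTILINE` is needed so that `^` matches at the start of every line:

```python
import re

TEXT = """\
Some info_a
Useless info
sub_info_a
Useless info

Some info_b
Useless info
sub_info_b_1
sub_info_b_2
Useless info

Some info_c
Useless info
sub_info_c
Useless info
"""

# Group 1: the info name. Group 2: the whole run of sub_info lines.
PATTERN = re.compile(
    r"^Some (info_\w+).*\n.*\n((?:^sub_info_\w+.*\n)+)", re.MULTILINE
)
# Second pass: pull each sub_info name out of the captured run.
SUB = re.compile(r"^sub_info_\w+", re.MULTILINE)


def extract_groups(text):
    """Return one tuple per section: (info, sub_info_1, sub_info_2, ...)."""
    groups = []
    for m in PATTERN.finditer(text):
        info, block = m.group(1), m.group(2)
        groups.append((info, *SUB.findall(block)))
    return groups


print(extract_groups(TEXT))
# [('info_a', 'sub_info_a'), ('info_b', 'sub_info_b_1', 'sub_info_b_2'), ('info_c', 'sub_info_c')]
```

The second pass uses a regex rather than a plain `split()` so that any trailing text on a sub_info line is dropped.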

I've also removed the space before .*, and there's no need for both \n and $.
