Exclude some characters from a regex group

Question

I have a text that contains many articles concatenated into a single string. Each new article starts with = Article 1 = followed by = = Article 1 Section 1 = = , = = Article 1 Section 2 = = and so on. I want to split this string and create a string for each article.

For that I am using regex split

import re
pattern = r"=[\s\w'()]+="
l = re.compile(pattern).split(test_data)

But this isn't giving me the desired result: the text is split on sections and subsections as well as on articles. I tried excluding runs of multiple = from the match but had no success, and I am not sure how to proceed. I have pasted sample data (two articles) here - Robert Boulder and Kiss You ( One Direction song )

Answer 1

This regex should do the job:

^ *\= [^\=]* \= *$

See it working here:

https://regex101.com/r/HJPHFA/1

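It matches a '=' followed by a space, any number of characters that are NOT '=' (the [^\=] part), then another space and another '='. It also allows optional spaces at the start and end of the line, because your sample text has leading and trailing spaces on some lines. Since the pattern relies on ^ and $ matching at each line, it has to be run in multiline mode (re.MULTILINE in Python). Section headings such as = = Article 1 Section 1 = = do not match, because the first '=' is followed by a space and then another '=' instead of title text.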
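As a rough sketch of how this pattern could be plugged into the Python code from the question: the sample string, the helper name split_articles and the variable names below are made up for illustration, and the real data may need extra handling (for example, titles that themselves contain '=').

```python
import re

# The pattern from the answer, compiled with re.MULTILINE so that
# ^ and $ match at the start and end of every line, not just the string.
TITLE_RE = re.compile(r"^ *\= [^\=]* \= *$", re.MULTILINE)

def split_articles(text):
    """Hypothetical helper: return one string per article body."""
    parts = TITLE_RE.split(text)
    # The chunk before the first title is empty here, so drop blank pieces.
    return [p.strip() for p in parts if p.strip()]

# Made-up sample mimicking the layout described in the question.
sample = (
    " = Article 1 = \n"
    " Intro text . \n"
    " = = Article 1 Section 1 = = \n"
    " Section text . \n"
    " = Article 2 = \n"
    " Other intro . \n"
)

articles = split_articles(sample)
# Titles can be recovered separately if they are needed.
titles = [t.strip() for t in TITLE_RE.findall(sample)]
```

With this sample, the section heading stays inside the first article instead of causing a split, which is the behaviour the question asks for.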
