Exclude certain characters from a regex group

Question

I have a text that contains many articles concatenated into a single string. Each new article starts with = Article 1 =, followed by = = Article 1 Section 1 = =, = = Article 1 Section 2 = = and so on. I want to split this string and create one string per article.

For that I am using a regex split:

import re
# test_data holds all articles concatenated into one string
pattern = r"=[\s\w'()]+="
articles = re.compile(pattern).split(test_data)

But this isn't giving me the desired result: the text is split on sections and subsections as well. I tried to exclude runs of multiple = characters from the match, but without success, and I'm not sure how to proceed. I have pasted sample data (two articles) here - Robert Boulder and Kiss You ( One Direction song )

Answer 1

This regex should do the job:

^ *\= [^\=]* \= *$

See it working here:

https://regex101.com/r/HJPHFA/1

Basically, it matches a '=' followed by a space, any number of characters that are NOT '=' (the [^\=] part), then another space and another '='. Because a section heading such as = = Career = = has a second '=' right after the first "= ", it cannot match. The pattern also allows optional spaces at the start and end of the line, because your sample text has leading and trailing spaces on some lines. Note that ^ and $ only match at every line when multiline mode is on (re.MULTILINE in Python).
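As a rough sketch of how that pattern could be plugged into the split from the question: the sample string below is a made-up miniature of the data (two articles with one section each), and the names title_line, articles and titles are only illustrative, not something from the original post.

```python
import re

# Hypothetical miniature of the asker's data: two articles, each with one section.
test_data = (
    " = Robert Boulder = \n"
    " Robert Boulder is an English actor . \n"
    " = = Career = = \n"
    " He had a guest role in a television series . \n"
    " = Kiss You ( One Direction song ) = \n"
    " Kiss You is a song by One Direction . \n"
    " = = Background = = \n"
    " The song was released as a single . \n"
)

# The pattern from the answer; re.MULTILINE makes ^ and $ match at each line.
title_line = re.compile(r"^ *\= [^\=]* \= *$", re.MULTILINE)

# split() also returns the (empty) text before the first title, so drop blanks.
articles = [part.strip() for part in title_line.split(test_data) if part.strip()]

# The titles themselves are discarded by split(); finditer() recovers them.
titles = [m.group().strip(" =") for m in title_line.finditer(test_data)]

for title, body in zip(titles, articles):
    print(title, "->", len(body), "characters")
```

Section headings such as = = Career = = stay inside the article bodies, since only the single-'=' title lines act as separators. If a title line could ever lack its closing '=', using [^=\n] instead of [^\=] would keep a match from running across lines.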

Answer 1 · 2 · accepted · 2021-12-31 06:48:54
