Python regex to find everything inside brackets that are preceded by a prefix

Question

This seems like a fairly simple issue, but I can't get it to work.

I have a text file that contains JSON-like data, but a couple of additional lines stop it from being valid JSON, and I need to remove them. This sounds very simple, all the more so because the valid JSON strings (which I can parse later) are always contained in the following container:

xyz()

So for example, the dataset will be something like:

abcdefg
xyz({"id_value": 123, "text_value": "efg"})

abcdefg
xyz({"id_value": 124, "text_value": "hij"})

Each separate JSON string is always prefixed by abcdefg and then xyz(, and there is always a closing bracket after it. So the format is consistent.

I was trying the following:

re.findall(r'xyz\(.*?\)', text_file)

However, despite attempting variations of this (e.g. using re.search, trying \w+, etc.), nothing seems to work (by which I mean it returns an empty list).

If I just try the following:

re.findall(r'xyz\(', text_file)

Then it returns:

['xyz(', 'xyz(']

As expected.

So the issue appears to be with the string in the brackets, but I cannot work out what the problem is, as other examples on here suggest my code is correct (which it can't be, as it doesn't work)!

I presume it's something horrifically simple, but I'm a bit stuck!

Answer 1

You can install the PyPI regex module by running pip install regex (or pip3 install regex) and then use this library to match strings between xyz( and the next paired ) char using:

import regex
#...
# x.group() returns the whole match, including the xyz prefix
output = [x.group() for x in regex.finditer(r'xyz(\((?:[^()]++|(?1))*\))', text_file)]

The list comprehension is used to avoid the issue with regex.findall, which returns only the captured substrings when a capturing group is defined in the regex (and here, the capturing group around the parentheses is required, since it is recursed inside the pattern with a (?1) subroutine).
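The findall behavior described above is shared by the standard-library re module, so it can be seen without installing anything. This is a minimal sketch, assuming the sample data from the question and using a simpler non-recursive pattern (\(.*?\)) purely for illustration, since re does not support the (?1) subroutine:

```python
import re

# Sample data in the format described in the question.
text_file = '''abcdefg
xyz({"id_value": 123, "text_value": "efg"})

abcdefg
xyz({"id_value": 124, "text_value": "hij"})
'''

# Simplified, non-recursive pattern with one capturing group (illustration only).
pattern = r'xyz(\(.*?\))'

# findall returns only what the capturing group matched, so "xyz" is dropped.
groups_only = re.findall(pattern, text_file)

# finditer yields match objects; group() returns the whole match.
full_matches = [m.group() for m in re.finditer(pattern, text_file)]

print(groups_only)
print(full_matches)
```

Every item in groups_only starts with ( while every item in full_matches starts with xyz(, which is the difference the list comprehension works around.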

Pattern details:

xyz - xyz text
(\((?:[^()]++|(?1))*\)) - Group 1:
- \( - a ( char
- (?:[^()]++|(?1))* - zero or more repetitions of either one or more chars other than ( and ), or a subroutine call that recurses the whole Group 1 pattern
- \) - a ) char.
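To make the recursion in the pattern more concrete, here is a rough plain-Python sketch of the same idea: find xyz(, then walk forward counting nested parentheses until the one opened right after xyz is closed. It is not the regex module's implementation; the function name find_prefixed_groups and the sample string are hypothetical. Like the pattern, it does not treat parentheses inside JSON string values specially, so an unbalanced ( or ) inside a string value would throw it off.

```python
import json


def find_prefixed_groups(text, prefix="xyz"):
    """Return each balanced (...) group that directly follows `prefix`."""
    results = []
    start = text.find(prefix + "(")
    while start != -1:
        open_pos = start + len(prefix)  # index of the opening "("
        depth = 0
        end = None
        for i in range(open_pos, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:  # the group opened at open_pos is now closed
                    end = i
                    break
        if end is None:
            # No matching ")" from here; try the next occurrence of the prefix.
            start = text.find(prefix + "(", start + 1)
            continue
        results.append(text[open_pos:end + 1])
        start = text.find(prefix + "(", end + 1)
    return results


# Hypothetical sample with nested parentheses inside a string value.
sample = 'abcdefg\nxyz({"id_value": 123, "text_value": "e(f)g"})\n'
groups = find_prefixed_groups(sample)

# Strip the outer parentheses before handing the payload to the JSON parser.
records = [json.loads(g[1:-1]) for g in groups]
print(records)
```

The returned strings correspond to Group 1 of the pattern (the parenthesised part without the xyz prefix), which is why the outer ( and ) are sliced off before json.loads is called.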

Solution 1
0 2021-11-08 22:22:46
