How to distinguish list patterns using regular expressions in Python

Question

I am trying to convert XML to JSON without using a Python package. To do so, I am converting the XML to a list, which will eventually be converted to a nested dictionary and then to JSON. I am unable to distinguish the following elements while reading the XML from the list:

1. <Description>TestData</Description>\n
2. Data</Description>\n
3. <Description>Test\n

The regexes I am using to identify 1 and 2 are:

x = re.compile("<Description>(.+?)<\/Description>\n")
x = re.compile("^((?!Description).)*<\/Description>\\n")

I am finding it difficult to develop a regex for the third one:

x = re.compile("\s*<Description>(.+)(?!((<\/Description>)))\n")

Although this last regex identifies text 3 correctly, it also matches text 1. It should match only text 3.

Answer 1

You were very close. This regex does what you need:

re.compile(r"\s*<Description>(.+)(?<!<\/Description>)\n")

I just added a '<' between the '?' and the '!' to turn the negative lookahead into a negative lookbehind assertion. See the documentation for more information: https://docs.python.org/2/library/re.html
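To see why the lookbehind helps, here is a minimal sketch that runs both variants against the three line shapes from the question. The `lines` list and the variable names are assumptions made for illustration, and the lookahead pattern is written without the question's extra parentheses, which should not change what it matches.

```python
import re

# The three line shapes from the question, as they might appear in the list.
lines = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

# Lookahead variant: the check runs at the position just before "\n",
# where the next character is "\n" and never "<", so it rejects nothing.
lookahead = re.compile(r"\s*<Description>(.+)(?!</Description>)\n")

# Lookbehind variant: the check inspects the text that ends just before "\n".
lookbehind = re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

# Collect the 1-based numbers of the lines each pattern matches.
lookahead_hits = [i for i, line in enumerate(lines, 1) if lookahead.match(line)]
lookbehind_hits = [i for i, line in enumerate(lines, 1) if lookbehind.match(line)]

print(lookahead_hits)   # [1, 3]
print(lookbehind_hits)  # [3]
```

The lookahead is evaluated after `(.+)` has already consumed the closing tag, so there is nothing left for it to reject. The lookbehind looks backwards from the same position and therefore sees the closing tag.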

Answer 2

Do you want something like this?

<Description>([^<]+)\n

Demo

The Python script is:

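import re

# Sample input containing the three line shapes from the question
ss = """ <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

# Match an opening tag whose text reaches the line end without hitting another "<"
regx = re.compile("<Description>([^<]+)\n")
capture = regx.findall(ss)
print(capture)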

The output is:

['Test']

It seems the value of capture[0] is what you want.
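Since the question reads the XML from a list, the same idea can be applied one element at a time. The following is only a sketch: `PATTERNS`, `classify`, and the three labels are hypothetical names introduced here, and the patterns are adapted from the ones in the question and in this answer.

```python
import re

# Hypothetical names: PATTERNS and classify() are illustrative only.
# Order matters: the first pattern that matches the whole line wins.
PATTERNS = [
    ("complete", re.compile(r"\s*<Description>(.+?)</Description>\n")),
    ("close_only", re.compile(r"((?!Description).)*</Description>\n")),
    ("open_only", re.compile(r"\s*<Description>([^<]+)\n")),
]

def classify(line):
    """Return the label of the first pattern matching the entire line, else None."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(line):
            return label
    return None

# The three line shapes from the question.
lines = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

labels = [classify(line) for line in lines]
print(labels)  # ['complete', 'close_only', 'open_only']
```

Using `fullmatch` on each element avoids matches that span several lines, which `[^<]+` would otherwise allow because it also matches newline characters.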


Answer 1
1 Accepted 2018-04-06 02:28:15

Answer 2
1 2018-04-06 02:40:21
