How to distinguish list patterns using a regex in Python

I am trying to convert XML to JSON without using a Python package. To do so, I am converting the XML to a list, which will eventually be converted to a nested dictionary and then to JSON. While reading the XML from the list, I am unable to distinguish the following elements:

  1. <Description>TestData</Description>\n
  2. Data</Description>\n
  3. <Description>Test\n

The regexes I am using to distinguish 1 and 2 are:

  1. x = re.compile("<Description>(.+?)<\/Description>\n")
  2. x = re.compile("^((?!Description).)*<\/Description>\\n")

I am finding it difficult to develop a regex for the third one:

  1. x = re.compile("\s*<Description>(.+)(?!((<\/Description>)))\n")

Although this regex identifies text 3 correctly, it also matches text 1. It should match only text 3.

You were very close. This regex works for what you need:

re.compile(r"\s*<Description>(.+)(?<!<\/Description>)\n")

I just added a '<' between the '?' and the '!' to turn the negative lookahead into a negative lookbehind assertion. See the documentation for more info: https://docs.python.org/2/library/re.html
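To see why the lookahead version also matches text 1, here is a minimal sketch that runs both patterns against the three sample strings from the question. The variable names are illustrative only, and the patterns are written as raw strings without the optional escaping of the slash.

```python
import re

# The three sample lines from the question.
samples = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

# Lookahead attempt: the assertion is evaluated right before "\n",
# where the next character is a newline, so it can never fail.
lookahead = re.compile(r"\s*<Description>(.+)(?!</Description>)\n")

# Lookbehind version: the assertion inspects the text just before "\n",
# so a line ending in the closing tag is rejected.
lookbehind = re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

ahead_hits = [s for s in samples if lookahead.match(s)]
behind_hits = [s for s in samples if lookbehind.match(s)]

print(ahead_hits)   # texts 1 and 3
print(behind_hits)  # text 3 only
```

Note that Python requires a lookbehind to have a fixed width, which holds here because the closing tag is a literal string.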

Do you want something like this?

<Description>([^<]+)\n

The Python script is:

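import re

# Sample input: a complete element, a closing fragment, and an opening fragment.
ss = """ <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

# Match an opening tag whose text reaches the end of the line without meeting another tag.
regx = re.compile("<Description>([^<]+)\n")
capture = regx.findall(ss)
print(capture)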

The output is:

['Test']

It seems the value of capture[0] is what you want.
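If all three kinds of line need to be told apart in one pass, one possible approach is to try a pattern per kind in a fixed order. The following is only a sketch: the labels and the classify helper are hypothetical names, and it assumes the element text never contains a '<' character.

```python
import re

# Hypothetical labels for the three kinds of line in the question.
# Order matters: the complete element is tested before the fragments.
PATTERNS = [
    ("complete", re.compile(r"\s*<Description>([^<]+)</Description>\s*$")),
    ("closing_only", re.compile(r"\s*([^<]+)</Description>\s*$")),
    ("opening_only", re.compile(r"\s*<Description>([^<]+)$")),
]

def classify(line):
    """Return (label, text) for the first matching pattern, else (None, None)."""
    for label, pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return label, m.group(1).strip()
    return None, None

lines = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

results = [classify(line) for line in lines]
print(results)
```

Because each pattern excludes '<' from the captured text, no lookaround is needed to keep the complete element from matching the opening-only pattern.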
