
How to distinguish list patterns using a regex in Python

I am trying to convert XML to JSON without using a Python package. To do so, I first convert the XML into a list, which will eventually be turned into a nested dictionary and then into JSON. While reading the XML from that list, I am unable to distinguish the following elements:

  1. <Description>TestData</Description>\n
  2. Data</Description>\n
  3. <Description>Test\n

The regexes I am using to distinguish 1 and 2 are:

  1. x = re.compile(r"<Description>(.+?)</Description>\n")
  2. x = re.compile(r"^((?!Description).)*</Description>\n")
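A quick way to check the two patterns above is to run each one against the three sample lines on their own. This is a minimal sketch that assumes each list entry is tested separately and that the patterns are meant as raw strings (the escaping in the question was garbled, so this is an assumption about intent):

```python
import re

# The two patterns from the question, written as raw strings (assumed intent)
complete = re.compile(r"<Description>(.+?)</Description>\n")          # text 1
closing_only = re.compile(r"^((?!Description).)*</Description>\n")   # text 2

samples = [
    "<Description>TestData</Description>\n",  # text 1: complete element
    "Data</Description>\n",                   # text 2: closing tag only
    "<Description>Test\n",                    # text 3: opening tag only
]

for line in samples:
    # search() scans the line; "^" still anchors the second pattern to the start
    print(repr(line), bool(complete.search(line)), bool(closing_only.search(line)))
```

Text 1 gives True/False, text 2 gives False/True and text 3 gives False/False, so neither pattern picks up text 3 and a third one is needed.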

I am finding it difficult to write a regex for the third one. My attempt is:

  3. x = re.compile(r"\s*<Description>(.+)(?!((</Description>)))\n")

Although this regex identifies text 3 correctly, it also matches text 1. It should match only text 3.
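The reason the attempt also matches text 1 can be shown with a short sketch (the pattern is rewritten as a raw string, otherwise unchanged). The negative lookahead is evaluated at the spot where "\n" has to match next, so the only text it can see starts with a newline and "</Description>" can never follow there:

```python
import re

# The attempted pattern for text 3, as a raw string (extra parentheses kept)
attempt = re.compile(r"\s*<Description>(.+)(?!((</Description>)))\n")

text1 = "<Description>TestData</Description>\n"
text3 = "<Description>Test\n"

# The lookahead runs where "\n" must match next, so it only ever sees "\n".
# "</Description>" cannot start there, so (?!...) always succeeds.
print(bool(attempt.match(text1)))  # True
print(bool(attempt.match(text3)))  # True
```

For text 1 the captured group even swallows the closing tag ("TestData</Description>"), which confirms the lookahead never blocks the match.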

You were very close. This regex works for what you need:

re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

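I just added '<' between the '?' and the '!', which turns the negative lookahead into a negative lookbehind assertion (and dropped the redundant parentheses). A lookahead placed right before '\n' only sees the newline, so it always succeeds; a lookbehind checks the characters before that point, so the match fails when the line ends in '</Description>'. Check this for more info: https://docs.python.org/2/library/re.html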
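A minimal sketch applying the lookbehind version to each of the three sample lines, assuming the lines are checked one at a time as list entries. Note that Python's re only accepts fixed-width lookbehinds, which '</Description>' is:

```python
import re

# The lookbehind version from the answer, as a raw string
opening_only = re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

samples = [
    "<Description>TestData</Description>\n",  # text 1: complete element
    "Data</Description>\n",                   # text 2: closing tag only
    "<Description>Test\n",                    # text 3: opening tag only
]

for line in samples:
    m = opening_only.match(line)
    # Only text 3 should match; group(1) holds the text after the opening tag
    print(repr(line), "->", m.group(1) if m else None)
```

Texts 1 and 2 print None and text 3 prints 'Test'.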

Do you want something like this?

<Description>([^<]+)\n


Python script:

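import re

# Sample input containing the three kinds of lines from the question
ss = """ <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

# [^<]+ cannot cross a '<', so text 1 (whose content is followed by
# '</Description>') cannot be followed directly by a newline and does not match
regx = re.compile(r"<Description>([^<]+)\n")
capture = regx.findall(ss)
print(capture)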

Output:

['Test']

capture[0] should be the value you want.
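One caveat worth checking, sketched below with a made-up input: the character class [^<] also matches newlines, so if the text after an opening tag runs over several lines, the capture spans all of them. Adding \n to the class (a hypothetical variant, not part of the original answer) keeps the capture on a single line:

```python
import re

# Made-up input: an opening tag whose text continues onto a second line
text = "<Description>Test\nmore text\n"

loose = re.compile(r"<Description>([^<]+)\n")     # pattern from the answer
strict = re.compile(r"<Description>([^<\n]+)\n")  # hypothetical variant: stop at newlines

print(loose.findall(text))   # ['Test\nmore text']
print(strict.findall(text))  # ['Test']
```

If each list entry is already a single line, both patterns behave the same; the difference only shows up when several lines are searched as one string.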
