How to distinguish list patterns using a regex in Python

Question

I am trying to convert XML to JSON without using a Python package. To do so, I convert the XML to a list, which will eventually be converted to a nested dictionary and then to JSON. While reading the XML from the list, I am unable to distinguish the following elements:

1. <Description>TestData</Description>\n
2. Data</Description>\n
3. <Description>Test\n

The regexes I am using to identify 1 and 2 are:

x = re.compile("<Description>(.+?)<\/Description>\n")
x = re.compile("^((?!Description).)*<\/Description>\\n")

I am finding it difficult to develop a regex for the third one. My attempt so far is:

x = re.compile("\s*<Description>(.+)(?!((<\/Description>)))\n")

Although this last regex identifies text 3 correctly, it also identifies text 1. It should identify only text 3.

Answer 1

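You were very close. This regex works for what you need:

re.compile(r"\s*<Description>(.+)(?<!<\/Description>)\n")

I just added the '<' between the '?' and the '!' to turn the negative lookahead into a negative lookbehind assertion. The lookahead version is tested at the position just before the newline, where the text that follows is never </Description>, so it always succeeds. The lookbehind tests the text that precedes that position instead. Check this for more info: https://docs.python.org/2/library/re.html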
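
As a quick check of the difference, here is a minimal sketch that runs both versions against the three lines from the question. The `samples` list and the variable names are my own, chosen for illustration, and I am assuming each line ends with a single real newline.

```python
import re

# Hypothetical sample lines modelled on the three cases in the question.
samples = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

# The question's attempt: a negative lookahead placed just before the newline.
lookahead = re.compile(r"\s*<Description>(.+)(?!((<\/Description>)))\n")
# This answer's version: a negative lookbehind at the same position.
lookbehind = re.compile(r"\s*<Description>(.+)(?<!<\/Description>)\n")

# Collect the lines each pattern accepts (match() anchors at the line start).
lookahead_hits = [s for s in samples if lookahead.match(s)]
lookbehind_hits = [s for s in samples if lookbehind.match(s)]

print(lookahead_hits)   # accepts lines 1 and 3
print(lookbehind_hits)  # accepts line 3 only
```

The lookbehind works here because `</Description>` has a fixed length, which Python's `re` module requires for lookbehind assertions.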

Answer 2

Do you want something like this?

<Description>([^<]+)\n

Demo

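The Python script is:

import re

# Sample text: each "\n" escape becomes a real newline inside the string.
ss = """ <Description>TestData</Description>\n
  Data</Description>\n
  <Description>Test\n"""

# [^<]+ cannot cross a "<", so only an opening tag with no closing tag matches.
regx = re.compile(r"<Description>([^<]+)\n")
capture = regx.findall(ss)
print(capture)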

The output is:

['Test']

It seems capture[0] is the value you want.
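
If you need to tell all three kinds of line apart while walking the list, one possible sketch is to try the patterns in order. The `classify` function and the label names below are hypothetical, and I am assuming each list item is one line ending in a real newline. The first two patterns are the ones from the question.

```python
import re

# Opening and closing tag on the same line (case 1 in the question).
COMPLETE = re.compile(r"\s*<Description>(.+?)</Description>\n")
# Closing tag only, with no "Description" before it (case 2).
CLOSE_ONLY = re.compile(r"^((?!Description).)*</Description>\n")
# Opening tag only: the text after it contains no "<" (case 3).
OPEN_ONLY = re.compile(r"\s*<Description>([^<]+)\n")


def classify(line):
    """Return a label describing which of the three cases a line matches."""
    if COMPLETE.match(line):
        return "complete"
    if CLOSE_ONLY.match(line):
        return "close_only"
    if OPEN_ONLY.match(line):
        return "open_only"
    return "other"


lines = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]
labels = [classify(line) for line in lines]
print(labels)
```

For the three sample lines this prints ['complete', 'close_only', 'open_only'].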
