I am trying to convert XML to JSON without using any Python package. To do so, I am converting the XML to a list of lines, which will eventually be converted to a nested dictionary and then to JSON. While reading the XML from that list, I am unable to distinguish the following kinds of lines:
1. <Description>TestData</Description>\n
2. Data</Description>\n
3. <Description>Test\n
The regexes I am using to distinguish 1 and 2 are:
x = re.compile(r"<Description>(.+?)</Description>\n")
x = re.compile(r"^((?!Description).)*</Description>\n")
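As a quick check, the sketch below runs both patterns against the three sample lines with re.search. It assumes each line is a separate string ending in a real newline character and writes the patterns as raw strings; the names samples, first and second are only for illustration. Line 1 matches only the first pattern, line 2 only the second, and line 3 matches neither, which is why a third pattern is needed.

```python
import re

# The three sample lines, each ending in a real newline character.
samples = [
    "<Description>TestData</Description>\n",  # 1: opening and closing tag
    "Data</Description>\n",                   # 2: closing tag only
    "<Description>Test\n",                    # 3: opening tag only
]

# Case 1: text between an opening and a closing tag.
first = re.compile(r"<Description>(.+?)</Description>\n")
# Case 2: no "Description" anywhere before the closing tag (anchored at ^).
second = re.compile(r"^((?!Description).)*</Description>\n")

for line in samples:
    # Prints the line plus whether each pattern finds a match in it.
    print(repr(line), bool(first.search(line)), bool(second.search(line)))
```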
I am finding it difficult to develop a regex for the third one. My attempt is:
x = re.compile(r"\s*<Description>(.+)(?!((</Description>)))\n")
Although this regex matches text 3 correctly, it also matches text 1. It should match only text 3.
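To see why this happens, the sketch below prints what (.+) captures for lines 1 and 3. A negative lookahead (?!...) only inspects the text after the current position. Here that position is just before the newline, so the lookahead never sees the closing tag that the greedy (.+) has already consumed. The pattern is the attempt above written as a raw string; the name attempt is only for illustration.

```python
import re

# The attempted pattern: the lookahead runs right before the final \n.
attempt = re.compile(r"\s*<Description>(.+)(?!((</Description>)))\n")

for line in ["<Description>TestData</Description>\n", "<Description>Test\n"]:
    m = attempt.search(line)
    # group(1) shows how much text (.+) swallowed before the lookahead ran.
    print(repr(line), m.group(1) if m else None)
```

For line 1 the captured text is TestData</Description>, so the closing tag ends up inside the group instead of blocking the match.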
You were very close. This regex works for what you need:
re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")
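I just added a '<' between the ? and the ! to turn the negative lookahead into a negative lookbehind assertion, so the pattern now checks that the newline is not preceded by </Description>. Check this for more info: https://docs.python.org/2/library/re.html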
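Here is a minimal check of the lookbehind version against all three sample lines, assuming one line per string. Python only allows fixed-width lookbehinds, which works here because </Description> is a fixed string. The names pattern and samples are illustrative.

```python
import re

# Negative lookbehind: the newline must NOT be preceded by "</Description>".
pattern = re.compile(r"\s*<Description>(.+)(?<!</Description>)\n")

samples = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]

for line in samples:
    m = pattern.search(line)
    # Only the line with an opening tag and no closing tag should match.
    print(repr(line), m.group(1) if m else None)
```

Only the third line produces a match, with Test as the captured text.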
Do you want something like this?
<Description>([^<]+)\n
The Python script:
import re

ss = """ <Description>TestData</Description>\n
Data</Description>\n
<Description>Test\n"""
regx = re.compile(r"<Description>([^<]+)\n")  # [^<]+ cannot run into a closing tag
capture = regx.findall(ss)
print(capture)
Output:
['Test']
The value in capture[0] seems to be what you want.
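Since the original goal is to tell the three kinds of line apart while walking through a list, one possible way to combine the patterns from this thread is a small classifier that tries them in order. The classify function and the labels it returns are hypothetical names for illustration. It assumes one element per list entry, each ending in a newline, with no nested or repeated tags.

```python
import re

# Patterns from the question and answers above, written as raw strings.
COMPLETE = re.compile(r"<Description>(.+?)</Description>\n")       # case 1
CLOSE_ONLY = re.compile(r"^((?!Description).)*</Description>\n")  # case 2
OPEN_ONLY = re.compile(r"<Description>([^<]+)\n")                 # case 3


def classify(line):
    """Return a (label, text) pair for one line; labels are hypothetical."""
    m = COMPLETE.search(line)
    if m:
        return "complete", m.group(1)
    if CLOSE_ONLY.search(line):
        # Text that appears before the closing tag, e.g. "Data".
        return "close_only", line[: line.index("</Description>")]
    m = OPEN_ONLY.search(line)
    if m:
        return "open_only", m.group(1)
    return "other", line


lines = [
    "<Description>TestData</Description>\n",
    "Data</Description>\n",
    "<Description>Test\n",
]
for line in lines:
    print(classify(line))
```

Trying the complete-element pattern first matters: line 1 would otherwise also satisfy the looser open-tag check in other variants of these patterns.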