I have some data in XML coming in through Alexa. It looks like this:
<!--
Need more Alexa data? Find our APIs here: https://aws.amazon.com/alexa/
-->
<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
<SD>
<POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
<REACH RANK="5"/>
<RANK DELTA="+0"/>
<COUNTRY CODE="US" NAME="United States" RANK="5"/>
</SD>
</ALEXA>
Here's the link to it: http://data.alexa.com/data?cli=10&url=https://www.yahoo.com/
I want to either grab the "REACH RANK" number by parsing the XML, or turn the data into JSON and then query it. Does anyone know how I can do either one?
There is no single canonical one-to-one mapping between XML and JSON, so you won't find a tool that automatically converts your XML into exactly the JSON you want. Your best bet is to parse the XML with Python's built-in XML modules (https://docs.python.org/2/library/xml.html), or you could try lxml. There is always the good old-fashioned regular expression route, and finally you could use a library like BeautifulSoup to help parse your XML.
As far as turning it into JSON is concerned, you would build your data into a Python dictionary and serialize it with the json library (json.dumps turns a dictionary into a JSON string; json.loads goes the other way):
import json
my_data = json.dumps(dict_data)
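To make the dictionary approach concrete, here is a minimal sketch using only the standard library. The helper name element_to_dict and the "tag" / "attributes" / "children" layout are my own assumptions for illustration, not a standard mapping; pick whatever shape suits your queries.

```python
import json
from xml.etree import ElementTree as ET

XML_TEXT = """<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
<SD>
<POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
<REACH RANK="5"/>
<RANK DELTA="+0"/>
<COUNTRY CODE="US" NAME="United States" RANK="5"/>
</SD>
</ALEXA>"""


def element_to_dict(element):
    """Hypothetical helper: recursively copy an element's tag,
    attributes, and child elements into plain dictionaries."""
    node = {"tag": element.tag, "attributes": dict(element.attrib)}
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


root = ET.fromstring(XML_TEXT)
data = element_to_dict(root)

# Serialize the dictionary to a JSON string.
json_text = json.dumps(data, indent=2)

# Query the JSON: walk the "children" lists and pick out the REACH node.
parsed = json.loads(json_text)
sd_children = parsed["children"][0]["children"]
reach = next(node for node in sd_children if node["tag"] == "REACH")
reach_rank = reach["attributes"]["RANK"]
```

With this layout every attribute value stays a string, just as it is in the XML, so convert with int() if you need a number.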
If all you want is the RANK attribute from the REACH tag, you can use the built-in xml.etree. It is just a matter of finding the REACH tag and extracting the attribute with .get:
In [19]: x = """<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
....: <SD>
....: <POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
....: <REACH RANK="5"/>
....: <RANK DELTA="+0"/>
....: <COUNTRY CODE="US" NAME="United States" RANK="5"/>
....: </SD>
....: </ALEXA>"""
In [20]: from xml.etree import ElementTree as et
In [21]: tree = et.fromstring(x)
In [22]: rank = tree.find(".//REACH").get("RANK")
In [23]: rank
Out[23]: '5'
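If you want to reuse this, the same lookup can be wrapped in a small function. This is only a sketch: the name get_reach_rank is hypothetical, and returning None when the tag or attribute is missing, and converting the value to int, are my assumptions about what you would want.

```python
from xml.etree import ElementTree as ET


def get_reach_rank(xml_text):
    """Return the REACH RANK value as an int, or None if it is absent."""
    root = ET.fromstring(xml_text)
    reach = root.find(".//REACH")  # first REACH element below the root
    if reach is None:
        return None
    rank = reach.get("RANK")  # attribute values are always strings
    return int(rank) if rank is not None else None


sample = """<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
<SD>
<POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
<REACH RANK="5"/>
<RANK DELTA="+0"/>
<COUNTRY CODE="US" NAME="United States" RANK="5"/>
</SD>
</ALEXA>"""

rank = get_reach_rank(sample)
```

Checking for None matters because .find returns None when no REACH element exists, and calling .get on that would raise an AttributeError.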