
Python - Parse XML or convert to JSON Alexa API Data

I have some data in XML coming in from the Alexa API. It looks like this:

<!--
 Need more Alexa data?  Find our APIs here: https://aws.amazon.com/alexa/ 
-->
<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
<SD>
<POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
<REACH RANK="5"/>
<RANK DELTA="+0"/>
<COUNTRY CODE="US" NAME="United States" RANK="5"/>
</SD>
</ALEXA>

Here's the link to it: http://data.alexa.com/data?cli=10&url=https://www.yahoo.com/

I want to either grab the "REACH RANK" number by parsing the XML, or turn the data into JSON and then query it. Does anyone know how I can do either one?

There is no one-to-one mapping tool that will automatically turn your XML into JSON. Your best bet is to parse the XML with Python's built-in modules (see https://docs.python.org/2/library/xml.html), or you could try lxml. There is always the good old-fashioned regular expression route, and finally you could use a library like BeautifulSoup to help parse your XML.

As for turning it into JSON, you would build your data into a Python dictionary and serialize it with the json library.

import json
# json.dumps serializes a dict to a JSON string (json.loads goes the other way)
my_data = json.dumps(dict_data)
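
The dictionary route can be sketched roughly as below. The helper name element_to_dict is made up for illustration and is not part of any library; it assumes tag names are unique among siblings and never clash with attribute names, which holds for the sample in the question but not for XML in general.

```python
import json
from xml.etree import ElementTree as ET

# Sample document copied from the question.
XML_DATA = """<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
<SD>
<POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
<REACH RANK="5"/>
<RANK DELTA="+0"/>
<COUNTRY CODE="US" NAME="United States" RANK="5"/>
</SD>
</ALEXA>"""


def element_to_dict(element):
    """Hypothetical helper: copy attributes, then nest children by tag name.

    Assumption: sibling tags are unique and do not collide with attribute
    names; a later duplicate would silently overwrite an earlier entry.
    """
    node = dict(element.attrib)
    for child in element:
        node[child.tag] = element_to_dict(child)
    return node


root = ET.fromstring(XML_DATA)
data = {root.tag: element_to_dict(root)}

# Serialize the dictionary to a JSON string.
json_text = json.dumps(data, indent=2)

# Query the JSON after loading it back into Python objects.
reach_rank = json.loads(json_text)["ALEXA"]["SD"]["REACH"]["RANK"]
```

Attribute values stay strings here, so reach_rank would need int() if a number is wanted.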

If all you want is the RANK attribute from the REACH tag, you can use the built-in xml.etree. It is just a matter of finding the REACH tag and extracting the attribute with .get:

In [19]:  x = """<ALEXA VER="0.9" URL="yahoo.com/" HOME="0" AID="=" IDN="yahoo.com/">
   ....:     <SD>
   ....:     <POPULARITY URL="yahoo.com/" TEXT="5" SOURCE="panel"/>
   ....:     <REACH RANK="5"/>
   ....:     <RANK DELTA="+0"/>
   ....:     <COUNTRY CODE="US" NAME="United States" RANK="5"/>
   ....:     </SD>
   ....:     </ALEXA>"""



In [20]: from xml.etree import ElementTree as et

In [21]: tree = et.fromstring(x)

In [22]: rank = tree.find(".//REACH").get("RANK")

In [23]: rank
Out[23]: '5'
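
If the response might not contain a REACH element, find returns None and the chained .get call would raise AttributeError. A slightly more defensive version could look like this; the function name get_reach_rank is hypothetical, and converting the value to int is an assumption about how the rank will be used.

```python
from xml.etree import ElementTree as ET


def get_reach_rank(xml_text):
    """Hypothetical helper: return REACH/@RANK as an int, or None if absent."""
    root = ET.fromstring(xml_text)
    reach = root.find(".//REACH")  # first REACH element anywhere below the root
    if reach is None:
        return None
    rank = reach.get("RANK")  # None when the attribute is missing
    return int(rank) if rank is not None else None


sample = '<ALEXA><SD><REACH RANK="5"/></SD></ALEXA>'
rank = get_reach_rank(sample)
```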
