
Extract/decode JavaScript variables from HTML into Python

I'm trying to use Python to extract some JavaScript variables from an HTML page:

<script>
var nData = new Array();
var Data = "5b7b......";
nData = CallInit(Data);
...
...
</script>

I can see the content of nData in Firebug (DOM panel) without any problem:

[Object { height="532",  width="1280",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}, Object { height="266",  width="640",  url="https://example.org...8EDA4F3F5F395B9&key=lh1",  more...}]

Each object in nData contains a URL. How can I parse/extract the content of nData into Python? Is it possible?

Thanks

With the help of the Python library Ghost.py, it should be possible to get a dynamic variable out of executed JavaScript code.

I just tried it on a small test page and retrieved a JavaScript variable named a, which that page defines, as a Python object. I did the following:

  1. Install Ghost.py with pip install Ghost.py.

  2. Install PySide (it's a prerequisite for Ghost.py) with pip install PySide.

  3. Use the following Python code:

     from ghost import Ghost

     ghost = Ghost()
     ghost.open('https://dl.dropboxusercontent.com/u/13991899/test/index.html')
     # evaluate() returns the value of the JavaScript expression plus extra resources
     js_variable, _ = ghost.evaluate('a', expect_loading=True)
     print(js_variable)

You should be able to get your variable nData into the Python variable js_variable by opening your site with ghost.open and then calling ghost.evaluate('nData').
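If running a headless browser is too heavy, it may be possible to decode Data directly. The 5b7b prefix in the question looks like hexadecimal for the characters [{, so Data might simply be hex-encoded JSON. That is only a guess: CallInit is not shown, and it may do something else entirely. Under that assumption, a minimal sketch could look like the following; SAMPLE_HTML, its shortened Data value, and the helper name extract_data are made up for illustration.

```python
import binascii
import json
import re

# Hypothetical page source. The real Data value is truncated in the question,
# so this sample encodes the short JSON text [{"height":"532"}] as hex.
SAMPLE_HTML = """
<script>
var nData = new Array();
var Data = "5b7b22686569676874223a22353332227d5d";
nData = CallInit(Data);
</script>
"""


def extract_data(html, var_name="Data"):
    """Find `var <var_name> = "<hex>";` in html and decode it as hex-encoded JSON.

    Assumption: the string is plain hex of UTF-8 JSON text. If CallInit applies
    any other transformation, this will fail or return the wrong result.
    """
    pattern = r'var\s+' + re.escape(var_name) + r'\s*=\s*"([0-9a-fA-F]+)"'
    match = re.search(pattern, html)
    if match is None:
        raise ValueError("variable %r not found or not a hex string" % var_name)
    raw_bytes = binascii.unhexlify(match.group(1))  # hex text -> bytes
    return json.loads(raw_bytes.decode("utf-8"))    # bytes -> Python objects


if __name__ == "__main__":
    for item in extract_data(SAMPLE_HTML):
        print(item)
```

With a real page you would pass the downloaded HTML to extract_data and read each item's url key. If decoding fails, fall back to the Ghost.py approach above, which runs CallInit itself.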
