How to extract javascript variable in HTML source code

Question

I'm trying to extract some JavaScript from a webpage using Python. I managed to isolate the script that contains the data I need, but I am having trouble targeting the specific JS variable that holds it.

The JavaScript is formatted as shown below.

It is stored in a Python variable named links.

(Replace the {} around the script tag with <>.)

links = {script type="text/javascript"} var ADC = ADC || {}; ADC.model = {};ADC.model.search = {"count": 48, "title": "Commercial Real Estate for Sale", "h1_text": "Commercial Properties for Sale", "asset": [{"pre_auction_enabled": false, "available": true, "registration_url": "http://www.auction.com/registration/event/commercial/B-152/8024124/",....}]}

I shortened the contents of ADC.model.search but the rest of the data follows the same format. I only need the information contained in the ADC.model.search variable.

I isolate the javascript by doing:

links = source_code.find_all("script", {"type" : "text/javascript"})

where source_code is the parsed source code of the entire page I am trying to scrape.

How do I extract the contents of ADC.model.search?

Answer 1

How about regex?

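Here links is the result of find_all from your code above:

import re
# links is a list of script tags, so join their contents into one string first
script_text = "\n".join(tag.string or "" for tag in links)
# allow optional spaces around "=" and capture through the last "}" on that line
pattern = r'ADC\.model\.search\s*=\s*(\{.*\})'
match = re.search(pattern, script_text, re.I)  # re.match would only match at the very start
if match:
    print(match.group(1))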
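
Since the value assigned to ADC.model.search is valid JSON, another option is to use the regex only to find where the assignment starts and let Python's json module parse the value itself. The following is a minimal sketch, assuming the script contents are already available as a plain string. The names extract_js_object and sample are hypothetical, and sample is a trimmed-down version of the snippet from the question, not the real page.

```python
import json
import re


def extract_js_object(script_text, name="ADC.model.search"):
    """Return the JSON value assigned to `name` in script_text, or None."""
    # Find "<name> = " and note where the assigned value begins.
    match = re.search(re.escape(name) + r"\s*=\s*", script_text)
    if match is None:
        return None
    try:
        # raw_decode parses exactly one JSON value starting at the given
        # index and ignores anything after it (a trailing ";" or more JS).
        value, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    except ValueError:
        # The assigned value was not valid JSON.
        return None
    return value


# Hypothetical sample, shortened from the question's script contents.
sample = (
    'var ADC = ADC || {}; ADC.model = {};'
    'ADC.model.search = {"count": 48, '
    '"title": "Commercial Real Estate for Sale", '
    '"asset": [{"pre_auction_enabled": false, "available": true}]};'
)

data = extract_js_object(sample)
print(data["count"])  # 48
```

With BeautifulSoup, one way to use this would be to call extract_js_object(tag.string or "") for each tag in links and keep the first result that is not None. The result is an ordinary Python dict, so the listings would be under data["asset"].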

solution1
0 2015-03-26 11:32:25
