I'm trying to get the data behind some graphs and have found that it is stored inside a script tag. There are numerous other script tags before this one, and I'd like to access var line1 to get the dates and values. Is this possible?
This is the HTML:
<script type="text/javascript">
$J(document).ready(function(){
var line1=[["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]];
g_timePriceHistoryEarliest = new Date();
if ( line1 != false )
{
g_timePriceHistoryEarliest = new Date(line1[0][0]);
g_timePriceHistoryLatest = new Date(line1[line1.length-1][0]);
}
var strFormatPrefix = "$";
var strFormatSuffix = "";
g_plotPriceHistory = CreatePriceHistoryGraph( line1, 7, strFormatPrefix, strFormatSuffix );
pricehistory_zoomMonthOrLifetime( g_plotPriceHistory, g_timePriceHistoryEarliest, g_timePriceHistoryLatest );
});
</script>
I've tried:
script = driver.find_element_by_tag_name("script")
scriptText = driver.execute_script("return arguments[0].innerHTML", script)
print(scriptText)
but scriptText comes back empty.
The full xpath is
/html/body/div[1]/div[7]/div[2]/script[2]/text()
Would appreciate any help! Thanks!
Solved:
import urllib.request
import re
url=urllib.request.urlopen("yourURL")
content=url.read()
html = content.decode('utf-8')
var_re = re.compile(r'var line1=\[(.+)\]')
date_match = var_re.findall(html)
print(date_match)
Since the data you want to extract is in the HTML itself, you don't need to use Selenium. You can use the requests library and the re library to extract it directly from the HTML. Here's the regex code for extracting the data from the sample HTML that you provided. It returns a list of str with the dates and values you seem to want.
Since you didn't provide the URL, you'll need to code the requests portion yourself.
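For the download step, here is a minimal sketch. It swaps in the standard library's urllib.request (as in the asker's own fix) rather than requests, so it runs without installing anything. The function name fetch_html and the 10-second timeout are assumptions made for illustration, and "yourURL" is the placeholder from the question, not a real address.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its body as text.

    fetch_html and the default timeout are hypothetical choices for
    this sketch; neither comes from the original post.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (replace the placeholder with the real page address):
# html = fetch_html("yourURL")
```

With requests installed, requests.get(url).text should serve the same purpose. The sample HTML that follows stands in for the fetched page.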
import re
html = """<script type="text/javascript">
$J(document).ready(function(){
var line1=[["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]];
g_timePriceHistoryEarliest = new Date();
if ( line1 != false )
{
g_timePriceHistoryEarliest = new Date(line1[0][0]);
g_timePriceHistoryLatest = new Date(line1[line1.length-1][0]);
}
var strFormatPrefix = "$";
var strFormatSuffix = "";
g_plotPriceHistory = CreatePriceHistoryGraph( line1, 7, strFormatPrefix, strFormatSuffix );
pricehistory_zoomMonthOrLifetime( g_plotPriceHistory, g_timePriceHistoryEarliest, g_timePriceHistoryLatest );
});</script>"""
var_re = re.compile(r'var line1=\[(.+)\]')
date_match = var_re.findall(html)
print(date_match)
Output:
['["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]']
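The regex above returns the rows as one long string. If you want real dates and numbers, one option is to capture the whole array literal and hand it to json.loads. This is only a sketch: it assumes the literal is valid JSON (double-quoted strings, no trailing commas), which holds for the sample but may not for every page. The date format string, including treating the trailing "+0" as fixed text, is a guess based on the two sample rows, and the meaning of the third field in each row is unknown, so it is ignored here.

```python
import json
import re
from datetime import datetime

# Trimmed copy of the sample HTML from the question.
html = '''<script type="text/javascript">
$J(document).ready(function(){
var line1=[["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]];
});</script>'''

# Capture the whole array literal, outer brackets included, up to "];".
match = re.search(r'var line1=(\[.*?\]);', html)
rows = json.loads(match.group(1)) if match else []

points = []
for raw_date, price, _third in rows:
    # Assumed format: abbreviated month, day, year, hour, then a literal ": +0".
    # %b depends on the locale; this expects English month names.
    when = datetime.strptime(raw_date, "%b %d %Y %H: +0")
    points.append((when, float(price)))

for when, price in points:
    print(when.date(), price)
```

Each entry in points is a (datetime, float) pair, which is easier to plot or load into a table than the raw string.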