
How do I get a variable from within a script tag using Python and Selenium?

I'm trying to get data for graphs and have found that the data is stored within a script tag. There are numerous other script tags before this one, and I'd like to access var line1 to get the dates and values. Is this possible?

This is the HTML:

<script type="text/javascript">
    $J(document).ready(function(){
    var line1=[["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]];
    g_timePriceHistoryEarliest = new Date();
    if ( line1 != false )
    {
        g_timePriceHistoryEarliest = new Date(line1[0][0]);
        g_timePriceHistoryLatest = new Date(line1[line1.length-1][0]);
    }

    var strFormatPrefix = "$";
    var strFormatSuffix = "";

    g_plotPriceHistory = CreatePriceHistoryGraph( line1, 7, strFormatPrefix, strFormatSuffix );

    pricehistory_zoomMonthOrLifetime( g_plotPriceHistory, g_timePriceHistoryEarliest, g_timePriceHistoryLatest );
});
</script>

I've tried

script = driver.find_element_by_tag_name("script")
scriptText = driver.execute_script("return arguments[0].innerHTML", script)
print(scriptText)

but scriptText comes back empty.

The full xpath is

/html/body/div[1]/div[7]/div[2]/script[2]/text()

Would appreciate any help! Thanks!

Solved:

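import re
import urllib.request

# Fetch the raw HTML; "yourURL" is a placeholder for the real page address.
with urllib.request.urlopen("yourURL") as response:
    html = response.read().decode('utf-8')

# Capture everything between the outer brackets of the line1 array literal.
var_re = re.compile(r'var line1=\[(.+)\]')
date_match = var_re.findall(html)
print(date_match)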
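
If the page still has to go through Selenium, the empty result in the attempt above is probably because find_element_by_tag_name returns only the first script element on the page, and an external script (one with a src attribute) has no inline text. Below is a minimal sketch that scans every script element for the one containing the variable. The names find_script_text and marker are hypothetical, and the string "tag name" is the locator value behind Selenium's By.TAG_NAME.

```python
def find_script_text(driver, marker="var line1"):
    """Return the inline text of the first <script> containing marker, else None."""
    # "tag name" is the locator strategy string that By.TAG_NAME stands for.
    for script in driver.find_elements("tag name", "script"):
        # External scripts have no inline code, so fall back to an empty string.
        text = script.get_attribute("innerHTML") or ""
        if marker in text:
            return text
    return None
```

Calling find_script_text(driver) after the page has loaded should return the text of the block shown in the question, which can then be fed to the same regex. Note that line1 is declared with var inside the ready callback, so it is not a global and execute_script("return line1") cannot reach it.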

Since the data you want to extract is in the HTML itself, you don't need to use Selenium. You can use the requests library and the re module to extract it directly from the HTML. Here's the regex code for extracting the data from the sample HTML that you provided. It returns a list containing one string that holds the contents of the line1 array, which are the dates and values you seem to want.

Since you didn't provide the URL, you'll need to code the requests portion yourself.

import re

html = """<script type="text/javascript">
$J(document).ready(function(){
var line1=[["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]];
g_timePriceHistoryEarliest = new Date();
if ( line1 != false )
{
    g_timePriceHistoryEarliest = new Date(line1[0][0]);
    g_timePriceHistoryLatest = new Date(line1[line1.length-1][0]);
}

var strFormatPrefix = "$";
var strFormatSuffix = "";

g_plotPriceHistory = CreatePriceHistoryGraph( line1, 7, strFormatPrefix, strFormatSuffix );

pricehistory_zoomMonthOrLifetime( g_plotPriceHistory, g_timePriceHistoryEarliest, g_timePriceHistoryLatest );
});</script>"""

var_re = re.compile(r'var line1=\[(.+)\]')
date_match = var_re.findall(html)

print(date_match)

Output:
['["Dec 15 2013 01: +0",46,"1"],["May 26 2020 22: +0",31.883,"1"]']
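
The regex result above is still a single string. One way to turn it into nested Python lists is to capture the whole array literal and pass it to json.loads. This is a sketch that assumes the array is also valid JSON (double-quoted strings, no trailing commas), which holds for the sample but is not guaranteed for every page. The function name parse_line1 is hypothetical.

```python
import json
import re


def parse_line1(html):
    """Return the line1 array as a list of [date, value, count] rows."""
    # Non-greedy match from the opening bracket to the first "];" that ends the statement.
    match = re.search(r'var line1=(\[.*?\]);', html)
    if match is None:
        return []
    # Assumption: the JavaScript array literal is valid JSON.
    return json.loads(match.group(1))
```

With the sample HTML, parse_line1(html)[0][0] is the first date string and parse_line1(html)[0][1] is its value.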


 