
Using a CSS selector to locate some data stored in a JavaScript element

I am doing some web scraping (with the OK of the site owner) and have come across some data that is updated when a slider is moved.

The problem is that this data is inside some JavaScript. I am using Perl's Web::Scraper, which allows both CSS selectors and XPath selectors, but I can't seem to isolate the JavaScript.

I have tried attribute selectors (script[src="path_to.js"]), plain node selectors ('script'), and the absolute CSS path, which did not work at all.

Any ideas how to get to the content of a script node?

Try HTML::Query:

use HTML::Query ();

# get raw (unparsed) content of page into $content.
# eg: $mech->content or similar
my $content = qq|
    <html>
        <head>
            <script type="text/javascript">
                function init() {
                    var x = [1,2,3,4,5,6,7];
                    alert(x);
                }
            </script>
        </head>
        <body onload="init()">
        </body>
    </html>
|;

# This is a CSS selector  ----------------------vvvvv
my ($e) = HTML::Query::Query(text => $content, 'script'); 
die "couldn't find script element!\n" unless defined $e;

# can't use as_text or as_trimmed_text from HTML::Element here,
# because they skip the contents of script elements
print $e->as_XML."\n"; 

Here's a Mojo::DOM example, where 'text' returns the content inside each matched tag:

use v5.10;      # enables say
use Mojo::DOM;

# $content is the raw HTML string from the previous example
my $dom = Mojo::DOM->new( $content );

say $dom
    ->find( 'script' )
    ->map( 'text' )
    ->join( "\n" );

However, it sounds as if you might be trying to get something that the JavaScript does to the DOM at run time. These modules only parse the HTML and do not execute scripts, so in that case Perl might not be able to see it.
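
If the values are written literally in the script source, as they are in the sample page, they can be pulled out of the script text with a regular expression. This is only a minimal sketch: it assumes the data is a flat array of numbers assigned with `var x = [...]`, and the helper name extract_js_array is made up for illustration. It will not cope with nested arrays or quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the numbers found in `var NAME = [ ... ];`
# inside a string of JavaScript source. Assumes a flat list of numbers.
sub extract_js_array {
    my ($js, $name) = @_;

    # Capture everything between the brackets of the assignment.
    my ($body) = $js =~ /\bvar\s+\Q$name\E\s*=\s*\[([^\]]*)\]/
        or return;

    # Pick out each integer or decimal literal.
    return $body =~ /(-?\d+(?:\.\d+)?)/g;
}

# Script text as it might come back from $e->as_XML or ->map('text').
my $js = 'function init() { var x = [1,2,3,4,5,6,7]; alert(x); }';

my @values = extract_js_array($js, 'x');
print join(',', @values), "\n";    # prints 1,2,3,4,5,6,7
```

For anything more complex than a simple literal, a real JavaScript or JSON parser is likely to be a safer choice than a regex.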
