
Looking for a way to scrape HTML with JS

As the title suggests, I'm looking for a (hopefully) straightforward way of scraping all of the HTML from a webpage. I'd store it in a string, perhaps, and then navigate through that string to pull out the desired element.

Specifically, I want to scrape my Twitter page and display my profile picture inside a new div. I know there are several tools for doing just this, but would anyone have some code examples or suggestions for how I might do this myself?

Thanks a lot

UPDATE

After a very helpful response from TJ Crowder I did some more searching online and found this resource.

In theory, this is easy. You just do an ajax call to get the text of the page, then use jQuery to turn that into a disconnected DOM, and then use all the usual jQuery tools to find and extract what you need.

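$.ajax({
    url:      "http://example.com/some/path",
    dataType: "html", // treat the response as HTML text
    success:  function(html) {
        // Parse into a detached wrapper so that .find() also sees
        // elements that sit at the top level of the markup
        var tree = $("<div>").append($.parseHTML(html));
        var imgsrc = tree.find("img.some-class").attr("src");
        if (imgsrc) {
            // ...add the image to your page
        }
    }
});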

But (and it's a big one) it's not likely to work, because of the Same Origin Policy, which stops script on your page from reading the response to a cross-origin ajax call. Certain individual sites may have an open CORS policy, but most won't, and of course supporting CORS on IE8 and IE9 requires an extra jQuery plug-in.
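To make "cross-origin" concrete, here is a rough sketch of the comparison involved: two URLs share an origin only when scheme, host, and port all match. The `isSameOrigin` helper is a made-up name for illustration, not something from jQuery or the browser; it just leans on the standard `URL` class.

```javascript
// Hypothetical helper: two URLs share an origin only if
// scheme, host, and port are all identical.
function isSameOrigin(a, b) {
    return new URL(a).origin === new URL(b).origin;
}

// Same scheme, host, and (default) port: same origin
console.log(isSameOrigin("http://example.com/some/path", "http://example.com/other")); // true

// Scheme differs (http vs https): cross-origin
console.log(isSameOrigin("http://example.com/", "https://example.com/")); // false

// Port differs: cross-origin
console.log(isSameOrigin("http://example.com/", "http://example.com:8080/")); // false
```

So a page served from your own site asking for a page on another site is cross-origin by this test, and the ajax call above would be blocked unless that site opts in via CORS.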

So to do this with sites that don't allow your origin via CORS, there must be a server involved. It can be your server: you grab the text of the page you want using server-side code and then send it to your page via ajax (or just build the bits you want into your page when you first render it). All of the usual server-side stacks (PHP, Node, ASP.NET, JVM, ...) have the ability to grab web pages. Or, in some cases, you may be able to use YQL as a cross-domain proxy, using their server rather than your own.
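If you go the "your own server" route with Node, a minimal sketch might look like the following. It assumes Node 18 or later (for the built-in `fetch`); `TARGET_URL`, `findImageSrc`, and `createProxyServer` are hypothetical names invented for this example, and the regex-based extraction is only good enough for a demo. A real HTML parser would be the safer choice.

```javascript
// Minimal server-side proxy sketch (assumes Node 18+, which has a global fetch).
const http = require("node:http");

// Hypothetical page to fetch; replace with the page you actually need.
const TARGET_URL = "http://example.com/some/path";

// Hypothetical helper: return the src of the first <img> tag that carries
// the given class, or null if there is none. Regex matching is fragile;
// this is a sketch, not a substitute for a proper HTML parser.
function findImageSrc(html, className) {
    const tags = html.match(/<img\b[^>]*>/gi) || [];
    for (const tag of tags) {
        const cls = /\sclass\s*=\s*["']([^"']*)["']/i.exec(tag);
        const src = /\ssrc\s*=\s*["']([^"']*)["']/i.exec(tag);
        if (cls && src && cls[1].split(/\s+/).includes(className)) {
            return src[1];
        }
    }
    return null;
}

// Builds (but does not start) a server that fetches the target page on the
// server side and hands just the image URL back to the caller as JSON.
function createProxyServer(targetUrl = TARGET_URL) {
    return http.createServer(async (req, res) => {
        try {
            const upstream = await fetch(targetUrl);
            const html = await upstream.text();
            const body = JSON.stringify({ src: findImageSrc(html, "some-class") });
            res.writeHead(200, { "Content-Type": "application/json" });
            res.end(body);
        } catch (err) {
            res.writeHead(502, { "Content-Type": "text/plain" });
            res.end("Upstream fetch failed");
        }
    });
}

// To run it: createProxyServer().listen(3000);
```

Because the fetch happens on the server, the Same Origin Policy doesn't come into it; your page only ever talks to your own origin and can set the image's `src` from the JSON it gets back.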
