Scrape/Retrieve Data from Data Grid - DOM to CSV console output

I want to scrape the Cell Widgets -> Data Grid on this page: http://samples.gwtproject.org/samples/Showcase/Showcase.html#!CwDataGrid

Ideally I am looking for CSV-style string output like this (first and last lines shown):

;Corey;Jenkins;63;Coworkers;438 Techwood St;
.... (many rows here)
;Yvonne;Morris;55;Family; 483 Third Pkwy;

(I am working with Firefox)

I'm not sure whether you are trying to build a scraper for the GWT website, but in the example above each grid row is rendered as a TR tag that carries identifying attributes. The first TR, for instance, has __gwt_row="0" __gwt_subrow="0".

Each cell also has an attribute of the form __gwt_cell="cell-gwt-uid-29".

These row and cell attributes should make it easy to select the rows with an XPath lookup, iterate over their cells, and write the result out as CSV.
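A minimal sketch of that XPath approach, meant to be pasted into the browser console on the Showcase page. It assumes the attributes described above (__gwt_row on each body TR, __gwt_cell on the element wrapping each cell's content); the function names rowToLine, cellText and scrapeGridByXPath are hypothetical. For the drop-down (category) column it reads the selected option instead of the text of every option.

```javascript
// Sketch only: assumes body rows carry __gwt_row and each cell's content
// is wrapped in an element carrying __gwt_cell (see the attributes above).

// Hypothetical helper: join one row's cell texts into a ';'-separated line.
// The trailing ';' mirrors the format shown in the question.
function rowToLine(cells) {
  return cells.join(';') + ';';
}

// Hypothetical helper: text of one cell. For a drop-down cell, use the
// selected option's text rather than the concatenated text of all options.
function cellText(cell) {
  const select = cell.querySelector('select');
  if (select && select.selectedIndex >= 0) {
    return select.options[select.selectedIndex].text;
  }
  return cell.textContent.trim();
}

// Hypothetical helper: find rows and cells via XPath, return one line per row.
function scrapeGridByXPath(doc) {
  const rows = doc.evaluate(
    '//tr[@__gwt_row]', doc, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const lines = [];
  for (let r = 0; r < rows.snapshotLength; r++) {
    // Relative path: only the cells inside this row.
    const cells = doc.evaluate(
      './/*[@__gwt_cell]', rows.snapshotItem(r), null,
      XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    const texts = [];
    for (let c = 0; c < cells.snapshotLength; c++) {
      texts.push(cellText(cells.snapshotItem(c)));
    }
    lines.push(rowToLine(texts));
  }
  return lines.join('\n');
}

// Only touch the DOM when one exists (i.e. in the browser console).
if (typeof document !== 'undefined' && typeof XPathResult !== 'undefined') {
  console.log(scrapeGridByXPath(document));
}
```

ORDERED_NODE_SNAPSHOT_TYPE returns the nodes in document order and lets you index them with snapshotItem. If the first column is the checkbox column, its empty text should produce the leading ';' seen in the question's sample output.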

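var jq = document.createElement('script');

jq.onload = function() {
    // Give the $ alias back to the page in case it already defines one.
    jQuery.noConflict();

    // Scrape the grid, using the freshly loaded jQuery as $.
    (function ($) {
        // Only the DataGrid body rows carry the __gwt_row attribute.
        $('tr[__gwt_row]')
            .each(function(j, rowitem) {
                var line = '';
                $(rowitem).find('div').each(function(i, item) {
                    // Drop-down cells: take the currently selected option
                    // (:selected checks the live state, not just the attribute).
                    var o = $(item).find('option:selected');
                    if (o.length > 0) {
                        line += o.text();
                    } else {
                        line += $(item).text();
                    }
                    line += ';';
                });
                console.log(line);
            });
    })(jQuery);
};

// Load jQuery as per http://stackoverflow.com/a/7474386/22972
// (over https, so the script is not blocked as mixed content on an https page)
jq.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js";
document.getElementsByTagName('head')[0].appendChild(jq);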
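If you would rather not load jQuery from a CDN, the same scrape can be written with plain querySelectorAll. The sketch below (csvField, toCsv and readGridRows are hypothetical names) assumes the __gwt_row row attribute and that each cell's content sits in a DIV directly inside its TD. It returns the whole grid as one string and quotes any field containing the separator, a double quote or a line break, in the style of RFC 4180, so a name or address containing ';' does not shift the columns.

```javascript
// Sketch only: assumes TR[__gwt_row] rows with one DIV per cell inside each TD.
const SEP = ';';

// Quote a field only when it contains the separator, a double quote or a
// line break; embedded double quotes are doubled (RFC 4180 style).
function csvField(value) {
  if (value.includes(SEP) || value.includes('"') || /[\r\n]/.test(value)) {
    return '"' + value.replace(/"/g, '""') + '"';
  }
  return value;
}

// rows: array of arrays of strings -> one string, one line per row.
function toCsv(rows) {
  return rows.map(cells => cells.map(csvField).join(SEP)).join('\n');
}

// Read the grid from the live page (browser only). Drop-down cells
// contribute the selected option's text.
function readGridRows(doc) {
  return Array.from(doc.querySelectorAll('tr[__gwt_row]'), row =>
    Array.from(row.querySelectorAll('td > div'), div => {
      const select = div.querySelector('select');
      return select && select.selectedIndex >= 0
        ? select.options[select.selectedIndex].text
        : div.textContent.trim();
    }));
}

// Only touch the DOM when one exists (i.e. in the browser console).
if (typeof document !== 'undefined') {
  console.log(toCsv(readGridRows(document)));
}
```

Unlike the jQuery version, this does not append a trailing ';' to each line. In the Firefox Web Console you can wrap the result in the console's copy(...) helper to put it on the clipboard and paste it into a .csv file.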
