简体   繁体   中英

Using injected JavaScript to copy text from a web page

As part of a job I'm doing on a web site I have to copy a few thousand lines of text from several pages of the old site and paste them into the HTML for the new site. The long and painstaking way of going to the old page and copying the many lines of text and then going to my editor and pasting it there line by line is getting really old. I thought of using injected JavaScript to do this but I'm not quite sure where to start. Thanks in advance for any help.

Here are links to a page of the old site and a page of the new site. As you can see in the tables on each page it would take a ton of time to copy it all manually.

Old site: http://temp.delridgelegalformscom.officelive.com/macorporation1.aspx

New Site: http://ezwebsites.us/delridge/macorporation1.html

In order to do this type of work, you need two things: a way of injecting or executing your script on that page, and a good working knowledge of the Document Object Model for the target site.

I highly recommend using the Firefox plugin FireBug, or some equivalent tool on your browser of choice. FireBug lets you execute commands from a JavaScript console which will help. Hopefully the old site does not have a bunch of <FONT> , <OBJECT> or <IFRAME> tags which will make this even more tedious.

Using a library like Prototype or JQuery will also help selecting parts of the website you need. You can submit results using JQuery like this:

$(function() {
    snippet = $('#content-id').html;
    $.post('http://myserver/page', {content: snippet});
});

A problem you will very likely run into is the "same origination policy" many browsers enforce for JavaScript. So if your JavaScript was loaded from http://myserver as in this example, you would be OK.

Perhaps another route you can take is to use a scripting language like Ruby, Python, or (if you really have patience) VBA. The script can automate the list of pages to scrape and a target location for the information. It can just as easily package it up as a request to the new server if that's how pages get updated. This way you don't have to worry about injecting the JavaScript and hoping all works without problems.

我认为您需要油脂猴子http://www.greasespot.net/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM