
Using injected JavaScript to copy text from a web page

As part of a job I'm doing on a web site, I have to copy a few thousand lines of text from several pages of the old site and paste them into the HTML for the new site. Going to the old page, copying the many lines of text, and then pasting them into my editor line by line is slow and painstaking, and it is getting really old. I thought of using injected JavaScript to do this, but I'm not quite sure where to start. Thanks in advance for any help.

Here are links to a page of the old site and a page of the new site. As you can see from the tables on each page, it would take a ton of time to copy it all manually.

Old site: http://temp.delridgelegalformscom.officelive.com/macorporation1.aspx

New site: http://ezwebsites.us/delridge/macorporation1.html

To do this type of work, you need two things: a way of injecting or executing your script on that page, and a good working knowledge of the Document Object Model for the target site.

I highly recommend using the Firefox plugin Firebug, or an equivalent tool in your browser of choice. Firebug lets you execute commands from a JavaScript console, which helps when you are exploring the page. Hopefully the old site does not have a bunch of <FONT>, <OBJECT> or <IFRAME> tags, which would make this even more tedious.
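As a rough sketch of what such a console session could look like, the helper below turns one table into a line of text per row, with cells separated by tabs. The function name tableToLines and the 'table' selector are placeholders of mine, not something taken from the old site; it relies only on the standard rows, cells and textContent properties of a table element.

```javascript
// Convert a table element into one line of text per row, with the cells
// of each row separated by tabs. tableToLines is a hypothetical helper name.
function tableToLines(table) {
  return Array.from(table.rows).map(function (row) {
    return Array.from(row.cells)
      .map(function (cell) {
        // Collapse runs of whitespace left over from the page markup.
        return cell.textContent.replace(/\s+/g, ' ').trim();
      })
      .join('\t');
  });
}

// In the browser console (the selector is a placeholder; adjust it to
// whichever table actually holds the content you want):
// var lines = tableToLines(document.querySelector('table'));
// console.log(lines.join('\n'));
```

The printed text can then be copied out of the console in one go instead of row by row.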

Using a library like Prototype or jQuery will also help with selecting the parts of the page you need. You can submit the results using jQuery like this:

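$(function() {
    // Read the element's inner HTML; .html() has to be called as a method.
    var snippet = $('#content-id').html();
    // Send the markup to your own server so it can be stored there.
    $.post('http://myserver/page', {content: snippet});
});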

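A problem you will very likely run into is the same-origin policy that browsers enforce for JavaScript. The policy is keyed to the origin of the page the script runs in, not to where the script file was loaded from, so the post to http://myserver in this example is a cross-origin request from the old site's pages. The server at http://myserver has to allow such requests (for example with CORS headers) before the script can read the response.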

Perhaps another route you can take is to use a scripting language like Ruby, Python, or (if you really have patience) VBA. The script can work through the list of pages to scrape and write the information to a target location. It can just as easily package the content up as a request to the new server, if that's how pages get updated. This way you don't have to worry about injecting the JavaScript and hoping everything works without problems.
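To stay with the language used above, here is a minimal sketch of that scripted route in JavaScript run under Node.js rather than Ruby or Python. It assumes the content lives in <td> cells and uses a regular expression as a rough stand-in for a real HTML parser; cellsFromHtml and scrapePages are hypothetical names, and the page-fetching function is passed in so you can plug in whatever HTTP client you prefer.

```javascript
// Pull the text of every <td>...</td> cell out of an HTML string.
// Assumption: a regex is good enough for these pages. It will not cope
// with nested tables, so treat it as a starting point only.
function cellsFromHtml(html) {
  var cells = [];
  var pattern = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  var match;
  while ((match = pattern.exec(html)) !== null) {
    var text = match[1]
      .replace(/<[^>]+>/g, '')  // drop any tags inside the cell
      .replace(/&nbsp;/g, ' ')  // turn non-breaking space entities into spaces
      .replace(/\s+/g, ' ')     // collapse runs of whitespace
      .trim();
    if (text) cells.push(text); // skip cells that turn out to be empty
  }
  return cells;
}

// Fetch each page in turn and collect its cells, keyed by URL.
// fetchPage is any function that takes a URL and resolves to HTML text.
async function scrapePages(urls, fetchPage) {
  var result = {};
  for (var url of urls) {
    result[url] = cellsFromHtml(await fetchPage(url));
  }
  return result;
}

// Example wiring for Node 18+, which provides a global fetch:
// scrapePages(
//   ['http://temp.delridgelegalformscom.officelive.com/macorporation1.aspx'],
//   function (url) { return fetch(url).then(function (r) { return r.text(); }); }
// ).then(function (pages) { console.log(pages); });
```

Because the script runs outside the browser, the same-origin policy does not apply to it.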

I think you need Greasemonkey: http://www.greasespot.net/
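A Greasemonkey user script for this could look roughly like the sketch below. The @match pattern assumes every page to copy lives under the old site's host, and collectText is a hypothetical helper that simply gathers the text of every table cell; the script drops the result into a textarea at the bottom of the page so it can be copied in one go.

```javascript
// ==UserScript==
// @name        Copy table text (hypothetical example)
// @match       http://temp.delridgelegalformscom.officelive.com/*
// @grant       none
// ==/UserScript==

// Gather the trimmed text of every non-empty table cell, one per line.
function collectText(doc) {
  return Array.from(doc.querySelectorAll('td'))
    .map(function (cell) { return cell.textContent.trim(); })
    .filter(function (text) { return text.length > 0; })
    .join('\n');
}

// Only touch the page when running in a browser.
if (typeof document !== 'undefined') {
  var box = document.createElement('textarea');
  box.value = collectText(document);
  // Pin the textarea to the bottom of the window, above the page content.
  box.style.cssText =
    'position:fixed;bottom:0;left:0;width:100%;height:200px;z-index:9999';
  document.body.appendChild(box);
  box.select(); // pre-select the text so Ctrl+C copies it
}
```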
