
Extracting the source code of a webpage

Hi, I want to extract the source code of an external website (not on my domain) and then parse it to build an application. I know how to do this with the Jsoup library for Java, but I could not find any resources on how to do it with JavaScript, jQuery, or any other client-side web programming language. Can someone tell me which library I should use? Basically, I want to get the HTML source code of a webpage and then parse it to extract certain links under certain tags.

You will not be able to do this with client-side JavaScript alone because of the same-origin policy, which prevents a script from reading information from other domains.

What you would have to do is use a server-side proxy to fetch the information. An Ajax call from your page can then ask the proxy to fetch the external page.
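As a rough illustration of the proxy idea, here is a minimal sketch in Node.js. It assumes Node 18 or later (for the built-in `fetch`). The `/proxy` route, the `url` query parameter, and the function names `parseTargetUrl` and `createProxyServer` are all made up for this example; none of them come from a particular library.

```javascript
// Minimal server-side proxy sketch (assumes Node.js 18+, which ships a global fetch).
const http = require("node:http");

// Hypothetical helper: accept only absolute http(s) URLs as proxy targets.
function parseTargetUrl(raw) {
  if (typeof raw !== "string" || raw.length === 0) return null;
  try {
    const url = new URL(raw);
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null; // not a valid absolute URL
  }
}

// Builds (but does not start) a server that answers
// GET /proxy?url=<encoded target> with the target page's HTML.
function createProxyServer() {
  return http.createServer(async (req, res) => {
    const requestUrl = new URL(req.url, "http://localhost");
    const target = parseTargetUrl(requestUrl.searchParams.get("url"));
    if (requestUrl.pathname !== "/proxy" || target === null) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Expected /proxy?url=<absolute http(s) URL>");
      return;
    }
    try {
      // The server is not bound by the browser's same-origin policy,
      // so it can fetch the external page directly.
      const upstream = await fetch(target);
      const html = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(html);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream fetch failed");
    }
  });
}
```

To try it, start the server with `createProxyServer().listen(3000)` and serve your own page from the same origin. The page can then request `"/proxy?url=" + encodeURIComponent("https://example.com/")` via Ajax, parse the response text with `new DOMParser().parseFromString(html, "text/html")`, and select the links it needs with `querySelectorAll`. A real deployment would probably want to restrict which hosts the proxy may fetch, since an open proxy can be abused.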
