
How do I get source code from a webpage?

How can we get the source code of a webpage from within PHP and/or JavaScript?


First, be aware that JavaScript running in a browser cannot directly read the source code of a page that is not on the same domain as your own page (see http://en.wikipedia.org/wiki/Same_origin_policy).
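To make the "same domain" rule concrete, here is a minimal sketch that checks whether a target URL shares an origin (scheme, host, and port) with the current page. The helper name `isSameOrigin` is hypothetical and only for illustration; it relies on the standard `URL` class.

```javascript
// Hypothetical helper: true when both URLs share scheme, host, and port.
function isSameOrigin(pageUrl, targetUrl) {
  const page = new URL(pageUrl);
  // Resolve the target against the page so relative paths such as "../635YY" work.
  const target = new URL(targetUrl, pageUrl);
  return page.origin === target.origin;
}

console.log(isSameOrigin("http://jsfiddle.net/635YY/1/", "../635YY")); // true
console.log(isSameOrigin("http://jsfiddle.net/635YY/1/", "https://jsfiddle.net/")); // false (scheme differs)
console.log(isSameOrigin("http://jsfiddle.net/635YY/1/", "http://en.wikipedia.org/")); // false
```

In a browser you would typically pass `location.href` as the first argument.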

In PHP, this is how you do it:

$html = file_get_contents($theUrl); // page source as a string, or false on failure

In JavaScript, there are three ways:

Firstly, by XMLHttpRequest: http://jsfiddle.net/635YY/1/

var url = "../635YY", xmlhttp; // Remember, same domain
if ("XMLHttpRequest" in window) xmlhttp = new XMLHttpRequest();
else if ("ActiveXObject" in window) xmlhttp = new ActiveXObject("Msxml2.XMLHTTP"); // legacy IE fallback
xmlhttp.open('GET', url, true);
xmlhttp.onreadystatechange = function()
{
    // readyState 4 means the request has completed
    if (xmlhttp.readyState == 4) alert(xmlhttp.responseText);
};
xmlhttp.send(null);

Secondly, by iframes: http://jsfiddle.net/XYjuX/1/

var url = "../XYjuX"; // Remember, same domain
var iframe = document.createElement("iframe");
iframe.onload = function()
{
    // Note: this is the parsed <body> markup, not the raw response text
    alert(iframe.contentWindow.document.body.innerHTML);
};
iframe.src = url;
iframe.style.display = "none";
document.body.appendChild(iframe);

Thirdly, by jQuery: http://jsfiddle.net/edggD/2/

$.get('../edggD',function(data)//Remember, same domain
{
    alert(data);
});

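In JavaScript, without using unnecessary frameworks (in this example, api.codetabs.com is a proxy used to get around Cross-Origin Resource Sharing restrictions):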

fetch('https://api.codetabs.com/v1/proxy?quest=google.com').then((response) => response.text()).then((text) => console.log(text));

Ajax example using jQuery:

// Display the source code of a web page in a pre tag (escaping the HTML).
// Only works if the page is on the same domain.

$.get('page.html', function(data) {
    $('pre').text(data);
});

If you just want access to the source code, the data parameter in the above code contains the raw HTML source code.
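The example above relies on jQuery's `.text()` to keep the fetched markup from being interpreted as HTML. If you need the escaped string itself, for instance to build markup by hand, a minimal sketch could look like this. The helper name `escapeHtml` is hypothetical and not part of jQuery.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
function escapeHtml(source) {
  return source
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```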

Following Google's guide on fetch() and building on D.Snap's answer, you would end up with something like this:

fetch('https://api.codetabs.com/v1/proxy?quest=URL_you_want_to_fetch')
  .then(
    function(response) {
      if (response.status !== 200) {
        console.log('Looks like there was a problem. Status Code: ' +
          response.status);
        return;
      }

      // Examine the text in the response
      response.text().then(function(data) {
        // data contains all the plain html of the url you previously set, 
        // you can use it as you want, it is typeof string
        console.log(data)
      });
    }
  )
  .catch(function(err) {
    console.log('Fetch Error :-S', err);
  });

This approach uses a CORS proxy; in this example it is the Codetabs CORS Proxy.

A CORS proxy lets you fetch resources that are not on your own domain, so the same-origin policy no longer blocks your requests. You can take a look at other CORS proxies:

https://nordicapis.com/10-free-to-use-cors-proxies/
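The proxy calls above can be wrapped in a small reusable function. This is only a sketch: the helper names `buildProxyUrl` and `fetchSource` are hypothetical, and percent-encoding the target with `encodeURIComponent` assumes the proxy decodes query values in the usual way, which the examples above do not confirm. The endpoint is the one used in those examples.

```javascript
// Proxy endpoint taken from the examples above.
const PROXY_ENDPOINT = "https://api.codetabs.com/v1/proxy?quest=";

// Hypothetical helper: build the proxied URL for a target page.
// Assumption: the proxy accepts a percent-encoded target URL.
function buildProxyUrl(targetUrl) {
  return PROXY_ENDPOINT + encodeURIComponent(targetUrl);
}

// Hypothetical helper: resolve to the page source, or reject on a non-2xx status.
// fetchImpl is injectable so the function can be exercised without a network.
async function fetchSource(targetUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildProxyUrl(targetUrl));
  if (!response.ok) {
    throw new Error("Proxy returned status " + response.status);
  }
  return response.text(); // the page's HTML as a string
}
```

A call such as `fetchSource("https://example.com").then(console.log).catch(console.error)` would then log the HTML or the error. Rejecting on a bad status, rather than returning silently, lets callers handle network errors and HTTP errors in one `catch`.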
