
Get the source code of a page in PHP

First of all, thank you in advance for your replies.

I cannot get the source code of the following page (I want to extract its contents):

http://steamcommunity.com/market/search?q=booster#p2 (stored in $path)

Here is my first attempt:

$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $path);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
$file_contents = curl_exec($ch);
curl_close($ch);
$file_contents =  htmlentities($file_contents);
print_r($file_contents);

Here is a second attempt:

$fp=null;
$fp=@fopen($path,"r");
$contenu = "";
if($fp){
 while(!feof($fp)){
 $contenu .=  stream_get_line($fp,65535);
 }
 print_r($contenu);
}
else{
 echo "Impossible d'ouvrir la page $path";
}

With this code I only get the source code of this page: http://steamcommunity.com/market/search?q=booster, or of the page ..../market/search?q=booster#p1

I should add that the source code displayed by Firefox is not the one I want; only the DOM inspector lets me see the "real" source code. Do you have a solution?

You won't be able to do this with PHP alone. You need to execute the page's JavaScript to get the rendered DOM. (The rendered DOM is what you see when you use the DOM inspector.)

Maybe use PhantomJS to open the page and get the rendered DOM. See "Using Phantom.js evaluate, how can I get the HTML of the page?".

I said that the source code displayed by firefox is not good and only dom inspector allows me to see the "real" source code. Do you have a solution?

That's completely backwards. The DOM inspector shows you the current state of the page, as modified by JavaScript and/or the user (e.g., form state changes). The source code displayed by Firefox's "View Source" is the "real" source code as delivered by the web server.
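This also explains why the #p1 and #p2 URLs return the same markup from PHP: the fragment after # is interpreted by the browser and is not part of the HTTP request. A minimal sketch of what is actually requested for $path (requestTarget is a hypothetical helper written for this illustration, not a library function):

```php
<?php
// Hypothetical helper: rebuild the request target that an HTTP client
// sends for a URL, i.e. the path plus the query string, without the fragment.
function requestTarget(string $url): string
{
    $parts = parse_url($url);
    $target = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $target .= '?' . $parts['query'];
    }
    return $target;
}

$path = 'http://steamcommunity.com/market/search?q=booster#p2';

// The fragment is available locally, but it never reaches the server.
echo 'fragment: ', parse_url($path, PHP_URL_FRAGMENT), "\n";
echo 'requested: ', requestTarget($path), "\n";
```

Since the request is identical for every fragment, the page number has to be passed some other way, for example as a query parameter to whatever endpoint the page's JavaScript calls.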

You're hitting the wrong URL. Instead, request the AJAX endpoint that the page itself queries, and parse the response as JSON:

$f = file_get_contents(
    "http://steamcommunity.com/market/search/render/?" .
    "query=booster&start=10&count=10"
);
$t = json_decode( $f );
print_r( $t );

And you get a neatly organized structure, such as:

stdClass Object (
    [success] => 1
    [start] => 0
    [pagesize] => 10
    [total_count] => 330
    [results_html] => <div class="market_listing_table_header">
    ...
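The start, pagesize and total_count fields suggest how the remaining pages could be requested. The sketch below assumes that start is a zero-based offset and count is the page size, which the dump hints at but does not confirm; renderUrl and pageOffsets are hypothetical helpers, and only the endpoint and parameter names come from the request above.

```php
<?php
// Hypothetical helper: build the render URL for a given result offset.
// Endpoint and parameter names are copied from the request above; how the
// server interprets them is an assumption.
function renderUrl(string $query, int $start, int $count = 10): string
{
    return 'http://steamcommunity.com/market/search/render/?'
        . http_build_query(['query' => $query, 'start' => $start, 'count' => $count]);
}

// Hypothetical helper: list the offsets needed to cover $total results.
function pageOffsets(int $total, int $pageSize): array
{
    return ($total > 0 && $pageSize > 0) ? range(0, $total - 1, $pageSize) : [];
}

// 330 is the total_count value from the dump above.
$offsets = pageOffsets(330, 10);
echo count($offsets), " requests needed\n";
echo 'last URL: ', renderUrl('booster', end($offsets)), "\n";
```

Each URL would then be fetched and decoded the same way as the single request above, ideally with a pause between requests.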

Essentially, the JSON response that's used to render the page can be read as a neat structure in PHP. Or close enough: you'll still need to walk through $t->results_html with DOMDocument / XPath for further parsing.
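That last step might look roughly like the sketch below. It runs on a small sample string standing in for $t->results_html. Only the market_listing_table_header class appears in the dump above; the listing_row and listing_name classes, and the extractNames helper, are made up for illustration, so the XPath expression must be adapted to the real markup.

```php
<?php
// Sample fragment standing in for $t->results_html. The listing_row and
// listing_name classes are hypothetical; inspect the real HTML first.
$resultsHtml = <<<HTML
<div class="market_listing_table_header">Name</div>
<div class="listing_row"><span class="listing_name">Booster A</span></div>
<div class="listing_row"><span class="listing_name">Booster B</span></div>
HTML;

// Hypothetical helper: return the text of every matching span.
function extractNames(string $html): array
{
    // Collect parser warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a single root element and hint the encoding.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $names = [];
    foreach ($xpath->query('//span[contains(@class, "listing_name")]') as $node) {
        $names[] = trim($node->textContent);
    }
    return $names;
}

print_r(extractNames($resultsHtml));
```

Using XPath rather than regular expressions keeps the extraction tolerant of attribute order and whitespace changes in the returned HTML.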
