
How to submit and retrieve data with cURL?

I am trying cURL in PHP for the first time because I want to scrape results from this page: http://www.lldj.com/pastresult.php. The site has posted weekly lotto results since 2002 and has a simple submit form (Date).

The form has two fields:

A submit button: Name = Button / value = Submit
A select drop-down: Name = Draw, with options 1 - 1097 (each option is a draw number)

I could go through the results manually, but I would rather use a simple script, since I am also interested in learning how to submit data and retrieve results using PHP/cURL.

I have used a PHP DOM library for scraping and I am comfortable with its syntax. I wonder whether I should use cURL and DOM together, or whether this can be achieved with cURL alone.

What I have so far:

include 'dom.php';
$post_data['draw'] = '1097';
$post_data['button'] = 'Submit';

//traverse array and prepare data for posting (key1=value1)
foreach ( $post_data as $key => $value) {
$post_items[] = $key . '=' . $value;
}

//create the final string to be posted using implode()
$post_string = implode ('&', $post_items);

//create cURL connection
$curl_connection = curl_init('http://www.lldj.com/pastresult.php');

//set options
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_USERAGENT, '...'); // user-agent string is cut off in the source
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
//set data to be posted
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);

 //perform our request
$result = curl_exec($curl_connection);

 //show information regarding the request
 print_r(curl_getinfo($curl_connection));
echo curl_errno($curl_connection) . '-' . 
            curl_error($curl_connection);

After submitting the data, scrape the result:

$t = $curl_connection->find('table',0); // ?? usually refers to the file_get_contents variable
$data = $t->find('tr');

foreach($data as $n) {
$tds = $n->find('td');

$dataRows = array();

$dataRows['num'] =  $tds[0]->find('img',0)->href;

var_dump($dataRows);
}

Can someone point out whether this is correct? How can I automatically increase the submitted value and then repeat the process (e.g., submit draw = 1, then draw = 2, etc.)? Thanks.

<?php
include 'dom.php'; // assumed to be the Simple HTML DOM library from the question

// the question lists draw numbers 1 to 1097
for ($i = 1; $i <= 1097; $i++) {

    // form fields; 'draw' changes on every iteration (1, 2, 3, 4, ...)
    // the array is rebuilt each time so values from earlier iterations do not pile up
    $post_data = array(
        'draw'   => $i,
        'button' => 'Submit',
    );

    // create the URL-encoded string to be posted (key1=value1&key2=value2)
    $post_string = http_build_query($post_data, '', '&');

    // create cURL connection
    $curl_connection = curl_init('http://www.lldj.com/pastresult.php');

    // set options
    curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($curl_connection, CURLOPT_USERAGENT, '...'); // user-agent string is cut off in the source
    curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
    // set data to be posted
    curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);

    // perform our request
    $result = curl_exec($curl_connection);

    // show information regarding the request
    print_r(curl_getinfo($curl_connection));
    echo curl_errno($curl_connection) . '-' . curl_error($curl_connection) . "\n";

    // release the handle before the next request
    curl_close($curl_connection);

    // skip this draw if the request failed
    if ($result === false) {
        continue;
    }

    // start your scrape: parse the returned HTML string, not the cURL handle
    // str_get_html() comes from Simple HTML DOM (assumption, see the include above)
    $html = str_get_html($result);
    $t = $html ? $html->find('table', 0) : null;
    if (!$t) {
        continue;
    }

    foreach ($t->find('tr') as $n) {
        $tds = $n->find('td');
        $img = isset($tds[0]) ? $tds[0]->find('img', 0) : null;
        if (!$img) {
            continue; // row without an image in its first cell
        }

        $dataRows = array();
        $dataRows['num'] = $img->src; // img elements carry src, not href

        var_dump($dataRows);
    }

} // for loop ends here
?>

This is just a skeleton for calling cURL repeatedly with a changing ID; you can adapt it as you need.

Also, please make sure to clear your variables after fetching the data.

For example:

...
curl_close($ch);
unset($fields_string);
...
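
A minimal sketch of the cleanup described above, assuming the field names `draw` and `button` from the question. `build_post_string()` and `fetch_draw()` are hypothetical helper names, not part of cURL. Building the POST body inside a function means nothing carries over from one iteration to the next, and the handle is closed as soon as the response has been read.

```php
<?php
// Hypothetical helper: build the POST body for one draw number.
// The array is created fresh on every call, so no values pile up between iterations.
function build_post_string(int $draw): string
{
    return http_build_query(array(
        'draw'   => $draw,
        'button' => 'Submit',
    ), '', '&');
}

// Hypothetical helper: POST one draw number and return the HTML, or null on failure.
function fetch_draw(string $url, int $draw): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post_string($draw));

    $result = curl_exec($ch);
    curl_close($ch); // release the handle before the next request

    return $result === false ? null : $result;
}
```

It could then be called in a loop as `$html = fetch_draw('http://www.lldj.com/pastresult.php', $i);`, with each call starting from a clean state.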

Load the page

The preferred way to grab remote content is file_get_contents(). Use:

$html = file_get_contents('http://www.lldj.com/pastresult.php');

That's it.
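
Called with only a URL, file_get_contents() sends a plain GET request, while the form in the question is submitted with POST data. One possible way to send the form fields is a stream context, sketched below. `make_post_options()` and `post_form()` are hypothetical helper names, the field names are taken from the question, and the sketch assumes `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper: build stream-context options for a URL-encoded form POST.
function make_post_options(array $fields): array
{
    return array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields, '', '&'),
            'timeout' => 30,
        ),
    );
}

// Hypothetical helper: submit the form and return the response body (false on failure).
function post_form(string $url, array $fields)
{
    $context = stream_context_create(make_post_options($fields));
    return file_get_contents($url, false, $context);
}
```

For example: `$html = post_form('http://www.lldj.com/pastresult.php', array('draw' => 1097, 'button' => 'Submit'));`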


Get content from the page

To get content from the page you will usually use DOMDocument and DOMXPath:

$doc = new DOMDocument();
@$doc->loadHTML($html);
$selector = new DOMXPath($doc);

// xpath query
$result = $selector->query('YOUR QUERY');
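
As an illustration of what the query might look like, here is a sketch that collects the `src` of the first image in each row of the first table, mirroring the scraping loop in the question. The sample markup is made up because the real page layout is not known here, and `extract_image_sources()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the src of the first <img> found in the
// first cell of each row of the first table in the document.
function extract_image_sources(string $html): array
{
    $doc = new DOMDocument();
    // real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $sources = array();
    // first table in document order, every row, first cell of that row
    foreach ($xpath->query('(//table)[1]//tr/td[1]') as $cell) {
        $img = $xpath->query('.//img', $cell)->item(0);
        if ($img instanceof DOMElement) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Made-up sample markup standing in for the fetched page.
$sample = '<html><body><table>'
        . '<tr><td><img src="ball7.gif"></td><td>x</td></tr>'
        . '<tr><td>no image</td></tr>'
        . '<tr><td><img src="ball21.gif"></td></tr>'
        . '</table></body></html>';

print_r(extract_image_sources($sample));
```

Rows whose first cell has no image are skipped, so the query on the real page would need adjusting to wherever the draw numbers actually sit.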
