How to fetch a specific number of lines from a webpage using cURL (C/C++)

I am new to cURL and am trying to implement an application that lets the user fetch specific data from a (dynamic) HTML page and save it to a .txt file.

The application is C/C++ based, and so far I am able to fetch the whole content of the HTML page.

This is the code I am referring to:

#include "stdafx.h" 
#pragma comment(lib, "curllib_static.lib") 
#include <stdio.h>
#include "curl/curl.h"
#pragma comment(lib, "wldap32.lib") 
#pragma comment(lib, "ws2_32.lib") 
#pragma comment(lib, "winmm.lib")
#pragma comment(lib, "ssleay32.lib") 
#pragma comment(lib, "openldap.lib") 
#pragma comment(lib, "libeay32.lib")

void get_page(const char* url, const char* file_name)
{
  // Note: file_name is not used yet; the output always goes to my.txt.
  CURL* easyhandle = curl_easy_init();
  if (!easyhandle)
    return;

  curl_easy_setopt( easyhandle, CURLOPT_URL, url ) ;

  // Timeout options take a long, so .29 s is passed as 290 ms.
  curl_easy_setopt( easyhandle, CURLOPT_CONNECTTIMEOUT_MS, 290L );

  FILE* file = fopen( "my.txt", "a+");
  if (!file)
  {
    curl_easy_cleanup( easyhandle );
    return;
  }

  // With no write callback set, curl writes the body to this FILE*.
  curl_easy_setopt( easyhandle, CURLOPT_WRITEDATA, file) ;

  curl_easy_perform( easyhandle );

  curl_easy_cleanup( easyhandle );
  fclose(file);
}

int main()
{
  get_page( "http://couldbeanything.com", "style.css" ) ;

  return 0;
}

So this code fetches the whole page, but I only want to fetch a specific number of lines with it (for example, 5).

I searched and came across something called a "PHP DOM parser". Is there some way to implement this kind of fetching in C/C++?

Thanks in advance.

It's an unusual requirement, and no DOM parser is going to help you. Instead, use a slightly more advanced curl option: rather than only handing a FILE* to CURLOPT_WRITEDATA, install your own callback with CURLOPT_WRITEFUNCTION (whatever you pass as CURLOPT_WRITEDATA then arrives as the callback's userdata argument). Like this:

curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, my_function);

...

size_t my_function(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    ...
}

my_function is a callback: whenever some data from the webpage is available, curl calls my_function with ptr pointing to that data. The number of bytes available is size * nmemb; in libcurl size is always 1, so nmemb is effectively the byte count. The data is not null-terminated. The callback should return the number of bytes it handled; returning a different value makes curl abort the transfer. You can then do what you want with this data. Presumably in your case this would mean extracting the first few lines.
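A minimal sketch of such a callback, assuming that userdata points at a std::string whose address was passed through CURLOPT_WRITEDATA. The name append_chunk is made up for illustration and does not come from libcurl. It only shows how the chunk length is computed and reported back to curl.

```cpp
#include <cstddef>
#include <string>

// Hypothetical write callback: appends each chunk curl delivers to a std::string.
// Assumption: the string's address was registered with CURLOPT_WRITEDATA.
std::size_t append_chunk(char* ptr, std::size_t size, std::size_t nmemb, void* userdata)
{
    const std::size_t bytes = size * nmemb;   // chunk length in bytes
    std::string* out = static_cast<std::string*>(userdata);
    out->append(ptr, bytes);                  // ptr is not null-terminated, so pass the length
    return bytes;                             // report the whole chunk as handled
}
```

It would be registered with curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, append_chunk) and curl_easy_setopt(easyhandle, CURLOPT_WRITEDATA, &buffer), where buffer is a std::string that outlives the transfer.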

But there is no guarantee that the first few lines arrive in one convenient block (the internet doesn't work like that): a single line may be split across two calls of the callback, so you have some work to do. Check the libcurl docs for more information.

Basically, since the internet isn't line-based, there's no simple way to do what you want, and I wonder whether you should rethink your requirements.
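One way that work could look, as a sketch rather than a definitive solution: keep a small state object between calls, count newline characters, and stop once enough lines have been seen. The names LineCollector and collect_lines are hypothetical, and "line" here means a line of the raw HTML source, not a line as rendered in a browser. The sketch relies on the documented libcurl behaviour that a callback returning a byte count different from the one it was given aborts the transfer (curl_easy_perform then reports CURLE_WRITE_ERROR, which is expected in this case).

```cpp
#include <cstddef>
#include <string>

// Hypothetical state shared between calls of the callback.
struct LineCollector {
    std::size_t max_lines;   // how many lines to keep
    std::size_t lines_seen;  // complete lines collected so far
    std::string text;        // the collected lines
};

// Hypothetical write callback: copies bytes until max_lines newlines were seen.
// A line may be split across chunks, so the counters live in userdata.
std::size_t collect_lines(char* ptr, std::size_t size, std::size_t nmemb, void* userdata)
{
    LineCollector* c = static_cast<LineCollector*>(userdata);
    const std::size_t bytes = size * nmemb;
    for (std::size_t i = 0; i < bytes; ++i) {
        if (c->lines_seen >= c->max_lines)
            return 0;  // short count: curl aborts the rest of the transfer
        c->text.push_back(ptr[i]);
        if (ptr[i] == '\n')
            ++c->lines_seen;
    }
    return bytes;      // whole chunk consumed, keep going
}
```

With a LineCollector initialised to max_lines = 5 and passed through CURLOPT_WRITEDATA, the text member would hold the first five lines after curl_easy_perform returns, and it can then be written to the .txt file.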
