How can I use libcurl to parse through a webpage line-by-line?

OK, so I'm building this program in C for a Linux system. I need to be able to retrieve the content of a URL and then read it line by line so I can do my own custom parsing on it.

Now, what's very important to me is speed. I'd really like to do this without saving the entire thing to a file and then reading the file back, because, for example, there may be content on the first line that means I don't need to read the rest of it.

It is also very important that this is thread-safe. I tried using the code at http://curl.haxx.se/libcurl/c/fopen.html, but it uses global variables that make it impossible to multithread safely.

Any ideas?

Examples are just that: examples. If one doesn't work quite the way you need, fix it so it works better.

I would guess that you're better off starting with another example, perhaps getinmemory.c:

http://curl.haxx.se/libcurl/c/getinmemory.html

libcurl delivers data "chunk by chunk", not line by line, so your application needs to figure out when it has enough data; it can then tell libcurl to stop the transfer (a write callback does this by returning a value other than the number of bytes it was handed).
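To make that concrete, here is a minimal sketch, assuming you only need the response body split on `\n`. All names in it (`line_state`, `line_write_cb`, `line_state_finish`, the `on_line` hook and the `demo_*` example) are made up for illustration; only the callback's signature comes from libcurl's `CURLOPT_WRITEFUNCTION`. The per-transfer state lives in a struct passed through `CURLOPT_WRITEDATA` rather than in globals, so each thread can run its own transfer. Partial lines are carried over between chunks, and the callback returns 0 (not the byte count) when the hook wants to stop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Per-transfer state: each thread/transfer owns one and hands it to libcurl
 * through CURLOPT_WRITEDATA, so nothing is shared via globals. */
typedef struct {
    char  *partial;      /* bytes of an unfinished line carried between chunks */
    size_t partial_len;
    /* Hypothetical user hook; return nonzero to stop the transfer early.
     * `line` is NOT NUL-terminated; use `len`. */
    int  (*on_line)(const char *line, size_t len, void *ctx);
    void  *ctx;          /* passed through to on_line untouched */
    int    stopped;      /* set once on_line asked to stop */
} line_state;

/* Hand one line to the user hook, dropping a trailing '\r' (CRLF input). */
static int emit_line(line_state *st, const char *line, size_t len)
{
    if (len > 0 && line[len - 1] == '\r')
        len--;
    return st->on_line(line, len, st->ctx);
}

/* Same signature libcurl expects for CURLOPT_WRITEFUNCTION. */
static size_t line_write_cb(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    line_state *st = userdata;
    size_t total = size * nmemb;
    size_t start = 0;

    assert(st != NULL && st->on_line != NULL);
    for (size_t i = 0; i < total; i++) {
        if (ptr[i] != '\n')
            continue;
        const char *line = ptr + start;
        size_t len = i - start;
        if (st->partial_len > 0) {
            /* Complete the line left over from the previous chunk. */
            char *joined = realloc(st->partial, st->partial_len + len);
            if (joined == NULL)
                return 0;            /* out of memory: abort the transfer */
            memcpy(joined + st->partial_len, line, len);
            st->partial = joined;
            line = joined;
            len += st->partial_len;
            st->partial_len = 0;
        }
        if (emit_line(st, line, len)) {
            st->stopped = 1;
            return 0;                /* != total, so libcurl aborts */
        }
        start = i + 1;
    }
    /* Carry any bytes after the last newline over to the next chunk. */
    size_t rest = total - start;
    if (rest > 0) {
        char *grown = realloc(st->partial, st->partial_len + rest);
        if (grown == NULL)
            return 0;
        memcpy(grown + st->partial_len, ptr + start, rest);
        st->partial = grown;
        st->partial_len += rest;
    }
    return total;
}

/* Call once after curl_easy_perform(): flushes a final unterminated line
 * (unless the hook already stopped) and frees the carry-over buffer. */
static void line_state_finish(line_state *st)
{
    if (!st->stopped && st->partial_len > 0)
        emit_line(st, st->partial, st->partial_len);
    free(st->partial);
    st->partial = NULL;
    st->partial_len = 0;
}

/* Example hook (hypothetical): counts lines, stops at a line equal to stop_marker. */
typedef struct { const char *stop_marker; size_t lines_seen; } demo_ctx;

static int demo_on_line(const char *line, size_t len, void *ctx)
{
    demo_ctx *d = ctx;
    d->lines_seen++;
    return strlen(d->stop_marker) == len && memcmp(line, d->stop_marker, len) == 0;
}
```

In each thread you would create your own easy handle and `line_state`, then call `curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, line_write_cb)`, `curl_easy_setopt(curl, CURLOPT_WRITEDATA, &state)` and `curl_easy_perform(curl)`, followed by `line_state_finish(&state)`. When the hook stops early, `curl_easy_perform` reports `CURLE_WRITE_ERROR`; checking `state.stopped` tells a deliberate stop apart from a real failure. Call `curl_global_init` once before starting any threads, and don't use one easy handle from two threads at the same time.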

If you just want to retrieve the data for a page, it's fairly easy to use the socket API directly. There are also quite a few libraries around that make it a bit easier still. Unfortunately, you haven't said what system you want this for, so it's hard to recommend which library you probably want (Windows demands a bit of special code to start up and shut down Winsock; that code isn't needed elsewhere and won't compile or link on almost any other system).
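For comparison, a bare-sockets version on Linux might look roughly like the sketch below. It assumes plain HTTP on port 80 (no TLS, no redirect handling) and sends an HTTP/1.0 request so the server should not use chunked transfer encoding; `tcp_connect`, `read_lines`, `http_get_lines` and `stop_on_match` are hypothetical names. Wrapping the socket with `fdopen` lets `getline` do the line splitting, and all state lives in locals, so separate threads can each run their own request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to host:port over TCP; returns a socket fd, or -1 on failure. */
static int tcp_connect(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Hand each line (newline included) to on_line until EOF or until it
 * returns nonzero. Returns the number of lines delivered; always closes fd. */
static size_t read_lines(int fd, int (*on_line)(const char *line, void *ctx), void *ctx)
{
    FILE *in;
    char *line = NULL;
    size_t cap = 0, count = 0;

    assert(on_line != NULL);
    if ((in = fdopen(fd, "r")) == NULL) {
        close(fd);
        return 0;
    }
    while (getline(&line, &cap, in) != -1) {
        count++;
        if (on_line(line, ctx))
            break;                     /* early exit: the rest is not consumed */
    }
    free(line);
    fclose(in);                        /* also closes fd */
    return count;
}

/* Fetch http://host/path and stream it line by line (headers first). */
static int http_get_lines(const char *host, const char *path,
                          int (*on_line)(const char *line, void *ctx), void *ctx)
{
    char req[1024];
    int fd = tcp_connect(host, "80");
    if (fd < 0)
        return -1;
    int n = snprintf(req, sizeof req,
                     "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t)n >= sizeof req) {
        close(fd);
        return -1;
    }
    for (size_t off = 0; off < (size_t)n;) {   /* write() may be partial */
        ssize_t w = write(fd, req + off, (size_t)n - off);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        off += (size_t)w;
    }
    read_lines(fd, on_line, ctx);
    return 0;
}

/* Example callback (hypothetical): stop once a line contains ctx. */
static int stop_on_match(const char *line, void *ctx)
{
    return strstr(line, (const char *)ctx) != NULL;
}
```

The callback sees the status line and response headers first (each ending in `\r\n`), then a blank line, then the body, so it has to skip past the headers itself. Anything beyond this, such as HTTPS, redirects or compressed responses, is where a library such as libcurl becomes worthwhile.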