
Parse html using C

Original 2009-10-06 20:20:03 9 5 html/ c/ regex/ parsing

I need to grab some content from an HTML page (valid XHTML). I fetch the page using curl and store it in memory.

I played with the idea of using regex with the PCRE library, but I simply couldn't find any examples of using it from C. Then I moved on to look at HTML parsers, and again there is not a good selection. All I could find was a sparsely documented libxml module called HTMLparser.

Are there any alternatives? If not, are there examples for what I have already found?
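For the regex route, here is a minimal sketch of what that could look like in C. It swaps in the POSIX `<regex.h>` API rather than PCRE, simply because it ships with most libc implementations. The function name `extract_title` and the pattern are illustrative assumptions, not something from the post, and a regex like this only works on simple, well-behaved markup.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text between <title> and </title> into
 * out (at most out_size - 1 bytes, always NUL-terminated on success).
 * Returns 0 on success, -1 if nothing matched or the pattern failed to compile. */
int extract_title(const char *html, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = captured title text */
    int rc = -1;

    assert(html != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    /* [^<]* stops the capture at the first '<', i.e. at the closing tag. */
    if (regcomp(&re, "<title[^>]*>([^<]*)</title>",
                REG_EXTENDED | REG_ICASE) != 0)
        return -1;

    if (regexec(&re, html, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1;          /* truncate rather than overflow */
        memcpy(out, html + m[1].rm_so, len);
        out[len] = '\0';
        rc = 0;
    }

    regfree(&re);
    return rc;
}
```

Calling `extract_title(page, buf, sizeof buf)` on the buffer filled by curl would leave the title in `buf`. Anything nested or irregular is better handled by one of the real parsers mentioned in the answers.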

5 answers

You want to use HTML Tidy for this. The libcurl site has some source code to get you going; it demonstrates traversing the DOM tree. You don't need an XML parser, and it doesn't fail on badly formatted HTML.

http://curl.haxx.se/libcurl/c/htmltidy.html

I would use libhtmltidy plus whatever XML parser you like, such as expat or libxml. It depends on what you're looking for.

If you want to parse XML using C, then by far the best way to proceed is to use the LibXML library. The main page is at http://xmlsoft.org/. In addition to their downloads, they have explicit code examples that specifically show how to handle parsing. I know for a fact you can get precompiled versions for Mac and Windows, most Linux and BSD distributions already include it, and you can build from source if you wish.

Google recently created a pure C99 library for parsing HTML, HTML5 specifically. It's easy to use in any C program and actively developed.

https://github.com/google/gumbo-parser

MyHTML is a fast C/C++ HTML5 parser that uses threads: https://github.com/lexborisov/myhtml


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


solution1
13 ACCEPTED 2009-10-06 20:34:50

solution2
8 2009-10-06 20:31:40

solution3
2 2009-10-06 20:30:36

solution4
2 2016-08-31 14:12:32

solution5
0 2020-07-28 16:24:55
