
C regex lib for http header parsing

Original post: 2013-02-19 04:53:49. Tags: c, regex

I have a program with a buffer containing HTTP data captured from the wire. The buffer contains both the HTTP header and the HTML. Is there a way to parse the HTTP header from a C program? I am not really interested in the HTML. I have seen other examples, such as the question "Regex HTTP header parsing", but I am looking for an existing library (usable from C) that can simply parse the header and give me each field.

My requirements are:
- Peek into the buffer and check whether it is an HTTP payload.
- If it is an HTTP payload, run a regex parser to get all fields of the HTTP header.

Is there code out there that I can look at? Does anyone know of a library?

Regards, bgun

2 answers

The http-parser library should serve you well.

If you want to match some simple regexes, I would recommend SLRE (Super Light Regular Expression library), a very small and robust C regex parser. It consists of only one header file and one source file written in standard C, which you can add to your project.

It supports quite a usable subset of standard regular expressions:

\d, \w, \s, \S (non-whitespace), * (match 0 or more), + (match 1 or more), () for groups. I don't think it supports nested groups, but I have always been able to get by without them.
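To give a feel for the regex approach on a single header line, here is a minimal sketch. It uses POSIX `regcomp`/`regexec` from `<regex.h>` instead of SLRE, since that is what ships with most Unix-like systems; it is not SLRE's API. POSIX extended syntax has no `\d` or `\w`, so the pattern uses bracket expressions. The function name `match_header_line` and its return codes are made up for illustration, and the line is assumed to be NUL-terminated with the trailing CRLF already removed.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: split one header line (no trailing CRLF) into
 * name and value. Returns 1 on a match, 0 if the line does not look
 * like "Name: value", and -1 if the pattern fails to compile. */
int match_header_line(const char *line, char *name, size_t name_sz,
                      char *value, size_t value_sz)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] = whole match, m[1] = name, m[2] = value */
    int rc;

    /* name: one or more characters that are not colon, space or tab;
     * then a colon, optional blanks, and the rest of the line. */
    if (regcomp(&re, "^([^: \t]+):[ \t]*(.*)$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;

    /* Copy the captured groups out; snprintf truncates if too long. */
    snprintf(name, name_sz, "%.*s",
             (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so);
    snprintf(value, value_sz, "%.*s",
             (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so);
    return 1;
}
```

Compiling the pattern on every call keeps the sketch short; code that handles many lines would compile it once and reuse the `regex_t`.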

Well, if it's an HTTP response payload, the first 5 characters should be "HTTP/". If that's not the beginning of the data, you can assume it's not an HTTP response. If it is, and all you care about are the headers, then you simply need to continue receiving data until the first "\r\n\r\n". From there, if you must separate the header names from the values, it's as simple as using the first colon on each line as the delimiter.
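The steps in this answer can be sketched in plain standard C without any regex library. This is only an illustration under some assumptions: the names `struct header_field`, `find_bytes` and `parse_response_headers` are invented here, only responses are handled (requests start with a method, not "HTTP/"), and folded or otherwise malformed header lines are not dealt with. The returned fields point into the caller's buffer and are not NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: pointers into the caller's buffer. */
struct header_field {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
};

/* Locate a byte sequence in a length-delimited buffer
 * (captured data is not guaranteed to be NUL-terminated). */
static const char *find_bytes(const char *buf, size_t len,
                              const char *needle, size_t nlen)
{
    size_t i;
    if (nlen == 0 || len < nlen)
        return NULL;
    for (i = 0; i + nlen <= len; i++)
        if (memcmp(buf + i, needle, nlen) == 0)
            return buf + i;
    return NULL;
}

/* Returns the number of fields stored in out, -1 if buf does not start
 * with "HTTP/", or -2 if the terminating blank line is not in buf yet. */
int parse_response_headers(const char *buf, size_t len,
                           struct header_field *out, size_t max_out)
{
    const char *end, *stop, *line, *eol, *colon, *v;
    size_t n = 0;

    if (len < 5 || memcmp(buf, "HTTP/", 5) != 0)
        return -1;
    end = find_bytes(buf, len, "\r\n\r\n", 4);
    if (end == NULL)
        return -2;
    stop = end + 2;  /* just past the CRLF that ends the last header line */

    /* Skip the status line, then walk the CRLF-terminated header lines. */
    line = find_bytes(buf, (size_t)(stop - buf), "\r\n", 2) + 2;
    while (line < stop && n < max_out) {
        eol = find_bytes(line, (size_t)(stop - line), "\r\n", 2);
        colon = memchr(line, ':', (size_t)(eol - line));
        if (colon != NULL) {
            v = colon + 1;
            while (v < eol && (*v == ' ' || *v == '\t'))
                v++;  /* drop optional whitespace after the colon */
            out[n].name = line;
            out[n].name_len = (size_t)(colon - line);
            out[n].value = v;
            out[n].value_len = (size_t)(eol - v);
            n++;
        }
        line = eol + 2;
    }
    return (int)n;
}
```

A caller would pass the captured buffer and its length, then print each field with something like `printf("%.*s", (int)f.name_len, f.name)`. A return value of -2 means more data is needed before the header block is complete.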