Boost C++ regex to match SIP headers

Question

We have the regex below to match a SIP Via header. We are using Boost C++ regex.

 Via: SIP/2.0/UDP 192.168.1.4:62486;rport;branch=z9hG4bK-524287-1---3ff9a622846c0a01;stck=3449406834;received=90.206.135.26

Regex:

 std::regex g_Via("(^Via:\\s+SIP/2\\.0/UDP\\s+)(((\\w|\\.|-)+):(\\d+))((;\\s*rport=(\\d+))|(;stck=(\\d+))|(;[^;\\n\\s]+)*)(\\s*$)", std::regex_constants::icase);

 std::match_results<std::string::const_iterator> result;
 bool valid = std::regex_match(line, result, g_Via);
 if(valid)
 {
    std::string rport = result[8].str();
    std::string stckval = result[9].str();
    // use these values
 }

What we would like is to grab the rport, received and stck parameters after the IP address. We can get the IP address using the above expression, but have a problem getting the individual parameters.

The rport parameter can be either ;rport or ;rport=14838, i.e. on its own or with a value.

The problem we have is that params such as ;branch= and ;received= can be in different positions.

Answer 1

I wouldn't recommend "parsing" SIP headers using regex.

As mentioned in the comments already, handling attributes becomes unwieldy. Also, you will find there are subtle details in the specification (RFC 2616/RFC 822) that make it hard to get right.

I've created a SIP header parser using Boost Spirit earlier:

How to parse multi-line headers of SIP message using regex?

I've actually live-streamed creating that parser. Here are the VODs of the live stream in case you'd like to see them: part #1, part #2, part #3 and part #4.

The benefit of using a parser generator here is that you don't end up with raw match groups, but can parse directly into something useful for further processing, e.g.

using Headers = std::map<std::string, std::string>;

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/algorithm/string/trim.hpp>
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>

using Headers = std::map<std::string, std::string>;

template <typename It> Headers parse_headers(It first, It last) 
{
    using namespace boost::spirit::qi;

    auto& crlf       = "\r\n";
    auto& tspecials = " \t><@,;:\\\"/][?=}{:";

    rule<It, std::string()> token, value;

    token = +~char_(tspecials); // FIXME? should filter CTLs
    value = *(char_ - (crlf >> &(~blank | eoi)));
    BOOST_SPIRIT_DEBUG_NODES((token)(value));

  //value = *(omit[ crlf >> !(~blank | eoi) ] >> attr(' ') | (char_ - crlf));

    Headers headers;
    bool ok = phrase_parse(first, last, (token >> ':' >> value) % crlf >> omit[*lit(crlf)], blank, headers);

#ifdef DEBUG
    if (ok)          std::cerr << "DEBUG: Parse success\n";
    else             std::cerr << "DEBUG: Parse failed\n";
    if (first!=last) std::cerr << "DEBUG: Remaining unparsed input: '" << std::string(first,last) << "'\n";
#endif

    if (ok && (first==last))
        return headers;

    throw std::runtime_error("Parse error in headers\n"); // TODO FIXME
}

int main()
{
    boost::spirit::istream_iterator iter(std::cin >> std::noskipws), end;

    for (auto& header : parse_headers(iter, end)) {
        std::cout << "Key: '" << header.first << "', Value: '" << header.second << "'\n";
    }
}

For input:

Via: SIP/2.0/UDP 10.10.1.99:5060;branch=z9hG4bK343bf628;rport
Contact: <sip:15@10.10.1.99>
Call-ID: 326371826c80e17e6cf6c29861eb2933@10.10.1.99
CSeq: 102 INVITE
User-Agent: Asterisk PBX
Max-Forwards: 70
Date: Wed, 06 Dec 2009 14:12:45 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY
Supported: replaces
Content-Type: application/sdp
Content-Length: 258
From: "Test 15" <sip:15@10.10.1.99>
 ; tag   =    fromtag
To: <sip:13@10.10.1.13>;tag=totag

It prints the output:

Key: 'Allow', Value: 'INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY'
Key: 'CSeq', Value: '102 INVITE'
Key: 'Call-ID', Value: '326371826c80e17e6cf6c29861eb2933@10.10.1.99'
Key: 'Contact', Value: '<sip:15@10.10.1.99>'
Key: 'Content-Length', Value: '258'
Key: 'Content-Type', Value: 'application/sdp'
Key: 'Date', Value: 'Wed, 06 Dec 2009 14:12:45 GMT'
Key: 'From', Value: '"Test 15" <sip:15@10.10.1.99>
 ; tag   =    fromtag'
Key: 'Max-Forwards', Value: '70'
Key: 'Supported', Value: 'replaces'
Key: 'To', Value: '<sip:13@10.10.1.13>;tag=totag'
Key: 'User-Agent', Value: 'Asterisk PBX'
Key: 'Via', Value: 'SIP/2.0/UDP 10.10.1.99:5060;branch=z9hG4bK343bf628;rport'
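
Note that the From value above still contains the line break of the folded (multi-line) header. If a single-line value is preferred, a small post-processing step can collapse each fold into one space. The following is only a sketch using the standard library; the helper name unfold is an assumption for illustration and is not part of the parser above.

```cpp
#include <string>

// Hypothetical helper (not part of the Spirit parser above): replaces each
// folded line break, i.e. CRLF followed by spaces or tabs, with one space.
// A CRLF that is not followed by a blank is left untouched.
std::string unfold(const std::string& value)
{
    std::string out;
    out.reserve(value.size());

    for (std::string::size_type i = 0; i < value.size(); ++i) {
        const bool fold = value.compare(i, 2, "\r\n") == 0
            && i + 2 < value.size()
            && (value[i + 2] == ' ' || value[i + 2] == '\t');
        if (!fold) {
            out += value[i];
            continue;
        }
        // Skip the CRLF and all leading blanks of the continuation line.
        i += 2;
        while (i < value.size() && (value[i] == ' ' || value[i] == '\t')) ++i;
        out += ' ';
        --i; // the loop increment then lands on the first non-blank character
    }
    return out;
}
```

Calling unfold(header.second) before printing would put the From value on one line while leaving the other values unchanged.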

Answer 2

Depending on what language you're working with, dealing with the attributes may be better done separately from the regex. You can use the regex to extract each attribute (or perhaps the entire attribute string: everything from the first ; to the end). After that, you can split the attribute string using ; as the delimiter. Both Python and PHP, for example, have easy functions to do this (split() in Python, and explode() in PHP). You can then split each attribute by the = to separate the attribute name from the attribute value.
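
Since the question is about C++, here is a minimal sketch of that split-based approach using only std::string. The function name split_via_params is hypothetical, and the sketch assumes parameters contain no quoted ; characters and does not trim whitespace around names or values.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not from the answer): splits the parameter part of a
// Via header value, e.g. ";rport;branch=abc", into name -> value pairs.
// A parameter without '=' (such as a bare ";rport") maps to an empty string.
std::map<std::string, std::string> split_via_params(const std::string& value)
{
    std::map<std::string, std::string> params;

    // Everything from the first ';' to the end is the attribute string.
    std::string::size_type pos = value.find(';');
    while (pos != std::string::npos) {
        const std::string::size_type next = value.find(';', pos + 1);
        const std::string item = value.substr(
            pos + 1, next == std::string::npos ? std::string::npos : next - pos - 1);

        // Split each attribute on the first '=' into name and value.
        const std::string::size_type eq = item.find('=');
        if (!item.empty()) {
            params[item.substr(0, eq)] =
                eq == std::string::npos ? std::string() : item.substr(eq + 1);
        }
        pos = next;
    }
    return params;
}
```

Because the result is keyed by name, the position of ;branch=, ;received= or ;stck= in the header no longer matters. Checking params.count("rport") tells a bare ;rport (present, empty value) apart from a missing one.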

Answer 3

You're using too many groups for me to keep track of, so I removed most of them. This will work in any of the languages you mentioned.

(?<Named> groups) may not be supported in some flavors, but you can easily change them to a normal (group). I used them for convenience.

Regex

^Via:\s+SIP\/2\.0\/UDP\s+                # header
([-.\w]+):(\d+)                          # IP (group 1) and port (group 2)
(?:                                      # ITERATE
    (?<received>;received=[.\d]+)        #   received (group "received")
  |                                      #
    (?<rport>;rport                      #   rport (group "rport")
        (?:=(?<rportval>[0-9]+))?        #    with optional num (group "rportval")
    )                                    #
  |                                      #
    (?<stck>;stck=\d+)                   #   stck (group "stck")
  |                                      #
    ;[^;\n\s=]+(?:=[^;]+)?               #   any other param (not captured)
)*                                       # Repeat iteration *
\s*$                                     # to EoL

One-liner:

^Via:\s+SIP\/2\.0\/UDP\s+([-.\w]+):(\d+)(?:(?<received>;received=[.\d]+)|(?<rport>;rport(?:=(?<rportval>[0-9]+))?)|(?<stck>;stck=\d+)|;[^;\n\s=]+(?:=[^;]+)?)*\s*$

Code

Using Boost.Regex:

#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;


int main  () {
    string subject = "Via: SIP/2.0/UDP 192.168.1.4:62486;rport=12345;branch=z9hG4bK-524287-1---3ff9a622846c0a01;stck=3449406834;received=90.206.135.26";
    string pattern = "^Via:\\s+SIP/2\\.0/UDP\\s+([-.\\w]+):([0-9]+)(?:(?<received>;received=[.0-9]+)|(?<rport>;rport(?:=(?<rportval>[0-9]+))?)|(?<stck>;stck=[0-9]+)|;[^;\\n\\s=]+(?:=[^;]*)?)*\\s*$";
    smatch match;


    const regex re(pattern);
    if (regex_search(subject, match, re)) {
        string received = match["received"];
        string rport = match["rport"];
        string rportval = match["rportval"];
        string stck = match["stck"];
        cout << "rport = " << rport << endl << "rportval = " << rportval << endl;
    } else {
        cout << "NO MATCH" << endl;
    }
    return 0;
}

Output:

rport = ;rport=12345
rportval = 12345

rextester.com demo
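
For a flavor without named groups, such as std::regex (its ECMAScript grammar has no named-group syntax), one alternative is to match the fixed part of the header once and then iterate over the parameters with std::sregex_iterator. This is only a sketch of that alternative, not the pattern from this answer; the Via struct and parse_via function are hypothetical names, and values are assumed to contain no ; or whitespace.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct Via {
    std::string host;
    std::string port;
    std::map<std::string, std::string> params; // a bare ";rport" maps to ""
};

// Returns false when the line is not a "Via: SIP/2.0/UDP host:port..." header.
bool parse_via(const std::string& line, Via& out)
{
    // Step 1: match the fixed prefix, host and port; capture the rest as-is.
    static const std::regex head(
        R"(^Via:\s+SIP/2\.0/UDP\s+([-.\w]+):(\d+)((?:;[^;]*)*)\s*$)",
        std::regex_constants::icase);
    std::smatch m;
    if (!std::regex_match(line, m, head)) return false;

    out.host = m[1].str();
    out.port = m[2].str();
    out.params.clear();

    // Step 2: walk the parameter string one ";name[=value]" at a time.
    static const std::regex param(R"(;\s*([^;=\s]+)(?:=([^;\s]*))?)");
    const std::string rest = m[3].str();
    for (std::sregex_iterator it(rest.begin(), rest.end(), param), end; it != end; ++it) {
        // An unmatched value group yields an empty string.
        out.params[(*it)[1].str()] = (*it)[2].str();
    }
    return true;
}
```

Keeping the captures out of the repeated group also sidesteps the question of which iteration's capture survives when a group sits inside a repetition, a detail on which regex flavors differ.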


Answer 1: 3, 2015-11-04 09:53:22
Answer 2: 2, 2015-11-04 08:33:32
Answer 3: 1, accepted, 2015-11-04 09:47:45
