Why does a PCRE regex capture only 19 groups?

  1. My Question:

My regex pattern is: (a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)

and my subject string is: abcdefghijklmnopqrstuvwxyz

The output of my code is:

i_0:0 i_1:26 i_2:0 i_3:1 i_4:1 i_5:2 i_6:2 i_7:3 i_8:3 i_9:4 i_10:4 i_11:5 i_12:5 i_13:6 i_14:6 i_15:7 i_16:7 i_17:8 i_18:8 i_19:9 i_20:9 i_21:10 i_22:10 i_23:11 i_24:11 i_25:12 i_26:12 i_27:13 i_28:13 i_29:14 i_30:14 i_31:15 i_32:15 i_33:16 i_34:16 i_35:17 i_36:17 i_37:18 i_38:18 i_39:19 i_40:0 i_41:0 i_42:0 i_43:0 i_44:0 i_45:0 i_46:0 i_47:0 i_48:0 i_49:0 i_50:0 i_51:0 i_52:0 i_53:0 i_54:0 i_55:0 i_56:0 i_57:0 i_58:0 i_59:0

Question: Why does the PCRE regex capture only 19 of the 26 groups?

  2. My Code
#include <pcre.h>
#include <iostream>
#include <string>

pcre* _rex;
pcre_extra* _rexEx;

void CompileRexStr(const std::string& rex) {
    const char* errorinfo;
    int errpos = 0;
    _rex = NULL;
    _rexEx = NULL;

    _rex = pcre_compile(rex.c_str(), PCRE_UTF8, &errorinfo, &errpos, NULL);
    _rexEx = pcre_study(_rex, PCRE_STUDY_JIT_COMPILE, &errorinfo);
}

int main(){
    std::string rex = "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)(n)(o)(p)(q)(r)(s)(t)(u)(v)(w)(x)(y)(z)";
    CompileRexStr(rex);

    std::string str = "abcdefghijklmnopqrstuvwxyz";
    int result[60] = {0};
    int cur = 0;
    int pos = pcre_exec(_rex, _rexEx, str.c_str(), str.length(), cur, 0, result, 60);

    for(int i=0;i < 60; i++) {
        std::cout << "i_" << i << ":" << result[i] << " ";
    }

    return 0;
}

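It returns 19 capture groups because the vector you passed has room for only 20 offset pairs, and the first pair is used for the whole matched string. With ovecsize = 60, only the first two-thirds (40 integers, i.e. 20 pairs) carry results, which is why i_40 through i_59 stay 0 in your output.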

Captured substrings are returned to the caller via a vector of integers whose address is passed in ovector. The number of elements in the vector is passed in ovecsize, which must be a non-negative number. Note: this argument is NOT the size of ovector in bytes.

The first two-thirds of the vector is used to pass back captured substrings, each substring using a pair of integers. The remaining third of the vector is used as workspace by pcre_exec() while matching capturing subpatterns, and is not available for passing back information. The number passed in ovecsize should always be a multiple of three. If it is not, it is rounded down.

Source: Manual for PCRE
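
The arithmetic in the quoted passage can be written out as a few small functions. This is only a sketch of the rule as the manual describes it; the function names are made up for illustration and are not part of the PCRE API.

```cpp
// Integers pcre_exec() actually uses: ovecsize rounded down to a multiple of 3.
constexpr int EffectiveOvecSize(int ovecsize) {
    return ovecsize - ovecsize % 3;
}

// Offset pairs that can be passed back: the first two-thirds of the vector,
// two integers per pair, which works out to one third of the effective size.
constexpr int ReturnablePairs(int ovecsize) {
    return EffectiveOvecSize(ovecsize) / 3;
}

// Capture groups that fit once pair 0 is reserved for the whole match.
constexpr int MaxCaptureGroups(int ovecsize) {
    return ReturnablePairs(ovecsize) > 0 ? ReturnablePairs(ovecsize) - 1 : 0;
}

// Smallest ovecsize that can hold the whole match plus `groups` capture groups.
constexpr int RequiredOvecSize(int groups) {
    return (groups + 1) * 3;
}
```

With the 60-element array from the question, MaxCaptureGroups(60) gives 19, which matches the observed output.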

If you have 26 capture groups, you need to pass a vector containing at least (26 + 1) × 3 = 81 elements: 27 pairs for the whole match plus the 26 groups, and a further third as workspace.
