
String search algorithm across multiple buffers

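I am developing an NGINX module that needs to perform complex string replacement in the response body on the fly, without accumulating the buffers (see ngx_http_output_body_filter_by_me below). Sometimes the buffers in chain cannot hold the whole response, e.g. when looking for "FGH" in {"ABC", "DEF", "GHI"} (see the small cautionary note on Socket Buffer), so I have to save the match context and continue on the next call.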

Is there a ready-made C/C++ library for searching for a string across multiple buffers?

typedef ngx_int_t (*ngx_http_output_body_filter_pt)(ngx_http_request_t *r, ngx_chain_t *chain);
// A callback to a body filter. In modules this is normally used as follows:
static ngx_http_output_body_filter_pt ngx_http_next_body_filter;



// https://tengine.taobao.org/book/chapter_12.html#subrequest-99

typedef struct ngx_http_my_ctx_s {
    const char* pattern_pending; // save the position if partial match
} ngx_http_my_ctx_t;


//https://serverfault.com/questions/480352/modify-data-being-proxied-by-nginx-on-the-fly

/* Intercepts the HTTP response message body in our module.
 * \param r the request structure holding the details of the request and response
 * \param chain the chained buffers holding the part of the response received this time
 */
ngx_int_t ngx_http_output_body_filter_by_me(ngx_http_request_t *r, ngx_chain_t *chain) {
    //logdf("%.*s", ARGS_NGX_STR(req->unparsed_uri));
    const char* pattern = "substring";
    size_t pattern_length = strlen(pattern);
    const char* pattern_pending;
    for (ngx_chain_t *cl = chain; cl; cl = cl->next) {
        ngx_buf_t *buf = cl->buf;
        // logdf("%.*s", (int)(buf->last - buf->pos), buf->pos);
        // buf->last points one past the last valid byte, so stop before it.
        for (u_char* pch = buf->pos; pch < buf->last; ++pch) {
            // TODO: match *pch against the pattern; if the buffer ends inside a
            // partial match, save it: ctx->pattern_pending = pattern + pos;
        }
    }
    // Hand the chain on to the next body filter.
    return ngx_http_next_body_filter(r, chain);
}
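
One way to keep the match context between calls, without accumulating the whole response, is to carry over only the last pattern_length - 1 bytes of what has been seen and search that tail together with the next buffer. The following is a minimal sketch of that idea, not an NGINX API: `TailWindow` and `search_chunk` are hypothetical names, and `data`/`len` stand in for `buf->pos` and `buf->last - buf->pos`. For brevity it copies the current chunk into a temporary window, which a real filter would probably avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-request state: the bytes from earlier chunks that could
// still begin a match, and the absolute stream offset of the first of them.
struct TailWindow {
    std::string tail;        // at most pattern.size() - 1 bytes after a call
    std::size_t offset = 0;  // absolute stream offset of tail[0]
};

// Searches one chunk and returns the absolute start offset of every match,
// including matches that began in an earlier chunk.
std::vector<std::size_t> search_chunk(TailWindow& st, const char* data,
                                      std::size_t len,
                                      const std::string& pattern) {
    assert(!pattern.empty());
    std::vector<std::size_t> hits;

    // Search the carried tail followed by the new chunk. The tail is shorter
    // than the pattern, so every match found here ends inside the new chunk
    // and cannot have been reported by a previous call.
    std::string window = st.tail;
    window.append(data, len);

    std::size_t from = 0;
    for (;;) {
        const std::size_t at = window.find(pattern, from);
        if (at == std::string::npos) break;
        hits.push_back(st.offset + at);
        from = at + 1;  // also report overlapping matches
    }

    // Keep only the bytes that could still start a match in the next chunk.
    const std::size_t keep = std::min(window.size(), pattern.size() - 1);
    st.offset += window.size() - keep;
    st.tail.assign(window, window.size() - keep, keep);
    return hits;
}
```

Feeding "ABC", "DEF" and "GHI" in turn with the pattern "FGH" reports a single match starting at stream offset 5, found during the third call.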

References

NGINX references

I used a simple implementation (online).

#include <cassert>
#include <cstddef>

/** Match granularity is bytes, i.e. the comparison is done byte by byte.
 * @param mem the text to be searched
 * @param mem_len the text length
 * @param pattern the word sought
 * @param pattern_length the pattern length
 * @param pattern_index in: the index carried over from the last partial match, or 0
 *  (reset it to 0 after a whole match); out: the number of pattern bytes matched so far
 * @return mem_len if the whole text was looked through, or the index one past the last
 *  matched byte if the pattern was completed (then *pattern_index == pattern_length)
 * @note On a mismatch the match restarts from pattern[0], so a pattern whose prefix
 *  repeats (e.g. "AAB" in "AAAB") can be missed; that case needs a failure table as in KMP.
 * @example size_t mem_idx, index = 0; // if the carried-over partial match continues, the bytes matched in this call are mem[mem_idx-(index-old_index), mem_idx-1].
 *  mem_idx = memmatch("ABC", 3, "FGH", 3, &index); // NotFound: mem_idx = 3; index = 0;
 *  mem_idx = memmatch("EFG", 3, "FGH", 3, &index); // Continue: mem_idx = 3; index = 2; mem[1,2]=pat[0,1]
 *  mem_idx = memmatch("HIJ", 3, "FGH", 3, &index); // Complete: mem_idx = 1; index = 3; mem[0,0]=pat[2,2]
 */
size_t memmatch(const void* mem, size_t mem_len, const void* pattern, size_t pattern_length, size_t* pattern_index) {
    assert(*pattern_index < pattern_length);
    const char* text = (const char*)mem;
    const char* pat = (const char*)pattern;
    size_t idx = 0; // position in `mem`
    size_t index = *pattern_index; // position in `pattern`
    while (idx < mem_len) {
        const char c = text[idx++];
        if (c != pat[index]) {
            index = 0; // mismatch: drop the partial match and retry this byte as a fresh start
        }
        if (c == pat[index] && ++index == pattern_length) {
            break; // whole pattern matched; idx is already one past the match
        }
    }
    *pattern_index = index;
    return idx;
}

#ifdef MEMMATCH_EXAMPLE
#include <iostream>
void memmatch_example0() {
    size_t mem_idx, idx, index = 0;
    mem_idx = memmatch("ABC", 3, "FGH", 3, (idx = index, &index)); // NotFound
    std::cout << "mem_idx stops at " << mem_idx << ", pat_idx stops at " << index << " since " << idx << '\n';
    mem_idx = memmatch("EFG", 3, "FGH", 3, (idx = index, &index)); // Continue
    std::cout << "mem_idx stops at " << mem_idx << ", pat_idx stops at " << index << " since " << idx << '\n';
    mem_idx = memmatch("HIJ", 3, "FGH", 3, (idx = index, &index)); // Complete
    std::cout << "mem_idx stops at " << mem_idx << ", pat_idx stops at " << index << " since " << idx << '\n';
}
#endif
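
The byte-by-byte approach above restarts from the beginning of the pattern on a mismatch, so it can miss a match when the pattern's own prefix repeats (for example "AAB" in the stream "AA" + "AB"). A sketch of one way around this is to replace the reset with the failure table from the Knuth-Morris-Pratt algorithm; the only state carried between buffers is still a single index. `StreamMatcher`, `feed` and `pending` are hypothetical names chosen for illustration, not part of NGINX or of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical streaming matcher built on the KMP failure function.
class StreamMatcher {
public:
    explicit StreamMatcher(const std::string& pattern)
        : pat_(pattern), fail_(pattern.size(), 0) {
        assert(!pat_.empty());
        // fail_[i] = length of the longest proper prefix of pat_[0..i]
        // that is also a suffix of pat_[0..i].
        for (std::size_t i = 1, k = 0; i < pat_.size(); ++i) {
            while (k > 0 && pat_[i] != pat_[k]) k = fail_[k - 1];
            if (pat_[i] == pat_[k]) ++k;
            fail_[i] = k;
        }
    }

    // Feeds one buffer and appends the absolute start offset of every match
    // completed inside it to `hits`.
    void feed(const void* data, std::size_t len, std::vector<std::size_t>& hits) {
        const char* p = static_cast<const char*>(data);
        for (std::size_t i = 0; i < len; ++i, ++consumed_) {
            // Fall back to the longest shorter prefix that still fits.
            while (matched_ > 0 && p[i] != pat_[matched_]) matched_ = fail_[matched_ - 1];
            if (p[i] == pat_[matched_]) ++matched_;
            if (matched_ == pat_.size()) {
                hits.push_back(consumed_ + 1 - pat_.size());
                matched_ = fail_[matched_ - 1];  // keep going for overlapping matches
            }
        }
    }

    // Number of pattern bytes matched at the end of the data seen so far.
    std::size_t pending() const { return matched_; }

private:
    std::string pat_;
    std::vector<std::size_t> fail_;
    std::size_t matched_ = 0;   // current position in the pattern
    std::size_t consumed_ = 0;  // total bytes fed so far
};
```

When a buffer ends with `pending()` greater than zero, the trailing `pending()` bytes of the stream are equal to the first `pending()` bytes of the pattern. A replacing filter could therefore hold those bytes back and later re-emit them from the pattern itself if the match does not complete, rather than keeping a copy of the buffer.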
