
Multiple replacement with just one regex

Let's suppose, for the sake of simplicity, we have the following string:

"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary."

Let's suppose I want to use a regex to change the characters of that story.

John -> Joseph

Mary -> Jessica

Jake -> Keith

Of course, I can change each of those, one at a time.

But I'd like to know if it's possible to change all of them with just a single regex replacement, like a "multiple replacement" or "conditional replacement".

Something like:

regex: (?:(?<name1>John)|(?<name2>Mary)|(?<name3>Jake))

replacement: (?(name1)Joseph|(?(name2)Jessica|(?(name3)Keith)))

This is just a simple example.

In my application, I have to perform around 20 replacements for each string, which impacts the performance of the application.

The regex flavor I'm using is PCRE.

The application is being coded using C++ with Qt framework.

So you're using the so-called PCRE flavor. Good, except this doesn't say exactly which library you're using. Let's review a few options here, as a couple of different libraries claim to be Perl-compatible.

Boost

That's the simplest solution. boost::regex supports exactly what you're asking for through its Boost-Extended Format String Syntax.

So you can replace the pattern:

(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)

With the replacement string:

(?{name1}Joseph:(?{name2}Jessica:Keith))

And sure, it works. You can test it in Notepad++, but here's some sample code:

#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main(int argc, char **argv) {
    std::string subject("John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.");
    const char* replacement = "(?{name1}Joseph:(?{name2}Jessica:Keith))";

    // boost::regex::perl is the syntax option; boost::match_perl is a match-time flag.
    boost::regex re("(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)", boost::regex::perl);

    std::string result = boost::regex_replace(subject, re, replacement, boost::format_all);
    std::cout << result << std::endl;

    return 0;
}

PCRE2

PCRE2 caught up with Boost and introduced a richer substitution syntax through the PCRE2_SUBSTITUTE_EXTENDED option. As of this post (v10.20), this code isn't released yet, but it's available in the source repository (revision 381), so if you need this solution now, you'll have to build PCRE2 from source.

The pattern is the same but the replacement string has a different syntax:

${name1:+Joseph:${name2:+Jessica:Keith}}

Here's some sample C code:

#include <stdio.h>
#include <string.h>

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

int main(int argc, char **argv) {
    int error;
    PCRE2_SIZE erroffset;

    const PCRE2_SPTR pattern = (PCRE2_SPTR)"(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)";
    const PCRE2_SPTR subject = (PCRE2_SPTR)"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.";
    const PCRE2_SPTR replacement = (PCRE2_SPTR)"${name1:+Joseph:${name2:+Jessica:Keith}}";

    pcre2_code *re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, &error, &erroffset, 0);
    if (re == 0)
        return 1;

    pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);

    PCRE2_UCHAR output[1024] = "";
    PCRE2_SIZE outlen = sizeof(output) / sizeof(PCRE2_UCHAR);

    int rc = pcre2_substitute(re, subject, PCRE2_ZERO_TERMINATED, 0, PCRE2_SUBSTITUTE_GLOBAL | PCRE2_SUBSTITUTE_EXTENDED, 0, 0, replacement, PCRE2_ZERO_TERMINATED, output, &outlen);
    if (rc >= 0)
        printf("%s\n", output);

    pcre2_code_free(re);
    return 0;
}

PCRE

With PCRE (<v10), you're out of luck. It lacks a substitution function; this is left to the developer.

...which means if that's the library you're using, you'll have full control over the substitution process anyway. You could use a pattern such as:

John(*MARK:1)|Mary(*MARK:2)|Jake(*MARK:3)

And then substitute by discriminating on the last encountered MARK.
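To give an idea of what that manual substitution loop looks like, here is a minimal sketch. Note that it is not PCRE code: it uses std::regex as a stand-in so it stays self-contained, and it discriminates on which capture group matched instead of on a MARK. The helper name replace_by_group and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of PCRE or of any library).
// The pattern is expected to be an alternation of capture groups, such as
// "(John)|(Mary)|(Jake)"; group k (1-based) is replaced by replacements[k - 1].
std::string replace_by_group(const std::string& subject,
                             const std::regex& re,
                             const std::vector<std::string>& replacements) {
    // One replacement per capture group.
    assert(re.mark_count() == replacements.size());

    std::string result;
    auto copied_up_to = subject.cbegin();
    const std::sregex_iterator end;

    for (std::sregex_iterator it(subject.cbegin(), subject.cend(), re); it != end; ++it) {
        const std::smatch& m = *it;

        // Copy the unmatched text between the previous match and this one.
        result.append(copied_up_to, m[0].first);

        // The group that participated in the match plays the role of the MARK.
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {
                result += replacements[g - 1];
                break;
            }
        }
        copied_up_to = m[0].second;
    }

    // Copy whatever follows the last match.
    result.append(copied_up_to, subject.cend());
    return result;
}
```

Calling replace_by_group(subject, std::regex("(John)|(Mary)|(Jake)"), {"Joseph", "Jessica", "Keith"}) walks the subject once and picks the replacement per match. With PCRE itself, the inner loop would presumably become a switch on the mark reported for the match.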

Qt

Qt's QRegularExpression class encapsulates the PCRE library (not PCRE2), but it doesn't seem to expose all of the PCRE features.

Anyway, the QString::replace overload which accepts a QRegularExpression doesn't look like it's fully featured:

QString & QString::replace(const QRegularExpression & re, const QString & after)

So you're on your own here.

My 2 cents

Hey, maybe for such a simple replacement, a regular expression is overkill... If you have a performance issue, you should try to implement these replacements by hand - a carefully crafted algorithm should be faster than a regex solution. Just make sure to profile your code and see where the culprit is.
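As a rough idea of what "by hand" could look like, here is a naive single-pass sketch using only the standard library. The names Rule and replace_literals are hypothetical, and it assumes plain literal replacements (no word boundaries, no case folding). Whether it actually beats the regex in your application is something only profiling will tell.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: (text to find, text to put in its place).
using Rule = std::pair<std::string, std::string>;

// Hypothetical helper: scans the subject once from left to right. At each
// position the rules are tried in order and the first one that matches wins,
// which mirrors how a regex alternation picks its first matching branch.
std::string replace_literals(const std::string& subject, const std::vector<Rule>& rules) {
    std::string result;
    result.reserve(subject.size());

    std::size_t pos = 0;
    while (pos < subject.size()) {
        bool replaced = false;
        for (const Rule& rule : rules) {
            assert(!rule.first.empty());  // an empty search text would never advance
            if (subject.compare(pos, rule.first.size(), rule.first) == 0) {
                result += rule.second;
                pos += rule.first.size();
                replaced = true;
                break;
            }
        }
        if (!replaced) {
            result += subject[pos];  // no rule applies here: copy one character
            ++pos;
        }
    }
    return result;
}
```

With the rules {"John", "Joseph"}, {"Mary", "Jessica"} and {"Jake", "Keith"}, this turns the example sentence into the same text as the regex solutions. It tries every rule at every position, so with around 20 rules a smarter lookup (for instance, dispatching on the first character) may be worth trying if the profiler points here.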
