简体   繁体   中英

What's the best way to escape user input for Regular Expressions in MySQL?

I'd like to take user input, denoted as $dangerous_string, and use it as part of a RegEx in a MySQL query.

What's the best way to go about doing this? I want to use the user's string as a literal -- if it contains any characters that mean something in MySQL RegEx, those characters should not actually affect my Regular Expression.

$dangerous_string = $_GET["string"];
//do something here
$dangerous_string = what_goes_here($dangerous_string);
$sql = "SELECT * FROM table WHERE search_column REGEX '" . $mysqli->real_escape_string("[[:<:]]$dangerous_string") . "'";

//etc....

AFAIK, there is no native way of escaping for MySQL regex. You can do it in PHP with preg_quote (http://www.php.net/manual/en/function.preg-quote.php) which would probably do the job for you, but is obviously not designed for the purpose.

My preferred way if I were in your situation would be to construct a regex whitelist in PHP that you can then apply to your dangerous string:

$safeString = preg_replace('/[^\w]/','',$dangerousString);

This removes any non-word characters (ie anything except A-Za-z0-9_) from your string.

NB I believe the other answers given will not remove/escape regex special characters, which I believe is your requirement.

Use preg_quote and replace & manually

preg_quote takes a string and escapes special characters with a backslash. It is meant for PHP regexes, not MySQL regexes, and it does not escape & , which is needed for MySQL. So we only need to modify it like so:

function escape_regex_for_mysql($dangerous_string) {
    return preg_replace('/&/', '\\&', preg_quote($dangerous_string));
}

Note that you should still use prepared statements (or $mysqli->real_escape_string ) on top, like this:

$query = $wpdb->prepare(
    'SELECT * FROM table WHERE search_column REGEXP %s',
    '[[:<:]]' . escape_regex_for_mysql($dangerous_string)
);

Rationale:

Let's look at MySQL's documentation to see which characters will need to be escaped. The documentation says:

MySQL >= 8.0.4 implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe.

So let's look at the documentation for the ICU project :

\\ (outside of sets) Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \\ .

\\ (inside sets) Quotes the following character. Characters that must be quoted to be treated as literals are [ ] \\ Characters that may need to be quoted, depending on the context are - &

So the list of special characters that need to be escaped are * ? + [ ] ( ) { } ^ $ \\ . - & * ? + [ ] ( ) { } ^ $ \\ . - & * ? + [ ] ( ) { } ^ $ \\ . - & . preg_quote escapes all of these except & . It also escapes some characters unnecessarily, but MySQL will handle that as expected.

Alternatively, you can add & as a second parameter to preg_quote for it to be escaped, eg preg_quote('AT&T', '&')

You need to ensure that quotes and ticks are properly handled before passing to the database. The best method for this is:

   mysql_real_escape_string  ([php doc][1])

This method is available in both PHP, and C++ mysql client libraries.

This should ensure any 'dangerous_string' is no longer dangerous and can be used within a quoted string used by RegEx.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM