Escape elasticsearch special characters in PHP
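I want to create a PHP function that escapes Elasticsearch special characters by adding a \ in front of each one. The special characters Elasticsearch uses are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
I'm not very familiar with regular expressions. I found a piece of code that simply removes the special characters, but I would rather escape them, because they may be relevant. The code I use: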
$s_input = 'The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ / Did it work?';
$search_query = preg_replace('/(\+|\-|\=|\&|\||\!|\(|\)|\{|\}|\[|\]|\^|\"|\~|\*|\<|\>|\?|\:|\\\\)/', '', $s_input);
This outputs:
The next chars should be escaped / Did it work
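So there are two problems: this code removes the special characters, whereas I want to escape them with \. Furthermore, this code does not escape the \ itself. Does anyone know how to escape the Elasticsearch special characters?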
You can use preg_replace with a backreference, as stribizhev has already pointed out (the simplest way):
$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?";
function escapeElasticReservedChars($string) {
$regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
return preg_replace($regex, addslashes('\\$0'), $string);
}
echo escapeElasticReservedChars($string);
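A hand-escaped character class like the one above is easy to get wrong, so here is a minimal sketch of the same idea that lets preg_quote() build the class. The function name escapeReservedWithPregQuote is hypothetical, and the character list is simply the one from the question. It also writes the replacement as a plain string literal instead of going through addslashes().

```php
<?php
// Sketch only: escapeReservedWithPregQuote is a hypothetical name, not from the answers.
function escapeReservedWithPregQuote(string $input): string
{
    // Reserved single characters from the question's list ('\\' is one backslash).
    $reserved = '+-=&|><!(){}[]^"~*?:\\/';

    // preg_quote() escapes regex metacharacters; the second argument
    // makes it escape the "/" delimiter as well.
    $pattern = '/[' . preg_quote($reserved, '/') . ']/';

    // In a single-quoted PHP string, '\\\\$0' is the replacement text \\$0:
    // a literal backslash followed by the matched character.
    return preg_replace($pattern, '\\\\$0', $input);
}

echo escapeReservedWithPregQuote('a+b'), PHP_EOL;
```

Like the answer above, this escapes < and > with a backslash rather than removing them.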
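Or use the preg_replace_callback function to achieve the same thing. With a callback you get access to the current match and can modify it.
The callback is called with an array of the matched elements in the subject string and should return the replacement string. This is the callback signature:
handler(array $matches): string
Here it is in action: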
<?php
$string = "The next chars should be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / Did it work?";
function escapeElasticSearchReservedChars($string) {
$regex = "/[\\+\\-\\=\\&\\|\\!\\(\\)\\{\\}\\[\\]\\^\\\"\\~\\*\\<\\>\\?\\:\\\\\\/]/";
$string = preg_replace_callback ($regex,
function ($matches) {
return "\\" . $matches[0];
}, $string);
return $string;
}
echo escapeElasticSearchReservedChars($string);
Output:
The next chars should be escaped\: \+ \- \= \&\& \|\| \> \< \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ Did it work\?
If anyone is looking for a slightly verbose (but readable!) solution:
public function escapeElasticsearchValue($searchValue)
{
$searchValue = str_replace('\\', '\\\\', $searchValue);
$searchValue = str_replace('*', '\\*', $searchValue);
$searchValue = str_replace('?', '\\?', $searchValue);
$searchValue = str_replace('+', '\\+', $searchValue);
$searchValue = str_replace('-', '\\-', $searchValue);
$searchValue = str_replace('&&', '\\&&', $searchValue);
$searchValue = str_replace('||', '\\||', $searchValue);
$searchValue = str_replace('!', '\\!', $searchValue);
$searchValue = str_replace('(', '\\(', $searchValue);
$searchValue = str_replace(')', '\\)', $searchValue);
$searchValue = str_replace('{', '\\{', $searchValue);
$searchValue = str_replace('}', '\\}', $searchValue);
$searchValue = str_replace('[', '\\[', $searchValue);
$searchValue = str_replace(']', '\\]', $searchValue);
$searchValue = str_replace('^', '\\^', $searchValue);
$searchValue = str_replace('~', '\\~', $searchValue);
$searchValue = str_replace(':', '\\:', $searchValue);
$searchValue = str_replace('"', '\\"', $searchValue);
$searchValue = str_replace('=', '\\=', $searchValue);
$searchValue = str_replace('/', '\\/', $searchValue);
// < and > can’t be escaped at all. The only way to prevent them from
// attempting to create a range query is to remove them from the query
// string entirely
$searchValue = str_replace('<', '', $searchValue);
$searchValue = str_replace('>', '', $searchValue);
return $searchValue;
}
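The chain of str_replace() calls above depends on the backslash being handled first, since every later call rescans the whole string. As an illustration only, the same mapping can be applied in one pass with strtr(), which does not rescan text it has already replaced and tries the longest keys first. The function name below is hypothetical.

```php
<?php
// Sketch only: escapeElasticsearchValueSinglePass is a hypothetical name.
function escapeElasticsearchValueSinglePass(string $searchValue): string
{
    $map = [];

    // Single reserved characters that receive a backslash prefix
    // ('\\' at the start of the literal is one backslash).
    foreach (str_split('\\*?+-!(){}[]^~:"=/') as $char) {
        $map[$char] = '\\' . $char;
    }

    // Two-character operators are escaped as a unit, as in the answer above.
    $map['&&'] = '\\&&';
    $map['||'] = '\\||';

    // < and > are removed, as in the answer above.
    $map['<'] = '';
    $map['>'] = '';

    // strtr() with an array replaces in a single pass, so the order of
    // the entries does not matter and nothing is escaped twice.
    return strtr($searchValue, $map);
}

echo escapeElasticsearchValueSinglePass('(a && b)'), PHP_EOL;
```

Because < and > are removed in the same pass, an input such as |>| still ends up as an unescaped ||, just as it does with the str_replace() chain.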
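None of the answers given so far seem to actually follow the documentation, so here is another answer that correctly encodes any untrusted input: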
/**
* @param string $s untrusted user input
* @return string safe string to be used in `query_string` argument to elasticsearch
*/
function escapeForElasticSearch($s)
{
static $keys = array();
static $values = array();
if (!$keys)
{
# https://www.elastic.co/guide/en/elasticsearch/reference/5.5/query-dsl-query-string-query.html#_reserved_characters
$replacements = array(
"\\" => "\\\\", # must be done first to not double encode later backslashes!
"+" => "\\+",
"-" => "\\-",
"=" => "\\=",
"&" => "\\&",
"|" => "\\|",
">" => "", # cannot be safely encoded
"<" => "", # cannot be safely encoded
"!" => "\\!",
"(" => "\\(",
")" => "\\)",
"{" => "\\{",
"}" => "\\}",
"[" => "\\[",
"]" => "\\]",
"^" => "\\^",
"\"" => "\\\"",
"~" => "\\~",
"*" => "\\*",
"?" => "\\?",
":" => "\\:",
"/" => "\\/",
);
$keys = array_keys($replacements);
$values = array_values($replacements);
}
return str_replace($keys, $values, $s);
}
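Note that & and | are not special on their own, but correctly handling an odd number of these characters is harder than simply encoding every instance of them.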
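Once the input is escaped for query_string, it still has to be embedded in the JSON request body, which adds its own layer of escaping. A minimal sketch, assuming the request body is built by hand and PHP 7.3+ for JSON_THROW_ON_ERROR; the helper name buildQueryStringBody is hypothetical, and no particular Elasticsearch client is assumed.

```php
<?php
// Sketch only: buildQueryStringBody is a hypothetical helper name.
// It expects a string that has ALREADY been escaped for query_string.
function buildQueryStringBody(string $escapedQuery): string
{
    $body = [
        'query' => [
            'query_string' => [
                'query' => $escapedQuery,
            ],
        ],
    ];

    // json_encode() applies the JSON-level escaping, so the backslashes
    // added for Elasticsearch must not be doubled by hand.
    return json_encode($body, JSON_THROW_ON_ERROR);
}
```

Decoding the result with json_decode() gives back the escaped string unchanged, which is a quick way to check that no layer escaped it twice.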
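Full disclosure: I have never used Elasticsearch, and my suggestion does not come from personal experience, nor has it been tested against Elasticsearch. I wrote it based on what I know about regular expressions and string manipulation. If anyone finds a flaw, I'd be glad to hear about it in a comment.
My snippet first removes all occurrences of < and >, then escapes the remaining reserved characters.
The code: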
$string = "To be escaped: + - = && || > < ! ( ) { } [ ] ^ \" ~ * ? : \ / triple ||| and split '&<&'";
echo escapeElasticSearchReservedChars($string);
function escapeElasticSearchReservedChars(string $string): string
{
return preg_replace(
[
'_[<>]+_',
'_[-+=!(){}[\]^"~*?:\\/\\\\]|&(?=&)|\|(?=\|)_',
],
[
'',
'\\\\$0',
],
$string
);
}
Output:
To be escaped\: \+ \- \= \&& \|| \! \( \) \{ \} \[ \] \^ \" \~ \* \? \: \\ \/ triple \|\|| and split '\&&'
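The reason for removing < and > first is so that nobody can try to defeat the replacement design by passing in |>|, which would otherwise prevent the two consecutive pipes from being escaped properly (once the > has been removed).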
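To make the ordering argument concrete, here is a small sketch that runs the two steps in both orders on the input |>|. The helper names escapePairs and stripAngles are hypothetical; each wraps one of the two patterns, reduced to the parts relevant here.

```php
<?php
// Sketch only: both helper names are hypothetical.

// Escapes the first character of "&&" and "||" via lookahead.
function escapePairs(string $s): string
{
    return preg_replace('_&(?=&)|\|(?=\|)_', '\\\\$0', $s);
}

// Removes every < and >.
function stripAngles(string $s): string
{
    return preg_replace('_[<>]+_', '', $s);
}

$input = '|>|';

// Strip first: the pipes become adjacent before the escaping step runs.
$stripFirst = escapePairs(stripAngles($input));

// Escape first: the pipes are not adjacent yet, so nothing is escaped,
// and stripping afterwards leaves a bare "||" operator.
$escapeFirst = stripAngles(escapePairs($input));

var_dump($stripFirst);   // string(3) "\||"
var_dump($escapeFirst);  // string(2) "||"
```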
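The simple way is to match with a single character class.
The only question is which delimiter to use (for readability).
Using @ as the regex delimiter, it is:
Find: '@[-+=&|><!(){}[\\]^"~*?:\\\\\\/]@'
Replace: '\\\\$0'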
But what if some of the characters have already been escaped?
What then?
One solution is to match only the ones that are not escaped.
Find: '@(?<!\\\\)(?:\\\\\\\\)*\K(?:[-+=&|><!(){}[\]^"~*?:/]|\\\\(?!\\\\))@'
Replace: '\\\\$0'
Formatted:
(?<! \\ ) # Not an escape behind
(?: \\ \\ )* # Possible even number of escapeds
\K # Don't include the previous escapes in match
(?:
[-+=&|><!(){}[\]^"~*?:/] # Either 1 of these special characters
| # or,
\\ # An escape character that is
(?! \\ ) # not followed by escape itself.
)