URL encode and filter sanitize output problems

I am trying to figure out why a sanitized string produces different output than an unsanitized string when it is URL-encoded.

I don't know what this is called. I've searched Google for URL encoding and sanitization, but I can't find any explanation.

I discovered this by accident after publishing a video. I insert titles into the database, fetch them back out, and build a URL from them.

Sample URL (which does not work because of the problem):

localhost/proviin/video/kojima%26%2339%3Bs+cancelled+masterpiece+-+investigating+silent+hills/16

I made a single-page test to see what was going on; the behavior is shown below.

How I need the outcome to be (but this is not sanitized):

$title = "Kojima's Cancelled Masterpiece - Investigating Silent Hills";
echo $title;
echo "<br>";
echo urlencode($title);

Outputs (which would work in the URL):

  • Kojima's Cancelled Masterpiece - Investigating Silent Hills
  • Kojima%27s+Cancelled+Masterpiece+-+Investigating+Silent+Hills

How it currently is:

$title = sanitize("Kojima's Cancelled Masterpiece - Investigating Silent Hills", "str");
echo $title;
echo "<br>";
echo urlencode($title);

Outputs (which does not work in the URL, but is sanitized):

  • Kojima's Cancelled Masterpiece - Investigating Silent Hills
  • Kojima%26%2339%3Bs+Cancelled+Masterpiece+-+Investigating+Silent+Hills

Sanitize function:

function sanitize($item, $type) {
    switch ($type) {
        case "str":
            return filter_var($item, FILTER_SANITIZE_STRING);
        case "mail":
            return filter_var($item, FILTER_SANITIZE_EMAIL);
        case "url":
            return filter_var($item, FILTER_SANITIZE_URL);
        case "int":
            return filter_var($item, FILTER_SANITIZE_NUMBER_INT);
        case "float":
            return filter_var($item, FILTER_SANITIZE_NUMBER_FLOAT);
        default:
            return false;
    }
}

As far as I know:

You sanitize data before inserting it into the database.

You escape (htmlspecialchars) when you echo.

But why do sanitized strings come out differently when using urlencode()?

If this is the normal behavior, how on earth do I sanitize strings before inserting them into a database table and then use them in a URL with urlencode()?

The main purpose of sanitizing before adding to a database is to avoid SQL injection, and one of the vulnerable symbols is the single quote '. That's why it is replaced by something that looks the same when displayed but has no effect on the database. Judging by the URL in the question, the replacement here is the HTML entity &#39;.

So when you sanitize, you substitute vulnerable symbols, and after URL encoding these substitutes get different codes: ' becomes %27, while &#39; becomes %26%2339%3B. To prevent incompatible URLs, always URL-encode strings after sanitizing, or at least after applying the same steps every time.
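To make the difference concrete, here is a minimal sketch comparing the two encodings. It hard-codes the entity-encoded form seen in the question's URL instead of calling the asker's sanitize() function, and the variable names are only illustrative.

```php
<?php
// Raw text versus the entity-encoded form implied by the question's URL.
$raw = "Kojima's";
$entityEncoded = "Kojima&#39;s"; // assumption: what the "str" sanitizer returned

// urlencode() treats "&", "#" and ";" as ordinary characters to escape.
$rawUrl = urlencode($raw);
$entityUrl = urlencode($entityEncoded);

// Decoding the entity first recovers the original text.
$restored = html_entity_decode($entityEncoded, ENT_QUOTES, 'UTF-8');

echo $rawUrl, "\n";              // Kojima%27s
echo $entityUrl, "\n";           // Kojima%26%2339%3Bs
echo urlencode($restored), "\n"; // Kojima%27s
```

The last line suggests that, for titles already stored in entity-encoded form, decoding before calling urlencode() would yield the working URL.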

Whenever I use input text for a file name or folder, I use this function to clean it up.

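/* urlsafe - Return a URL-safe string (method excerpt; it belongs inside a class) */
public static function urlsafe($t)
{
    // Lowercase first so only a-z and 0-9 need to be allowed below.
    $t = strtolower($t);
    // Replace every other character with a space.
    $t = preg_replace("/[^a-z0-9]/", " ", $t);
    // Drop leading and trailing spaces.
    $t = trim($t);
    // Collapse each run of spaces into a single hyphen.
    $t = preg_replace("/[ ]+/", "-", $t);
    return $t;
}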
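As a rough illustration of what that clean-up does to the title from the question, the sketch below repeats the same four steps in a standalone function. The name slugify_title is hypothetical and exists only so the example runs outside a class.

```php
<?php
// Hypothetical standalone copy of the urlsafe() steps, for demonstration only.
function slugify_title(string $t): string
{
    $t = strtolower($t);                        // lowercase everything
    $t = preg_replace('/[^a-z0-9]/', ' ', $t);  // anything else becomes a space
    $t = trim($t);                              // drop leading/trailing spaces
    return preg_replace('/[ ]+/', '-', $t);     // runs of spaces become one hyphen
}

$slug = slugify_title("Kojima's Cancelled Masterpiece - Investigating Silent Hills");
echo $slug, "\n"; // kojima-s-cancelled-masterpiece-investigating-silent-hills
```

Note that the apostrophe turns into a separator, so "Kojima's" becomes "kojima-s". The result needs no urlencode() call because it contains only letters, digits and hyphens.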

You are double-escaping your strings. You should not pass the return value of your sanitize function to urlencode(). Both escape the data, but in different ways, so they cannot be chained the way you're doing here (and no escape function should be run twice anyway).

So no, you don't need to sanitize your data like this before you insert it into the database. Instead, insert it using prepared statements so that it comes back from the database exactly as it went in, ready for urlencode() or htmlentities() to work their magic. The exception is data that must be stored in a specific form, in which case a preg_replace is probably better.
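A minimal sketch of that flow follows, assuming PDO with an in-memory SQLite database (which requires the pdo_sqlite extension). The videos table and its columns are made up for the example and are not taken from the question.

```php
<?php
// Assumption: in-memory SQLite via PDO; the asker's real database may differ.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');

$title = "Kojima's Cancelled Masterpiece - Investigating Silent Hills";

// Store the raw title; the bound parameter keeps the quote out of the SQL text.
$insert = $pdo->prepare('INSERT INTO videos (title) VALUES (:title)');
$insert->execute([':title' => $title]);
$id = (int) $pdo->lastInsertId();

// Read it back unchanged.
$select = $pdo->prepare('SELECT title FROM videos WHERE id = :id');
$select->execute([':id' => $id]);
$stored = $select->fetchColumn();

// Escape once, at output time, for each context separately.
$urlPart = urlencode($stored);
$htmlPart = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $urlPart, "\n";  // Kojima%27s+Cancelled+Masterpiece+-+Investigating+Silent+Hills
echo $htmlPart, "\n"; // Kojima&#039;s Cancelled Masterpiece - Investigating Silent Hills
```

The stored value stays identical to the original title, so each output context gets exactly one round of escaping.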

Also, be aware that user input should not be passed to unserialize() either, for the exact same reason: http://php.net/manual/en/function.unserialize.php
