
Encoding a URL correctly while using a rewrite engine

I am using mod_rewrite and a routing system extracted from the Akelos Framework.

I have a serious problem when certain symbols appear in the search keyword parameter.

The routing map is as follows:

$map->connect(":lang/search/:string", array('controller' => 'search','action' => 'index'));

In the controller I then read $this->registry->map['params']['get']['string'] as the search keyword.

I can't find a way to encode the URL properly. For example, take the string t\ /#%&=

urlencode() gives t%5C+%2F%23%25%26%3D and the page displays: The requested URL /site/en/search/t\+/#%&= was not found on this server.

rawurlencode() gives t%5C%20%2F%23%25%26%3D and the page displays the same error.
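
For reference, the two encoders differ only in how they treat the space. A minimal check with the test string from above (the variable names are illustrative):

```php
<?php
// The test keyword: t, backslash, space, then /#%&=
// Single quotes keep the backslash literal.
$keyword = 't\ /#%&=';

// urlencode() follows the form-encoding convention: a space becomes "+"
$formStyle = urlencode($keyword);

// rawurlencode() follows RFC 3986: a space becomes "%20"
$pathStyle = rawurlencode($keyword);

echo $formStyle, PHP_EOL; // t%5C+%2F%23%25%26%3D
echo $pathStyle, PHP_EOL; // t%5C%20%2F%23%25%26%3D
```

Both results still contain %2F and %5C, the encoded forms of the slash and the backslash.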

You can download or view the router class source on this page.

I really do not want to use base64 or similar encodings that make the URL unreadable.

In case you need it, here are the .htaccess contents as well:

<IfModule mod_rewrite.c>
   RewriteEngine On
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]
</IfModule>

Update

Here are working files for testing.

Please download these files and test them on your server if you have time.

Guide:

controllerclass.php - A simple controller framework; it defines the class "Controller" that searchcontroller.php needs

routerclass.php - A router class extracted from the Akelos Framework; the bug is probably here

routes.php - Where you define your routes; in our case there is only /search/:string

searchcontroller.php - A basic application for testing strings; /search/stringhere points to this file

index.php - Where initialization and routing start

.htaccess - I do not think the error is here

I think you won't need to change index.php, controllerclass.php, routes.php or searchcontroller.php.

The bug is probably in routerclass.php, or maybe .htaccess needs a fix, although I doubt it.

It looks like the issue relates to RFC 3986 Section 7.3 (Back-End Transcoding) and how urlencode and urldecode are applied. I've slightly modified the function at http://php.net/manual/en/function.urlencode.php#97969:

function myUrlEncode($string) {
    $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%5C', '%3F', '%25', '%23', '%5B', '%5D');
    $replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+", "$", ",", "/", "\\", "?", "%", "#", "[", "]");
    return htmlspecialchars(str_replace($entities, $replacements, urlencode($string)));
}

Note the addition of %5C => \ and of htmlspecialchars(). The latter is there for security rather than to make special characters usable: the input may be <script>... or <h1>... etc.

So you will be using it like:

print("<b><i>URL Encode Tests</i></b><br /><br />
    <b>Works:</b> ".myUrlEncode($string[0])." <a href=\"".HTTP_ROOT."/search/".myUrlEncode($string[0])."\">/search/".myUrlEncode($string[0])."</a><br />
    <b>Does not work:</b> ".myUrlEncode($string[1])." <a href=\"".HTTP_ROOT."/search/".myUrlEncode($string[1])."\">/search/".myUrlEncode($string[1])."</a><br />
    <b>Does not work:</b> ".myUrlEncode($string[2])." <a href=\"".HTTP_ROOT."/search/".myUrlEncode($string[2])."\">/search/".myUrlEncode($string[2])."</a><br />
");

After doing that, search string #3 ( \ /#%&= ) gives a PHP error like "Method is invalid in ...\index.php on line 30". I guess this comes from the regexes in the router, so you may need to make a few adjustments there.
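
To see what ends up in the link for that string, the function can be run on its own. This is a self-contained check: the function body is copied from above, and $keyword is an illustrative name.

```php
<?php
// Copied from the answer above so the snippet runs standalone.
function myUrlEncode($string) {
    $entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%5C', '%3F', '%25', '%23', '%5B', '%5D');
    $replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+", "$", ",", "/", "\\", "?", "%", "#", "[", "]");
    return htmlspecialchars(str_replace($entities, $replacements, urlencode($string)));
}

// The test keyword: t, backslash, space, then /#%&=
$keyword = 't\ /#%&=';

// urlencode() turns the space into "+", the listed sequences are turned
// back into their literal characters, and htmlspecialchars() escapes "&".
echo myUrlEncode($keyword), PHP_EOL; // t\+/#%&amp;=
```

The result still contains a raw #, which a browser treats as the start of the fragment, so the part after it is not sent to the server as part of the path.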

The error given is:

"The requested URL /site/en/search/"

It contains the extra word 'site', which isn't mentioned in your question and makes it hard to interpret, but the error appears to be coming from Apache, not PHP.

The error says that the URLs aren't being matched by your htaccess rules. So you don't need to look inside any PHP code to figure it out; the error is in Apache somewhere.

Searching further: Apache does not accept this URL. %2F is allowed in a query string, but by default Apache refuses an encoded path separator (%2F, and on some systems %5C) in the path. The server therefore rejects the request with a 404 before it reaches the rewrite rules.

The article at www.jampmark.com lists five solutions, with the benefits and issues of each, but it would be inappropriate to copy that much material here. In short:

  1. Turn on "AllowEncodedSlashes" directive in Apache
  2. Replace %2F with %252F and %5C with %255C after url encoding
  3. Double urlencode()
  4. Use unencoded slashes
  5. Replace slashes with underscores (_)
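
As an illustration of option 2, here is a minimal sketch. The helper name encodeSearchSegment is hypothetical and not part of Akelos or the original code, and it assumes the keyword is used as a single path segment.

```php
<?php
/**
 * Hypothetical helper: percent-encode a keyword for use as one path
 * segment, then escape the two sequences the server may refuse in a path.
 */
function encodeSearchSegment(string $keyword): string
{
    // rawurlencode() emits uppercase hex, so %2F and %5C are the only
    // spellings that need to be looked for here.
    $encoded = rawurlencode($keyword);

    // %2F -> %252F and %5C -> %255C: once the server has decoded the path
    // a single time, the application sees %2F / %5C rather than a raw
    // slash or backslash.
    return str_replace(['%2F', '%5C'], ['%252F', '%255C'], $encoded);
}

echo encodeSearchSegment('t\ /#%&='), PHP_EOL;
// t%255C%20%252F%23%25%26%3D
```

After the server has decoded the path once, the router would still have to turn the remaining %2F and %5C back into / and \. One limitation of this option: a keyword that literally contains the text %2F can no longer be told apart from an escaped slash at that point.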

Also, there is a questionable line in your test code:

$string[2] = "t\ /#%&=";

Backslash-space is not a valid escape sequence. PHP happens to keep the backslash literally in that case, but you should change the code to "t\\ /#%&="; (or use single quotes) so that the backslash cannot be interpreted as an escape character.
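
A quick way to check how PHP reads the different spellings of that string (the variable names are illustrative):

```php
<?php
// Three ways to write: t, backslash, space, then /#%&=
$unrecognizedEscape = "t\ /#%&=";  // "\ " is not an escape sequence; PHP keeps the backslash
$explicitEscape     = "t\\ /#%&="; // "\\" is the explicit escape for one backslash
$singleQuoted       = 't\ /#%&=';  // in single quotes only \\ and \' are special

var_dump($unrecognizedEscape === $explicitEscape); // bool(true)
var_dump($explicitEscape === $singleQuoted);       // bool(true)
var_dump(strlen($explicitEscape));                 // int(8)
```

All three produce the same eight-character string; the explicit forms simply do not depend on how unrecognized escape sequences are handled.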
