
PHP prevent double clean url (improvements?)

For a client at work we have built a website. The website has an offering page that can contain variants of the same type/build, so they ran into problems with duplicate ("double") clean URLs.

Just now I wrote a function to prevent that from happening by appending a number to the URL. If that clean URL also exists, it counts up.





Update: the function now returns $cleanurl both after the recursive call and at the end.

The function I wrote works fine, but I was wondering if I have taken the right approach and if it maybe could be improved. Here's the code:

public function prevent_double_cleanurl($cleanurl)
{
    // makes sure it doesn't check against itself
    $and = "";
    if ($this->ID != NULL) $and = " AND product_ID <> " . $this->ID;

    $sql = "SELECT product_ID, titel_url FROM " . $this->_table . " WHERE titel_url='" . $cleanurl . "' " . $and . " LIMIT 1";

    $result = $this->query($sql);

    // if a matching url is found
    if (!empty($result))
    {
        $url_parts = explode("-", $result[0]['titel_url']);
        $last_part = end($url_parts);

        // maximum of 2 digits
        if ((int)$last_part && strlen($last_part) < 3)
        {
            // if a 1 or 2 digit number is found - add to it
            array_pop($url_parts);
            $cleanurl = implode("-", $url_parts);
            $last_part = (int)$last_part + 1;
        }
        else
        {
            // add a suffix starting at 1
            $last_part = 1;
        }

        // recursive check
        $cleanurl = $this->prevent_double_cleanurl($cleanurl . '-' . $last_part);
    }

    return $cleanurl;
}

Depending on how likely it is that a clean URL is used multiple times, your approach may not be the best one. If "foo" through "foo-10" already exist, you'd be querying the database once per existing variant, so ten or more times.

You also don't seem to sanitize the data you put into your SQL queries. Are you using mysql_real_escape_string (or its mysqli or PDO counterpart)?
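As a minimal sketch of the PDO route, assuming PDO with the SQLite driver is available: the in-memory database, the `products` table name and the `url_exists()` helper are made up for illustration; only the column names `product_ID` and `titel_url` come from the question. The point is that the clean URL is bound as a parameter instead of being concatenated into the query.

```php
<?php
// Hypothetical helper: checks whether a clean URL is already taken,
// optionally ignoring one product ID. Values are bound, never concatenated.
function url_exists(PDO $pdo, string $cleanurl, ?int $ignoreId = null): bool
{
    $sql = 'SELECT COUNT(*) FROM products WHERE titel_url = :url';
    $params = [':url' => $cleanurl];
    if ($ignoreId !== null) {
        $sql .= ' AND product_ID <> :id';
        $params[':id'] = $ignoreId;
    }
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return (int) $stmt->fetchColumn() > 0;
}

// Assumed demo setup: an in-memory SQLite table standing in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (product_ID INTEGER PRIMARY KEY, titel_url TEXT)');
$pdo->exec("INSERT INTO products (product_ID, titel_url) VALUES (1, 'foo'), (2, 'foo-1')");

var_dump(url_exists($pdo, 'foo'));    // taken by product 1
var_dump(url_exists($pdo, 'foo', 1)); // ignoring product 1 itself
```

Note that a table name cannot be bound as a parameter, so something like $this->_table would still have to come from trusted code rather than user input.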

Revised code:

public function prevent_double_cleanurl($cleanurl) {
    $cleanurl_pattern = '#^(?<base>.*?)(-(?<num>\d+))?$#S';

    // split the given url into its base and an optional numeric suffix
    if (preg_match($cleanurl_pattern, $cleanurl, $matches)) {
        $base = $matches['base'];
        $num = !empty($matches['num']) ? (int) $matches['num'] : 0;
    } else {
        $base = $cleanurl;
        $num = 0;
    }

    // makes sure it doesnt check against itself
    // (note: $and is not appended to the query below; the loop handles the current row instead)
    if ($this->ID != null) {
        $and = " AND product_ID <> " . $this->ID;
    }

    // no LIMIT here: every suffixed variant is needed to find the highest number
    $sql = "SELECT product_ID, titel_url FROM " . $this->_table . " WHERE titel_url LIKE '" . $base . "-%'";
    $result = $this->query($sql);

    foreach ($result as $row) {
        if ($this->ID && $row['product_ID'] == $this->ID) {
            // the given cleanurl already has an ID,
            // so we better not touch it
            return $cleanurl;
        }

        if (preg_match($cleanurl_pattern, $row['titel_url'], $matches)) {
            $_base = $matches['base'];
            $_num = !empty($matches['num']) ? (int) $matches['num'] : 0;
        } else {
            $_base = $row['titel_url'];
            $_num = 0;
        }

        if ($base != $_base) {
            // make sure we're not accidentally comparing "foo-123" and "foo-bar-123"
            continue;
        }

        if ($_num > $num) {
            $num = $_num;
        }
    }

    // next free number
    return $base . '-' . ($num + 1);
}

I don't know about the possible values for your clean URLs. Last time I did something like this, my base could look like some-article-revision-5, with that 5 being part of the actual base, not the duplication index. To distinguish the two (and to let the LIKE filter out false positives) I made the clean URLs look like $base--$num. The double dash could only occur between the base and the duplication index, which made things a bit simpler.
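A minimal sketch of that double-dash convention, assuming the base itself never contains "--". Both helper names are hypothetical, and the `$existing` array stands in for whatever the database query returns.

```php
<?php
// Hypothetical helper: splits "base--N" into [base, N]; N is 0 when there is
// no duplication index. Single dashes and digits inside the base are left alone.
function split_clean_url(string $url): array
{
    if (preg_match('/^(.*)--(\d+)$/', $url, $m)) {
        return [$m[1], (int) $m[2]];
    }
    return [$url, 0];
}

// Hypothetical helper: returns the next free clean URL given the ones in use.
function next_double_dash_url(string $base, array $existing): string
{
    $taken = false;
    $max = 0;
    foreach ($existing as $url) {
        [$b, $n] = split_clean_url($url);
        if ($b === $base) {
            $taken = true;
            $max = max($max, $n);
        }
    }
    return $taken ? $base . '--' . ($max + 1) : $base;
}
```

With this scheme a trailing "-5" is never mistaken for a counter, because only the part after "--" is treated as the duplication index.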

I have no way to test this, so that's on you, but here's how I'd do it. I put a ton of comments in there explaining my reasoning and the flow of the code.

Basically, the recursion is unnecessary and will result in more database queries than you need.

public function prevent_double_cleanurl($cleanurl)
{
    $sql = sprintf("SELECT product_ID, titel_url FROM %s WHERE titel_url LIKE '%s%%'",
        $this->_table, $cleanurl);
    if ($this->ID != NULL) { $sql .= sprintf(" AND product_ID <> %d", $this->ID); }

    $results = $this->query($sql);

    $suffix = 0;
    $baseurl = true;
    foreach ($results as $row)
    {
        // Consider the case when we get to the "first" row added to the db:
        //  For example: $row['titel_url'] == $cleanurl == 'domain.nl/product/machine'
        if ($row['titel_url'] == $cleanurl)
        {
            $baseurl = false;   // The $cleanurl is already in the db, "this" is not a base URL
            continue;           // Continue with the next iteration of the foreach loop
        }

        // This could be done using regex, but if this works it's fine.
        // Make sure to test for the case when you have both of the following pages in your db:
        //  some-hyphenated-page
        //  some-hyphenated-page-name
        // You don't want the counters to get mixed up
        $url_parts = explode("-", $row['titel_url']);
        $last_part = array_pop($url_parts);
        $cleanrow = implode("-", $url_parts);

        // To get into this block, three things need to be true
        //  1. $last_part must be a numeric string (PHP Duck Typing bleh)
        //  2. When represented as a string, $last_part must not be longer than 2 digits
        //  3. The string passed to this function must match the string resulting from the (n-1)
        //      leading parts of the result of exploding the table row
        if ((is_numeric($last_part)) && (strlen($last_part) <= 2) && ($cleanrow == $cleanurl))
        {
            $baseurl = false;                           // If there are records in the database, the
                                                        //  passed $cleanurl isn't the first, so it
                                                        //  will need a suffix
            $suffix = max($suffix, (int)$last_part);    // After this foreach loop is done, $suffix
                                                        //  will contain the highest suffix in the
                                                        //  database; we'll need to add 1 to this to
                                                        //  get the result url
        }
    }

    // If $baseurl is still true, no row matched $cleanurl either exactly or with a numeric
    //  suffix, so we never found a matching record in the database -> return the cleanurl
    //  that was passed here, no need to add a suffix
    if ($baseurl)
    {
        return $cleanurl;
    }
    // At least one database record exists, so we need to add a suffix.  The suffix we add will be
    //  the highest we found in the database plus 1.
    else
    {
        return sprintf("%s-%d", $cleanurl, ($suffix + 1));
    }
}

My solution takes advantage of SQL wildcards ( % ) to reduce the number of queries from n down to 1.

Make sure the problematic case I described in the code comments (having both some-hyphenated-page and some-hyphenated-page-name in the database) works as expected. Hyphens in the machine name (or whatever it is) could do unexpected things.

I also used sprintf to format the query. Make sure you sanitize any value that is inserted into the query as a string (e.g. $cleanurl).
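One way to check the hyphenated case without a database is to pull the suffix logic into a pure function. This is only a sketch: `next_clean_url()` is a hypothetical name, `$existing` stands in for the titel_url values the LIKE query would return, and it uses ctype_digit() where the method above uses is_numeric().

```php
<?php
// Hypothetical pure version of the suffix logic, for testing only.
function next_clean_url(string $cleanurl, array $existing): string
{
    $taken = false;
    $suffix = 0;
    foreach ($existing as $url) {
        if ($url === $cleanurl) {
            $taken = true; // the bare URL is already in use
            continue;
        }
        $parts = explode('-', $url);
        $last = array_pop($parts);
        // Only count rows whose leading parts equal $cleanurl exactly and
        // whose last part is a 1-2 digit number.
        if (implode('-', $parts) === $cleanurl && ctype_digit($last) && strlen($last) <= 2) {
            $taken = true;
            $suffix = max($suffix, (int) $last);
        }
    }
    return $taken ? sprintf('%s-%d', $cleanurl, $suffix + 1) : $cleanurl;
}

$rows = [
    'some-hyphenated-page',
    'some-hyphenated-page-1',
    'some-hyphenated-page-name',
    'some-hyphenated-page-name-4',
];
echo next_clean_url('some-hyphenated-page', $rows), "\n";
echo next_clean_url('some-hyphenated-page-name', $rows), "\n";
```

The two counters stay separate here because a row only counts when everything before its last hyphen equals the requested URL.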

As @rodneyrehm points out, PHP is very flexible with what it considers a numeric string. You might consider switching out is_numeric() for ctype_digit() and see how that works.
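To illustrate the difference, here is a small sketch. The `is_counter_suffix()` predicate is a hypothetical name for the "1 or 2 digit number" test, and the sample strings are chosen only to show where is_numeric() is more permissive than a digits-only check.

```php
<?php
// Hypothetical predicate for "last part is a 1-2 digit counter".
function is_counter_suffix(string $part): bool
{
    return ctype_digit($part) && strlen($part) <= 2;
}

// Print both checks side by side for a few candidate last parts.
foreach (['7', '42', '.5', '+9', '123', 'name'] as $part) {
    printf(
        "%-6s is_numeric=%-5s counter=%s\n",
        var_export($part, true),
        var_export(is_numeric($part), true),
        var_export(is_counter_suffix($part), true)
    );
}
```

Strings such as ".5" or "+9" are numeric as far as is_numeric() is concerned and are only two characters long, so they would pass the original length check, whereas ctype_digit() accepts digits only.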
