PHP (preg_replace) regex strip image sizes from filename

I'm working on an open-source plugin for WordPress and have run into an odd issue.

Consider the following filenames:

/wp-content/uploads/buddha_-800x600-2-800x600.jpg
/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg
/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg
/wp-content/uploads/UI-paths-800x800-1.jpg

The current regex I have:

(-[0-9]{1,4}x[0-9]{1,4}){1}

This removes both matches from the filename: for example, buddha_-800x600-2-800x600.jpg becomes buddha_-2.jpg, which is invalid.
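
The behaviour described above can be reproduced with a short sketch. It assumes the pattern is passed to preg_replace with an empty replacement and the default limit (every match is replaced); the ~ delimiters and the variable names are only choices made for this example.

```php
<?php
// Pattern from the question, wrapped in ~ delimiters (an assumption of this sketch).
$pattern = '~(-[0-9]{1,4}x[0-9]{1,4}){1}~';
$file = '/wp-content/uploads/buddha_-800x600-2-800x600.jpg';

// preg_replace replaces every match by default, so both size parts disappear.
$result = preg_replace($pattern, '', $file);
echo $result . PHP_EOL; // /wp-content/uploads/buddha_-2.jpg

// Counting the matches confirms that the pattern is found twice.
$count = preg_match_all($pattern, $file);
echo $count . PHP_EOL; // 2
```

The trailing {1} quantifier has no effect here, since a group matches exactly once by default.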

I have tried a variety of regexes:

.*(-\d{1,4}x\d{1,4}) // will strip out everything
(-\d{1,4}x\d{1,4}){1}|.*(-\d{1,4}x\d{1,4}){1} // same as above
(-\d{1,4}x\d{1,4}){1}|(-\d{1,4}x\d{1,4}){1} // will strip out all size matches

Unfortunately, my knowledge of regex is quite limited. Can someone advise how to achieve the goal, please?

The goal is to remove only what is relevant, which would result in:

/wp-content/uploads/buddha_-800x600-2.jpg
/wp-content/uploads/cutlery-tray-800x600-2.jpeg
/wp-content/uploads/custommade-wallet-800x600-2.jpeg
/wp-content/uploads/UI-paths-1.jpg

Much appreciated!

You can use a capture group with a backreference to match strings in which the same size part occurs twice, and replace the match with a single occurrence.

Otherwise, match only the dimensions so that they are removed.

((-\d+x\d+)-\d+)\2|-\d+x\d+
  • ( Capture group 1
    • (-\d+x\d+) Capture group 2, match - 1+ digits x and 1+ digits
    • -\d+ Match - and 1+ digits
  • )\2 Close group 1, followed by a backreference to what is captured in group 2
  • | Or
  • -\d+x\d+ Match the dimensions format (group 1 is empty in this branch, so replacing with $1 removes the match)


For example

$pattern = '~((-\d+x\d+)-\d+)\2|-\d+x\d+~';
$strings = [
    "/wp-content/uploads/buddha_-800x600-2-800x600.jpg",
    "/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg",
    "/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg",
    "/wp-content/uploads/UI-paths-800x800-1.jpg",
];

foreach ($strings as $s) {
    echo  preg_replace($pattern, '$1', $s) . PHP_EOL;
}

Output

/wp-content/uploads/buddha_-800x600-2.jpg
/wp-content/uploads/cutlery-tray-800x600-2.jpeg
/wp-content/uploads/custommade-wallet-800x600-2.jpeg
/wp-content/uploads/UI-paths-1.jpg

I would try something like this. You can test it yourself. Here is the code:

$a = [
     '/wp-content/uploads/buddha_-800x600-2-800x600.jpg',
     '/wp-content/uploads/cutlery-tray-800x600-2-800x600.jpeg',
     '/wp-content/uploads/custommade-wallet-800x600-2-800x600.jpeg',
     '/wp-content/uploads/UI-paths-800x800-1.jpg'
];
            
foreach($a as $img) 
    echo preg_replace('#-\d+x\d+((-\d+|)\.[a-z]{3,4})#i', '$1', $img).'<br>';

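It matches a trailing -(number)x(number), optionally followed by -(number), directly before the (dot)(extension), and keeps only the part captured after the dimensions.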
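
To check this pattern against the results the question asks for, a small sketch can compare each output with the expected filename. The function name strip_trailing_size and the $expected map are hypothetical, introduced only for this example.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function strip_trailing_size(string $path): string
{
    // Group 1 keeps the optional "-<number>" suffix and the extension.
    // preg_replace returns null on error, so fall back to the input in that case.
    return preg_replace('#-\d+x\d+((-\d+|)\.[a-z]{3,4})#i', '$1', $path) ?? $path;
}

// Inputs from the question mapped to the results it asks for.
$expected = [
    '/wp-content/uploads/buddha_-800x600-2-800x600.jpg' => '/wp-content/uploads/buddha_-800x600-2.jpg',
    '/wp-content/uploads/UI-paths-800x800-1.jpg'        => '/wp-content/uploads/UI-paths-1.jpg',
];

foreach ($expected as $input => $want) {
    $got = strip_trailing_size($input);
    echo $got . ($got === $want ? ' (ok)' : ' (unexpected)') . PHP_EOL;
}
```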

This is a clear case of "match the rejection, revert the match". So you just have to think about the pattern you want to remove:

[0-9]+x[0-9]+

which can be written more compactly as:

\d+x\d+

The next step is to build the capture groups around it:

^(.*[^0-9])[0-9]+x[0-9]+([^x]*\.[a-z]+)$

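We added the file extension as a suffix to anchor the match. Excluding the "x" character is a (bad…) trick to make sure that only the last size is matched. It won't work when a suffix containing an "x" sits between the size and the extension (toto-800x1024-ex.jpg for instance). Note also that group 1 ends with the non-digit in front of the size, so that hyphen is kept: buddha_-800x600-2-800x600.jpg becomes buddha_-800x600-2-.jpg.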

And then, the replacement string:

$1$2
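
As a sketch of how this pattern and replacement behave in PHP (the ~ delimiters, the i flag and the variable names are assumptions of this example, not part of the answer), the snippet below applies them to a bare filename. It also tries a variant that matches the hyphen in front of the size outside group 1; that variant is a swapped-in adjustment, not the pattern given above.

```php
<?php
$file = 'buddha_-800x600-2-800x600.jpg';

// Pattern from the answer: group 1 ends with the non-digit before the size,
// so the hyphen in front of the size stays in the result.
$asGiven = preg_replace('~^(.*[^0-9])[0-9]+x[0-9]+([^x]*\.[a-z]+)$~i', '$1$2', $file);
echo $asGiven . PHP_EOL; // buddha_-800x600-2-.jpg

// Variant (assumption): match the hyphen outside group 1 so it is removed too.
$variant = preg_replace('~^(.*)-[0-9]+x[0-9]+([^x]*\.[a-z]+)$~i', '$1$2', $file);
echo $variant . PHP_EOL; // buddha_-800x600-2.jpg
```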

For clarity, the pattern above assumes the filename has already been extracted from the path. If you want to process the whole path, the pattern becomes:

^/(.*[^0-9])[0-9]+x[0-9]+([^/x]*\.[a-z]+)$

If you want to split the folder name from the filename (the second line is the same pattern written with \d and \D):

^/(.*/)([^/]+[^0-9])[0-9]+x[0-9]+([^/x]*)(\.[a-z]+)$
^/(.*/)([^/]+\D)\d+x\d+([^/x]*)(\.[a-z]+)$
$folder=$1;
$filename="$2$3$4";
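
The split can be sketched in PHP with preg_match, whose third argument receives the captured groups. The ~ delimiters, the i flag, the variable names and the rtrim() call that drops the hyphen preceding the size are assumptions of this example rather than part of the answer.

```php
<?php
$path = '/wp-content/uploads/UI-paths-800x800-1.jpg';
// Pattern from the answer, with ~ delimiters and the i flag added for this sketch.
$pattern = '~^/(.*/)([^/]+\D)\d+x\d+([^/x]*)(\.[a-z]+)$~i';

$folder = null;
$filename = null;
if (preg_match($pattern, $path, $m)) {
    // Group 1 is the folder; the leading slash sits outside the group.
    $folder = '/' . $m[1];
    // Groups 2 to 4 are the name before the size, the suffix after it and the extension.
    // rtrim() drops the hyphen that preceded the size (an assumption of this sketch).
    $filename = rtrim($m[2], '-') . $m[3] . $m[4];
    echo $folder . PHP_EOL;   // /wp-content/uploads/
    echo $filename . PHP_EOL; // UI-paths-1.jpg
}
```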

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 