I have a lot of raw content in a database.
Some of it contains the string http://www.example.com/subfolder/name.pdf
or /subfolder/name.pdf
I need a pattern replace that turns both into /wp-content/uploads/old/subfolder/name.pdf
There can be many levels of subfolders, e.g. /subfolder1/subfolder2/subfolder3/file.pdf
The patterns I use for finding them are
/http[^\s]+pdf/
/href="\/[^\s]+pdf/
But how do I replace the matched pattern with another pattern (the example above)?
I have
search for /http:\/\/www.example.com(.*).pdf"/
replace with /wp-content/uploads/old$1.pdf"
search for /href="\/pdf(.*)\.pdf">/
This works fine until there is more than one PDF link in one table cell, for example:
<a href="/pdf/subdir/name.pdf">clickhere</a><a href="/pdf/subdir/name.pdf">2nd PDF</a>
Regarding "this works fine until there are more than 1 pdf links in one table cell": the regex engine is greedy by default, so a quantifier consumes as much as it can while still allowing a match. To reverse this behaviour, you can use a lazy quantifier, as explained in this post: Greedy vs. Reluctant vs. Possessive Quantifiers. Add an extra ? after a quantifier to make it match as little as possible. To make your greedy construct lazy, use [^\s]+? (and likewise (.*?) instead of (.*)).
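As a minimal sketch of the difference, the snippet below runs the question's href pattern against the two-link cell, once greedy and once lazy. The variable names are illustrative only, and "@" is used as the delimiter so the slashes need no escaping.

```php
<?php
// Sample cell with two PDF links, taken from the question.
$cell = '<a href="/pdf/subdir/name.pdf">clickhere</a><a href="/pdf/subdir/name.pdf">2nd PDF</a>';

// Greedy: (.*) runs to the LAST '.pdf">' in the string,
// so both links are swallowed by a single match.
$greedy = '@href="/pdf(.*)\.pdf">@';

// Lazy: (.*?) stops at the FIRST '.pdf">' it can reach,
// so each link is matched on its own.
$lazy = '@href="/pdf(.*?)\.pdf">@';

$greedyCount = preg_match_all($greedy, $cell, $greedyMatches);
$lazyCount   = preg_match_all($lazy, $cell, $lazyMatches);

echo $greedyCount, "\n"; // 1
echo $lazyCount, "\n";   // 2
```

With the lazy version, a replacement is applied once per link rather than once per cell.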
You mention strings containing http://www.example.com/subfolder/name.pdf or /subfolder/name.pdf, and ask "But how to replace the pattern with another pattern?"
As you can see, "http://www.example.com" is optional. You can make a part of your pattern optional with a (?:group) and a ? quantifier.
Pattern with an optional group:
(?:http://www\.example\.com)?/(\S+?)\.pdf
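A small sketch of how the optional group behaves, assuming the two URL shapes from the question (the variable names are made up for illustration). The same pattern matches with or without the domain, and the capture group holds the path in both cases.

```php
<?php
// The non-capturing group (?:...)? may match zero or one time,
// so the domain prefix is allowed but not required.
$pattern = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';

$absolute = 'http://www.example.com/subfolder/name.pdf';
$relative = '/subfolder1/subfolder2/subfolder3/file.pdf';

preg_match($pattern, $absolute, $m1);
preg_match($pattern, $relative, $m2);

echo $m1[1], "\n"; // subfolder/name
echo $m2[1], "\n"; // subfolder1/subfolder2/subfolder3/file
```

Because the group is non-capturing, the path is always in $1, which keeps the replacement string the same for both forms.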
Note the use of \S (capital "S") instead of [^\s] (they are exactly the same). One more thing: you may want to add some boundaries to your pattern. I suggest (?<!\w) (not preceded by a word character) and \b (a word boundary) to avoid matching as part of another word (as I commented on your question).
(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b
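To illustrate what the boundaries buy you, here is a rough comparison of the pattern with and without them. The sample strings are invented for the demonstration and do not come from the question's data.

```php
<?php
$loose  = '@(?:http://www\.example\.com)?/(\S+?)\.pdf@';
$strict = '@(?<!\w)(?:http://www\.example\.com)?/(\S+?)\.pdf\b@';

// Hypothetical inputs that should NOT be rewritten:
$inside = 'docs/readme.pdf';     // the slash is preceded by a word character
$suffix = '/files/report.pdfx';  // ".pdf" is followed by a word character

// A link of the kind the question wants rewritten:
$link = '<a href="/pdf/subdir/name.pdf">clickhere</a>';

$looseInside  = preg_match($loose, $inside);   // 1: matches "/readme.pdf"
$strictInside = preg_match($strict, $inside);  // 0: lookbehind rejects it
$looseSuffix  = preg_match($loose, $suffix);   // 1: matches "/files/report.pdf"
$strictSuffix = preg_match($strict, $suffix);  // 0: \b rejects ".pdfx"
$strictLink   = preg_match($strict, $link);    // 1: still matches real links
```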
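// "@" is the delimiter, so the slashes in the pattern need no escaping; "i" makes it case-insensitive.
$re = "@(?<!\\w)(?:http://www\\.example\\.com)?/(\\S+?)\\.pdf\\b@i";
$str = "some containing string http://www.example.com/subfolder/name.pdf
or /subfolder/name.pdf
<a href=\"/pdf/subdir/name.pdf\">clickhere</a>
<a href=\"/pdf/subdir/name.pdf\">2nd PDF</a>";
// Single quotes keep $1 literal, so preg_replace reads it as the first capture group.
$subst = '/wp-content/uploads/old/$1.pdf';
$result = preg_replace($re, $subst, $str);
echo $result;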
A sandbox example here: http://sandbox.onlinephpfunctions.com/code/cc47b98d16981b786cf2d573751b6a09a9725b90
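$array = [
    "https://test.com/url/subfolder/subfolder/file.pdf",
    "https://test.com/url/subfolder1/subfolder/file.pdf",
    "/url/subfolder3/subfolder3/files.xml",
    "/url/subfolder/subfolder/file.pdf"
];

// Strips an optional scheme and host from each URL and prepends $prepend to the remaining path.
// Note: every entry is rewritten, including non-PDF ones such as the .xml file above.
function setwpUrl($urls, $prepend) {
    for ($i = 0; $i < count($urls); $i++) {
        // Group 1: optional "http(s)://host"; group 2: the rest of the URL (the path).
        preg_match_all("/(https?:\/\/[a-zA-Z0-9\.\-]+)?(.*)/", $urls[$i], $out);
        $urls[$i] = $prepend . $out[2][0];
    }
    return $urls;
}

$newUrls = setwpUrl($array, "/wp-content/uploads/old");
var_dump($newUrls);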