My web server is appending unwanted extra characters to the ends of URLs, and I would like to remove them. My current RewriteRule is:
RewriteRule ([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/(.*) https://example.com/$4 [R=301,L]
This takes http://example.com/2021/06/19/page-name and redirects it to https://example.com/page-name.
The problem with this rule is the wildcard: anything after page-name is also captured, as in https://example.com/page-name/%s. How can I modify the rule so that $4 drops anything after page-name?
You may use this rule:
RewriteRule ^\d{4}(?:/\d{2}){2}/([\w-]+).* /$1 [R=301,L]
There is no need to capture anything you don't use, so I removed the capture groups for the date fields. Also, [\w-]+ matches one or more word characters or hyphens, which covers page-name. That is the only thing you need to capture, for use in the redirect target on the right-hand side.
If you want to include more characters in page-name then consider:
RewriteRule ^\d{4}(?:/\d{2}){2}/([^/]+)(?:/.*)?$ /$1 [R=301,L]
Here [^/]+ matches one or more of any character that is not /.
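To see how the two character classes differ, here is a small Python sketch. It assumes Python's re module (with the ASCII flag) handles these constructs the same way as Apache's regex engine, and that the rule lives in .htaccess, where the path is matched without its leading slash. The helper name redirect_target and the sample paths are made up for illustration.

```python
import re

# The two capture styles, written with balanced parentheses.
# re.ASCII keeps \w limited to [A-Za-z0-9_] (an assumption about the server's regex setup).
WORD_SLUG = re.compile(r"^\d{4}(?:/\d{2}){2}/([\w-]+)", re.ASCII)
ANY_SLUG = re.compile(r"^\d{4}(?:/\d{2}){2}/([^/]+)", re.ASCII)


def redirect_target(pattern, path):
    """Return the '/<page-name>' target, or None when the rule would not fire."""
    m = pattern.match(path)
    return "/" + m.group(1) if m else None


# Hypothetical request paths, as .htaccess would see them (no leading slash).
samples = (
    "2021/06/19/page-name",
    "2021/06/19/page-name/%s",
    "2021/06/19/page.name/extra",
)

for path in samples:
    print(path, "->", redirect_target(WORD_SLUG, path), "|", redirect_target(ANY_SLUG, path))
```

In this sketch both patterns drop the trailing /%s. They only differ when page-name contains something other than word characters and hyphens: for page.name, [\w-]+ stops at the dot, while [^/]+ keeps the whole segment.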
With your shown samples, please try the following .htaccess rule. Make sure to clear your browser cache before testing your URLs.
RewriteRule ^\d{4}/\d{2}/\d{2}/([^/]*).*/?$ https://example.com/$1 [R=301,L]
Explanation: there is no need to create four capturing groups here. Just match four digits/two digits/two digits from the start of the URL path, then capture everything up to the next / in the first capturing group, which is then used in the redirect target.
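As a quick sanity check of that pattern, the following Python sketch applies the same regex to a few sample paths. It assumes Python's re module behaves like Apache's regex engine for these constructs and that the path is matched without a leading slash, as in .htaccess. The function name rewrite and the sample paths are illustrative only.

```python
import re

# Same pattern as in the rule: date prefix, then capture up to the next "/".
RULE = re.compile(r"^\d{4}/\d{2}/\d{2}/([^/]*).*/?$")


def rewrite(path):
    """Return the redirect URL the rule would produce, or None if it does not match."""
    m = RULE.match(path)
    if m is None:
        return None
    return "https://example.com/" + m.group(1)


# Hypothetical request paths (no leading slash, as seen in .htaccess).
for path in ("2021/06/19/page-name", "2021/06/19/page-name/%s", "page-name", "2021/06/19/"):
    print(repr(path), "->", rewrite(path))
```

Two things stand out in this sketch. The already-redirected path page-name no longer matches, so the rule should not loop. And because [^/]* also accepts an empty string, a bare date path such as 2021/06/19/ would redirect to the site root; using [^/]+ instead would leave such paths alone.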