
Regex to remove trailing string in .htaccess RewriteRule

My web server is appending unwanted extra characters to the ends of URLs and I would like to remove these. My current RewriteRule is...

RewriteRule ([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/(.*) https://example.com/$4 [R=301,L]

This takes http://example.com/2021/06/19/page-name and redirects it to https://example.com/page-name.

The problem with this rule is the wildcard: anything after page-name is also captured, so the redirect target becomes something like https://example.com/page-name/%s. How can I modify this rule to omit anything after the page name in $4?
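The over-capture can be reproduced outside Apache. The snippet below is only a sketch: it assumes Python's re module treats these constructs the same way as the PCRE engine behind mod_rewrite, and it tests the path without its leading slash, as a rule in .htaccess would see it.

```python
import re

# The pattern from the current RewriteRule, unchanged. mod_rewrite patterns
# are unanchored unless they use ^ or $, so re.search is the closest analogue.
ORIGINAL = re.compile(r"([0-9]{4})\/([0-9]{2})\/([0-9]{2})\/(.*)")

match = ORIGINAL.search("2021/06/19/page-name/%s")
fourth_group = match.group(4)

# (.*) is greedy and runs to the end of the path, so the junk is kept.
print(fourth_group)  # page-name/%s
```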

You may use this rule:

RewriteRule ^\d{4}(?:/\d{2}){2}/([\w-]+).* /$1 [R=301,L]

There is no need to capture anything you don't use, so the capture groups around the date fields have been removed.

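Also, [\w-]+ matches one or more word characters or hyphens, which is enough to match page-name. Because it stops at the first character that is neither, the trailing text is left out of the capture, and the page name is the only thing captured for use in the redirect target on the RHS.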
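A quick way to check what [\w-]+ captures is to run the pattern in Python. This is a sketch rather than a test of Apache itself: it assumes Python's re behaves like PCRE for these constructs, and redirect_target is a hypothetical helper that only mimics the /$1 substitution.

```python
import re

# Date prefix matched but not captured; only the page name is captured.
PATTERN = re.compile(r"^\d{4}(?:/\d{2}){2}/([\w-]+).*")

def redirect_target(path):
    """Return the redirect target for a dated path, or None if no match.

    `path` has no leading slash, as mod_rewrite sees it in .htaccess.
    """
    m = PATTERN.match(path)
    return "/" + m.group(1) if m else None

print(redirect_target("2021/06/19/page-name"))     # /page-name
print(redirect_target("2021/06/19/page-name/%s"))  # /page-name
print(redirect_target("about/contact"))            # None
```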

If you want to include more characters in page-name then consider:

RewriteRule ^\d{4}(?:/\d{2}){2}/([^/]+).* /$1 [R=301,L]

Here [^/]+ matches one or more characters other than /, so the capture ends at the next slash.
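The difference between the two character classes shows up when the page name contains something other than word characters and hyphens. In the sketch below, the path with a dot in the page name is a made-up example, and Python's re is assumed to agree with PCRE for these patterns.

```python
import re

# Same date prefix, two different captures for the page name.
NARROW = re.compile(r"^\d{4}(?:/\d{2}){2}/([\w-]+).*")
BROAD = re.compile(r"^\d{4}(?:/\d{2}){2}/([^/]+).*")

# Hypothetical path: the page name contains a dot.
path = "2021/06/19/page.name/%s"

narrow_capture = NARROW.match(path).group(1)
broad_capture = BROAD.match(path).group(1)

print(narrow_capture)  # page       (stops at the dot)
print(broad_capture)   # page.name  (stops at the next slash)
```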

With your shown samples, please try the following .htaccess rule. Make sure to clear your browser cache before testing your URLs.

RewriteRule ^\d{4}/\d{2}/\d{2}/([^/]*).*/?$ https://example.com/$1 [R=301,L]

Explanation: You don't need four capturing groups here. Match 4 digits/2 digits/2 digits from the start of the URL path, then capture everything up to the next / in the first capturing group, which is then used in the redirect target.
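The same kind of check works for this rule. The sketch below assumes Python's re matches PCRE behaviour here; build_redirect is a hypothetical helper standing in for the https://example.com/$1 substitution. It also shows one edge case: because the group uses * rather than +, a path with nothing after the date still matches, with an empty capture.

```python
import re

# Pattern from the rule above, anchored at both ends.
PATTERN = re.compile(r"^\d{4}/\d{2}/\d{2}/([^/]*).*/?$")

def build_redirect(path):
    """Return the redirect URL for a dated path, or None if no match."""
    m = PATTERN.match(path)
    return "https://example.com/" + m.group(1) if m else None

print(build_redirect("2021/06/19/page-name/%s"))  # https://example.com/page-name
print(build_redirect("2021/06/19/page-name/"))    # https://example.com/page-name
print(build_redirect("2021/06/19/"))              # https://example.com/
print(build_redirect("page-name"))                # None
```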


 