
Regex - Parse string by removing characters

I am trying to parse the path part of a URL.

The input is a string such as site/whatever% ^&*/page/to-days_date// which I would like to convert into site/whatever/page/to-days_date

Things to remove would be anything that is not one of the following:

  1. lower or upper case letter
  2. digit / number
  3. dash
  4. underscore

Add /+$ to your existing regex as an alternative, joined with a pipe (|). It matches one or more / characters at the end of the input, so it handles /, // or ///// at the end.

import re

myString = '''blog/whatever%  ^&*/page/to-days_date//'''
# The leading /+$ alternative is the new part; it strips the trailing slashes.
print(re.sub(r'/+$|[^a-zA-Z0-9_\-\/]+', '', myString))
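One caveat with the single substitution above: /+$ only matches slashes that are already at the very end of the original string, so if disallowed characters follow them (for example site/page//%%), the slashes are left in place. A minimal two-step sketch that avoids this is shown below. The function name clean_path and the second sample input are made up for illustration, and the allowed set is assumed to be the four items from the question plus / as the separator.

```python
import re

# Hypothetical helper, not part of the original answer.
# Allowed characters: letters, digits, underscore, dash, and "/" as separator.
_DISALLOWED = re.compile(r'[^a-zA-Z0-9_\-/]+')

def clean_path(path):
    # Step 1: drop every run of characters outside the allowed set.
    cleaned = _DISALLOWED.sub('', path)
    # Step 2: trim trailing slashes, including any exposed by step 1.
    return cleaned.rstrip('/')

print(clean_path('site/whatever% ^&*/page/to-days_date//'))
print(clean_path('site/page//%%'))
```

For inputs like the one in the question, both approaches should give the same result; the two-step version only differs when junk characters come after the final slashes.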
