I need to clean up some URLs by removing their unique tracking codes, so that in reporting they can be counted as a group rather than as thousands of individual pages.
The code to remove is in the middle of the URL and varies in length.
An example URL is:
https://www.website.co.uk/product/?commcodeABBB/home-page/
I am trying to get this:
https://www.website.co.uk/product/home-page/
I have similar code working for removing the end of a url string:
df["URL"] = df["URL"].str.replace(r'\/id.*', '/', regex=True)
I have tried to modify it for my new scenario.
df["URL"] = df["URL"].str.replace(r'\/\?commcode.{0,5}', '/', regex=True)
In this scenario the regex \/\?commcode.{0,5}
does select ?commcodeABBB/, but the length of the code string in my URLs varies, so it won't work on everything.
I cannot work out how to write it so that it takes everything from ?commcode up to and including the next /. I looked at \w and \W for the 'in-between' part, but \w does not match /, only alphanumeric characters.
I have read many other posts about similar issues, but nothing I can find quite addresses this. I cannot use code that counts from the start or end of the string because the length changes, and so does the number of / characters in the URL, so I cannot use a 'between the 2nd and 3rd /' method either.
Any ideas please?
Use
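df["URL"] = df["URL"].str.replace(r'/\?commcode[^/]*', '', regex=True)
Note that regex=True is required in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.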
Explanation
--------------------------------------------------------------------------------
/           '/'
--------------------------------------------------------------------------------
\?          '?'
--------------------------------------------------------------------------------
commcode    'commcode'
--------------------------------------------------------------------------------
[^/]*       any character except: '/' (0 or more times
            (matching the most amount possible))
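As a quick check of the pattern above, here is a minimal sketch using Python's standard re module, so it runs without pandas. The strip_commcode helper and the second, longer tracking code are made up for illustration; only the first URL comes from the question. The same pattern string should behave the same way when passed to str.replace with regex=True.

```python
import re

# Pattern from the answer: '/?commcode' plus everything up to (not including) the next '/'.
PATTERN = re.compile(r'/\?commcode[^/]*')

def strip_commcode(url: str) -> str:
    """Hypothetical helper: remove the '/?commcode...' segment from a URL."""
    return PATTERN.sub('', url)

urls = [
    "https://www.website.co.uk/product/?commcodeABBB/home-page/",        # from the question
    "https://www.website.co.uk/product/?commcodeA1B2C3D4E5/home-page/",  # longer, made-up code
    "https://www.website.co.uk/product/home-page/",                      # no code: left unchanged
]

cleaned = [strip_commcode(u) for u in urls]
for before, after in zip(urls, cleaned):
    print(before, "->", after)
```

Because the leading / is consumed by the match and the trailing / is not, the replacement can be an empty string and the URL keeps a single / between the remaining path segments.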
Alternatively, you can use:
'\/\?commcode[A-Za-z0-9]*'
to specify exactly which character classes may appear in the code.
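The two patterns only differ when a code contains something other than letters and digits. The sketch below compares them on a made-up URL whose code contains a hyphen; that URL is an assumption for illustration, not something from the question.

```python
import re

# Stops at the next '/', whatever characters the code contains.
UP_TO_SLASH = re.compile(r'/\?commcode[^/]*')
# Stops at the first character that is not an ASCII letter or digit.
ALNUM_ONLY = re.compile(r'/\?commcode[A-Za-z0-9]*')

# Hypothetical URL with a hyphen inside the tracking code.
url = "https://www.website.co.uk/product/?commcodeAB-12/home-page/"

up_to_slash_result = UP_TO_SLASH.sub('', url)
alnum_only_result = ALNUM_ONLY.sub('', url)

print(up_to_slash_result)
print(alnum_only_result)
```

Here the alphanumeric-only pattern stops at the hyphen and leaves "-12" behind in the path, so it is the better choice only when the codes are known to be strictly letters and digits.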