Remove part of a URL using regex

Question

This is the URL:

url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"

I need to remove everything after the .html, so that it becomes:

"www.face.com/me/4000517004580.html"

Answer 1

You can use Python's urllib to parse the URL into its parts and then remove the query string:

from urllib.parse import urlparse
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c"

parse_result = urlparse(url)
url = parse_result._replace(query="").geturl()  # Remove query from url
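To check the result and reuse the approach, the steps above can be wrapped in a small helper. This is only a sketch: the name strip_query is hypothetical, the URL is a shortened form of the one in the question, and clearing the fragment along with the query is an assumption that goes beyond the original answer.

```python
from urllib.parse import urlparse


def strip_query(url: str) -> str:
    """Return url without its query string and fragment (hypothetical helper)."""
    parts = urlparse(url)
    # ParseResult is a namedtuple, so _replace returns a modified copy.
    return parts._replace(query="", fragment="").geturl()


# Shortened form of the URL from the question.
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0"
print(strip_query(url))
```

The same helper should also work when the URL carries a scheme such as https://, because only the query and fragment fields are cleared.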

Answer 2

Try:

url.split('.html')[0]+'.html'

Result:

'www.face.com/me/4000517004580.html'
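Since the question asks for a regex, the same idea can be sketched with re.sub. The function name trim_after_html is hypothetical and the URL is shortened from the question. Unlike the split version, this leaves a URL that contains no ".html" unchanged instead of appending ".html" to it.

```python
import re


def trim_after_html(url: str) -> str:
    """Drop everything after the first ".html" (hypothetical helper)."""
    # Group 1 captures ".html"; ".*" consumes the rest of the string,
    # and the replacement puts back only the captured group.
    return re.sub(r"(\.html).*", r"\1", url, count=1)


# Shortened form of the URL from the question.
url = "www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0"
print(trim_after_html(url))
```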

Answer 3

When you are not sure how to approach a problem, I suggest starting with the documentation. For example, check the Python documentation on string methods and common string operations.

Scrolling through the list of string methods, you will find find():

Return the lowest index in the string where substring sub is found within the slice s[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 if sub is not found.

So to find the "?" you can do this:

i = url.find("?")

Rather than thinking about how to remove part of the string, let's figure out how to keep the part we want. We can do this with a slice:

url = url[:i]
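One caveat with the slice above: when the URL has no "?", find() returns -1, and url[:-1] silently drops the last character. A minimal sketch of a guarded version follows; the helper name before_query is hypothetical and the URL is shortened from the question.

```python
def before_query(url: str) -> str:
    """Return the part of url before the first "?" (hypothetical helper)."""
    i = url.find("?")
    # find() returns -1 when there is no "?"; in that case keep the URL as is
    # rather than slicing with -1, which would cut off the last character.
    return url if i == -1 else url[:i]


url = "www.face.com/me/4000517004580.html?gps-id=5547572"
print(before_query(url))

# str.partition gives the same result without an explicit -1 check.
print(url.partition("?")[0])
```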

Answer 4

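The built-in urllib library can be used here. Note that this approach needs a URL with a scheme (such as https://): without one, urlparse treats the whole string, host included, as the path, and urljoin returns 'www.face.com/me/www.face.com/me/4000517004580.html'.

from urllib.parse import urljoin, urlparse

# Scheme added so that urlparse separates the host from the path
url = 'https://www.face.com/me/4000517004580.html?gps-id=5547572&scm=1007.19201.130907.0&scm_id=1007.19201.130907.0&scm-url=1007.19201.130907.0&pvid=56aacc48-cc78-4cb9-b176-c9acb7a0662c'
output = urljoin(url, urlparse(url).path)  # 'https://www.face.com/me/4000517004580.html'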
