I am trying to split URLs to get the domain name.
example.com => example.com
example.com/dir/index.html => example.com
The regular expression I am trying to use is
(.+?)(/|$)
When I use it in Python like this:
import re
m = re.search('(.+?)(/|$)', url)
It works for the first one, but for the second example I always get example.com/. How do I get rid of the trailing slash?
Edit: I am very sorry, I forgot to include one important piece of information. I need a regular expression because I have to write this in Oracle SQL. Fortunately, Oracle supports regex, but nothing like urlparse. I am just using Python for testing. Sorry about that!
The easy way to do this is to use the urlparse function in the stdlib:
>>> from urllib.parse import urlparse
>>> url = 'http://example.com/dir/index.html'
>>> p = urlparse(url)
>>> p.netloc
'example.com'
Besides being a whole lot simpler, it handles cases that you haven't thought of in a well-defined and clearly documented way (e.g., what if there's a port as well as a host?), whereas with your code, who knows what happens in the cases you didn't anticipate?
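As a hedged illustration of the port case: netloc keeps the port, while hostname drops it. The helper name domain_of below is hypothetical, and prepending // to scheme-less input is an assumption made here so that urlparse treats the leading part as a host; without a scheme or a leading //, urlparse puts the whole string in path and leaves netloc empty.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the host part of url, without any port (hypothetical helper)."""
    # Assumption: input with no scheme and no leading '//' is a bare
    # 'host/path' string, like the examples in the question.
    if '://' not in url and not url.startswith('//'):
        url = '//' + url
    return urlparse(url).hostname

with_port = urlparse('http://example.com:8080/dir/index.html')
print(with_port.netloc)    # example.com:8080
print(with_port.hostname)  # example.com
print(domain_of('example.com/dir/index.html'))  # example.com
```

This is only a sketch; hostname also lowercases the host, which may or may not be what you want.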
If you really want to treat the URL as a string instead of a URL, the easy way to split on slashes is to split on slashes:
>>> bits = url.split('/')
>>> bits[2]
'example.com'
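Note that index 2 relies on the URL starting with a scheme such as http://, so that the first piece is 'http:' and the second is empty. For the scheme-less inputs in the question, a sketch along these lines would take the first piece instead (first_segment is a hypothetical name):

```python
def first_segment(url):
    """Return everything before the first '/' (hypothetical helper)."""
    # maxsplit=1 stops after the first slash; a string with no slash
    # comes back whole as the only element.
    return url.split('/', 1)[0]

print('http://example.com/dir/index.html'.split('/'))
# ['http:', '', 'example.com', 'dir', 'index.html']
print(first_segment('example.com/dir/index.html'))  # example.com
print(first_segment('example.com'))                 # example.com
```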
If you really want to use regexps to split on slashes, you could use re.split instead of trying to figure out a way to trick re.search into splitting for you:
>>> bits = re.split('/', url)
>>> bits[2]
'example.com'
Finally, if you want to do it with match or search, and you don't want to capture the /, don't put the / in a capturing group, and look at the group you went out of your way to capture instead of at the whole match:
>>> url = 'example.com/dir/index.html'
>>> m = re.search('(.+?)(/|$)', url)
>>> m.groups()
('example.com', '/')
>>> m = re.search('(.+?)(?:/|$)', url)
>>> m.groups()
('example.com',)
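To make the difference concrete, here is a small sketch comparing the whole match with the first group for the two inputs from the question. group(0) is what still carries the trailing slash; group(1) is the domain either way.

```python
import re

pattern = re.compile(r'(.+?)(?:/|$)')
urls = ['example.com', 'example.com/dir/index.html']

results = []
for url in urls:
    m = pattern.search(url)
    # group(0) is the entire match; group(1) is only the captured part.
    results.append((m.group(0), m.group(1)))

for whole, domain in results:
    print(repr(whole), repr(domain))
# 'example.com' 'example.com'
# 'example.com/' 'example.com'
```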
Try matching non-forward-slash characters, e.g. ([^/]+?)(/|$)
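A minimal sketch of this idea, checked in Python only: because [^/] can never match a slash, a pattern made of just [^/]+ anchored at the start stops before the first /, so the whole match is already the domain and no group lookup is needed. That may be convenient when porting to a regex function that returns the entire match, though how Oracle behaves is an assumption not tested here. The name leading_host is hypothetical.

```python
import re

def leading_host(url):
    """Return the text before the first '/', or None if there is none."""
    # re.match anchors at the start of the string; [^/]+ consumes
    # one or more characters that are not a slash.
    m = re.match(r'[^/]+', url)
    return m.group(0) if m else None

print(leading_host('example.com'))                 # example.com
print(leading_host('example.com/dir/index.html'))  # example.com
```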