I am still a novice with regular expressions (regex) in Java.
If I have a URL like this: "http://somedomain.someextention/somefolder/.../someotherfolder/somepage"
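What is the simplest way to get the domain (somedomain.someextention), the folder path (somefolder/.../someotherfolder) and the page name (somepage)?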
Thanks!
You don't have to (and probably shouldn't) use regex here. Instead, use the classes designed to handle this, for example URL, URI or File, like this:
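import java.io.File;
import java.net.URL;

String address = "http://somedomain.someextention/somefolder/.../someotherfolder/somepage";
URL url = new URL(address); // throws the checked MalformedURLException
File file = new File(url.getPath());
System.out.println(url.getHost()); // domain
System.out.println(url.getPath()); // full path, with leading '/'
System.out.println(file.getName()); // last path segment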
Output:
somedomain.someextention
/somefolder/.../someotherfolder/somepage
somepage
You may now need to strip the leading / from the path. If the path starts with /, you can use substring(1) to do that.
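Putting those steps together, here is a minimal sketch that returns the domain, the folder path without its leading /, and the page name. The class name UrlParts and the method split are hypothetical, chosen only for illustration; the checked MalformedURLException is wrapped in an IllegalArgumentException so callers don't have to declare it.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    // Hypothetical helper: returns {domain, folders, page} for a URL string.
    static String[] split(String address) {
        URL url;
        try {
            url = new URL(address);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid URL: " + address, e);
        }
        String path = url.getPath();            // e.g. "/somefolder/.../someotherfolder/somepage"
        if (path.startsWith("/")) {
            path = path.substring(1);           // drop the leading '/'
        }
        int last = path.lastIndexOf('/');       // -1 when the path has a single segment
        String folders = last < 0 ? "" : path.substring(0, last);
        String page = path.substring(last + 1); // substring(0) when there is no '/'
        return new String[] { url.getHost(), folders, page };
    }

    public static void main(String[] args) {
        String[] parts = split("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        for (String p : parts) {
            System.out.println(p);
        }
    }
}
```

For the sample URL this prints somedomain.someextention, somefolder/.../someotherfolder and somepage, matching the three parts asked for.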
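But if you really must use a regex, you can try:
^https?://([^/]+)/(.*/([^/]+))$
Now group 1 holds the domain, group 2 the path without its leading / (including the page name), and group 3 the page name.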
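For completeness, this is one way to apply that regex with java.util.regex. The class name RegexUrl and the method parse are made up for illustration. Note that because of the `.*/` part, the pattern needs at least one folder before the page name, so a URL like http://example.com/page does not match at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUrl {
    // group 1 = domain, group 2 = path without leading '/', group 3 = last segment
    private static final Pattern URL_PATTERN =
            Pattern.compile("^https?://([^/]+)/(.*/([^/]+))$");

    // Returns {domain, path, page}, or null if the string does not match.
    static String[] parse(String address) {
        Matcher m = URL_PATTERN.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        System.out.println(parts[0]); // somedomain.someextention
        System.out.println(parts[1]); // somefolder/.../someotherfolder/somepage
        System.out.println(parts[2]); // somepage
    }
}
```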
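The best way to get those components is to use the URI class, e.g.:
import java.net.URI;
URI uri = new URI(str); // throws the checked URISyntaxException
String domain = uri.getHost();
String path = uri.getPath(); // percent-decoded
int pos = path.lastIndexOf('/'); // assumes the path contains at least one '/'
String folders = path.substring(0, pos);
String page = path.substring(pos + 1);
// or use File to parse the path string.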
You could do it with regexes on the raw URL string, but you risk not handling all the variability a URL allows (ports, user info, query strings, fragments and so on). (Hint: the regex supplied by @Pchenko doesn't :-)) And you would definitely need a decoder to deal with percent-encoding (e.g. %20 for a space).
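To make the hint concrete, here is a small comparison on a made-up address that has a port, percent-encoded spaces, a query string and a fragment. The class name UrlPitfalls and the sample address are assumptions for illustration. URI separates each component and decodes the path (getRawPath keeps the encoded form), while the naive regex folds the port into the domain and the query and fragment into the page name.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlPitfalls {
    private static final Pattern NAIVE =
            Pattern.compile("^https?://([^/]+)/(.*/([^/]+))$");

    // Returns {domain, page} as the naive regex sees them, or null on no match.
    static String[] naive(String address) {
        Matcher m = NAIVE.matcher(address);
        return m.matches() ? new String[] { m.group(1), m.group(3) } : null;
    }

    public static void main(String[] args) {
        String address = "http://example.com:8080/my%20folder/my%20page?q=1#top";

        // URI splits out each component and percent-decodes the path.
        URI uri = URI.create(address);
        System.out.println(uri.getHost());    // example.com
        System.out.println(uri.getPort());    // 8080
        System.out.println(uri.getPath());    // /my folder/my page
        System.out.println(uri.getRawPath()); // /my%20folder/my%20page
        System.out.println(uri.getQuery());   // q=1

        // The naive regex keeps the port in the domain, leaves %20 undecoded,
        // and treats the query and fragment as part of the page name.
        String[] parts = naive(address);
        System.out.println(parts[0]);         // example.com:8080
        System.out.println(parts[1]);         // my%20page?q=1#top
    }
}
```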
This uses neither a regex nor URI, just plain substring code, as exercise material. It skips validation of a few corner cases.
int lastDelim = str.lastIndexOf('/'); // '/' just before the page name
if (lastDelim < 0) throw new IllegalArgumentException("Invalid url");
int startIdx = str.indexOf("//"); // end of the scheme, e.g. "http://"
startIdx = startIdx < 0 ? 0 : startIdx + 2; // domain starts after "//", or at 0 if there is no scheme
int pathDelim = str.indexOf('/', startIdx); // first '/' after the domain
String domain = str.substring(startIdx, pathDelim);
String path = str.substring(pathDelim + 1, lastDelim);
String page = str.substring(lastDelim + 1);
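If you want to close some of those corner-case gaps, one possible sketch wraps the same substring logic in a method and checks for a missing domain or path, and for a URL with no folder (where the original would throw StringIndexOutOfBoundsException). The class name SubstringUrl is hypothetical. It still does not handle query strings, fragments or percent-encoding.

```java
public class SubstringUrl {
    // Returns {domain, path, page}; throws IllegalArgumentException on bad input.
    static String[] parse(String str) {
        int startIdx = str.indexOf("//");
        startIdx = startIdx < 0 ? 0 : startIdx + 2;  // skip "scheme://" if present
        int pathDelim = str.indexOf('/', startIdx);  // first '/' after the domain
        if (pathDelim < 0 || pathDelim == startIdx) {
            throw new IllegalArgumentException("No domain or no path: " + str);
        }
        int lastDelim = str.lastIndexOf('/');        // always >= pathDelim here
        String domain = str.substring(startIdx, pathDelim);
        // With a single path segment there is no folder part.
        String path = lastDelim > pathDelim ? str.substring(pathDelim + 1, lastDelim) : "";
        String page = str.substring(lastDelim + 1);
        return new String[] { domain, path, page };
    }

    public static void main(String[] args) {
        String[] parts = parse("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        System.out.println(String.join("\n", parts));
    }
}
```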
If you would rather use a regex to parse the URL than the URI class described in the previous answers, the link below is a good regex tutorial that also walks through parsing a sample URL. You could learn from it and try it out.
It's not a regex, and it doesn't scale, but it works:
public class SomeClass
{
    public static void main(String[] args)
    {
        SomeClass sclass = new SomeClass();
        String[] string =
            sclass.parseURL("http://somedomain.someextention/somefolder/.../someotherfolder/somepage");
        System.out.println(string[0]);
        System.out.println(string[1]);
        System.out.println(string[2]);
    }

    // Assumes the URL starts with "http://" and has at least one folder.
    private String[] parseURL(String url)
    {
        int start = "http://".length();
        // Domain: from after "http://" up to the next '/'
        String part1 = url.substring(start, url.indexOf("/", start));
        // Folders: between the '/' after the domain and the last '/'
        String part2 = url.substring(start + part1.length() + 1, url.lastIndexOf("/"));
        // Page: everything after the last '/'
        String part3 = url.substring(url.lastIndexOf("/") + 1);
        return new String[] { part1, part2, part3 };
    }
}
Output:
somedomain.someextention
somefolder/.../someotherfolder
somepage