
Regular expression to extract from URI

I need a regular expression to extract from two types of URIs

http://example.com/path/to/page/?filter
http://example.com/path/to/?filter

Basically, in both cases I need to somehow isolate and return

/path/to

and

?filter

That is, both /path/to and filter are arbitrary. So I suppose I need two regular expressions for this? I am doing this in PHP, but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)

EDIT: Just to clarify, if for example I have

http://example.com/help/faq/?sort=latest

I want to get /help/faq and ?sort=latest

Another example

http://example.com/site/users/all/page/?filter=none&status=2

I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the trailing page segment!

Using parse_url might be easier and have fewer side effects than a regex:

$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);

You could then use explode on the path to get the first two segments:

$segments = explode("/", $path);
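Put together, the parse_url and explode steps might look like the sketch below. The function name first_two_segments_and_query is made up for illustration, and it assumes that "the first two segments" is what is wanted; the third example in the question (/site/users/all) would need a different rule.

```php
<?php
// Hypothetical helper: returns ["/seg1/seg2", "?query"] for a URL.
function first_two_segments_and_query(string $url): array
{
    // parse_url() returns null for a missing component and false for a
    // badly malformed URL; the cast turns both into an empty string.
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $query = (string) parse_url($url, PHP_URL_QUERY);

    // trim() removes the leading and trailing "/" so that explode()
    // does not produce empty first and last elements.
    $segments = explode('/', trim($path, '/'));
    $firstTwo = array_slice($segments, 0, 2);

    return [
        '/' . implode('/', $firstTwo),
        $query === '' ? '' : '?' . $query,
    ];
}
```

Without the trim() call, explode("/", "/path/to/page/") starts and ends with an empty string, which is easy to trip over when indexing into $segments.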

Try this:

^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)

This will get you the first two URL path segments and query.
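In PHP this pattern could be used with preg_match roughly as follows. This is only a sketch: the function name match_path_and_query is invented here, and ~ is chosen as the delimiter because the pattern itself contains / and #.

```php
<?php
// Hypothetical wrapper around the pattern above.
// Returns ["/seg1/seg2", "?query"], or null when the URL does not match.
function match_path_and_query(string $url): ?array
{
    $pattern = '~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~';

    if (preg_match($pattern, $url, $m) !== 1) {
        return null;
    }

    // $m[1] holds the first two path segments without the leading slash,
    // $m[2] holds the query string without the leading question mark.
    return ['/' . $m[1], '?' . $m[2]];
}
```

Note that the pattern requires a literal ? in the URL and only accepts http, so a URL without a query string, or an https URL, gives null here.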

Not tested, but:

^https?://[^ /]+[^ ?]+.*

which should match http and https URLs with or without a path. The first part, [^ /]+, matches the host; the second part, [^ ?]+, matches everything up to the ? (the one in ?filter, for instance); and .* matches the rest, since . is any character except a newline (\n).
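As written, that pattern only matches and does not capture anything. A variant with two capture groups added (the groups and the function name split_path_and_query are additions for illustration, not part of the answer) might be used like this:

```php
<?php
// Hypothetical variant: the same pattern with the path and the remainder
// wrapped in capture groups. Returns [path, query], or null on no match.
function split_path_and_query(string $url): ?array
{
    $pattern = '~^https?://[^ /]+([^ ?]+)(.*)~';

    if (preg_match($pattern, $url, $m) !== 1) {
        return null;
    }

    // $m[1] is the whole path, e.g. "/help/faq/"; drop the trailing slash.
    // $m[2] is everything from the "?" onward (empty if there is no query).
    return [rtrim($m[1], '/'), $m[2]];
}
```

This returns the whole path, so a trailing segment such as page would still have to be removed separately.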

Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php)? The task seems simple enough for it. You would need two calls (one for the / and one for the ?), but it should be quite simple once you have done that.
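A minimal sketch of the two explode() calls, assuming the URL always has the form scheme://host/path?query. The function name split_with_explode is invented, and dropping a trailing segment literally named "page" is one reading of the question, not something this answer specifies.

```php
<?php
// Hypothetical helper built on two explode() calls.
function split_with_explode(string $url): array
{
    // First call: split off the query string at the first "?".
    $parts  = explode('?', $url, 2);
    $before = $parts[0];
    $query  = isset($parts[1]) ? '?' . $parts[1] : '';

    // Second call: "http://host/a/b/" becomes ["http:", "", "host", "a", "b", ""].
    $pieces = explode('/', $before);

    // Skip the scheme, the empty piece, and the host, then drop empty
    // pieces left behind by a trailing slash.
    $segments = array_values(array_filter(
        array_slice($pieces, 3),
        static function (string $s): bool {
            return $s !== '';
        }
    ));

    // Assumption: a final segment named "page" is not wanted.
    if ($segments !== [] && end($segments) === 'page') {
        array_pop($segments);
    }

    return ['/' . implode('/', $segments), $query];
}
```

The limit argument of 2 in the first call keeps any later ? characters inside the query string instead of splitting on them as well.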

