Regex - Match characters but don't include within results

I have the following regex, which ALMOST works:

(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)

I need the captured value to be the only result, or at least always in the same position in the array.

For example, http://m.facebook.com/ matches perfectly; there is only one group.

However, if I change it to http://facebook.com/, I get com/ where facebook should be. So I really need (?:www|[a-z]+) to be an optional check.

Edit:

What I expect is to match just facebook if the string is ANY of the following:

http://www.facebook.com

http://facebook.com

http://m.facebook.com

And obviously the https counterparts.

This is my regex now:

(?:^https?:\/\/)(?:www)?\.?([^.]+)

This is close; however, it captures the m when I try http://m.facebook.com.

https://regex101.com/r/GDapY5/1

"So I need to have (?:www|[a-z]+) as an optional check really."

A ? at the end of a pattern is generally used for "optional" bits. It means "match zero or one" of that thing, so your subpattern would be something like this:

(?:www|[a-z]+)?
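One caveat when applying this to the question's regex: if only the group is made optional, the literal dot after it is still required, so http://facebook.com/ would still yield com/. A minimal sketch of one possible adaptation follows, assuming the dot is moved inside the optional group and the captured label must itself be followed by a dot. The function name secondLevelLabel is hypothetical, and the pattern is an illustration rather than the asker's final regex.

```php
<?php
// Hypothetical helper: returns the label captured after an optional "subdomain." prefix.
function secondLevelLabel(string $url): ?string
{
    // (?:[a-z]+\.)?  optional subdomain together with its dot
    // ([^./]+)\.     captured label, which must be followed by a dot
    $pattern = '~^https?://(?:[a-z]+\.)?([^./]+)\.~i';

    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

foreach (['http://www.facebook.com', 'http://facebook.com/', 'https://m.facebook.com'] as $url) {
    echo $url, ' => ', secondLevelLabel($url), PHP_EOL;
}
```

Because the capture has to be followed by a dot, the engine backtracks out of the optional prefix when the host has only two labels. Hosts with more labels, such as example.co.uk, would need extra handling.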

If you're simply trying to get the second-level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split the host on dots and take the penultimate value:

$domain = array_reverse(explode('.', parse_url($str)['host']))[1];

Or:

$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
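The one-liners above assume that the URL parses and that the host has at least two labels. A slightly more defensive sketch of the same idea is shown below; the function name penultimateHostLabel is hypothetical, and returning null for unusable input is an assumption about the desired behavior.

```php
<?php
// Hypothetical helper: returns the second-to-last label of the URL's host, or null.
function penultimateHostLabel(string $url): ?string
{
    // parse_url() returns null when the host component is absent and false on malformed URLs.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }

    $labels = explode('.', $host);
    $count = count($labels);

    // A single-label host such as "localhost" has no penultimate label.
    return $count >= 2 ? $labels[$count - 2] : null;
}

echo penultimateHostLabel('http://m.facebook.com/'), PHP_EOL;
```

As with the one-liners, this takes the penultimate label literally, so a host like example.co.uk yields co.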

Perhaps you could make the first m. part optional with (?:\w+\.)?. Instead of a capturing group, you could use \K to reset the starting point of the reported match.

Then match one or more word characters with \w+ and use a positive lookahead, (?=\.), to assert that what follows is a dot.

For example:

^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)

Edit: Or you could match m. or www. using an alternation:

^https?://(?:m\.|www\.)?\K\w+(?=\.)
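To see how the two \K patterns behave, here is a small sketch that runs both through preg_match. The helper name matchDomainLabel and the ~ delimiter are choices made for this illustration; the patterns themselves are the two shown in this answer.

```php
<?php
// Hypothetical helper: returns the whole reported match, or null when nothing matches.
function matchDomainLabel(string $pattern, string $url): ?string
{
    // \K drops everything consumed before it from the reported match,
    // so the label ends up in $m[0] and no capturing group is needed.
    return preg_match($pattern, $url, $m) === 1 ? $m[0] : null;
}

// Any single "label." prefix is optional.
$generic = '~^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)~';
// Only "m." or "www." is treated as an optional prefix.
$alternation = '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~';

foreach (['http://www.facebook.com', 'https://facebook.com', 'http://m.facebook.com'] as $url) {
    echo $url, ' => ', matchDomainLabel($generic, $url), ' / ', matchDomainLabel($alternation, $url), PHP_EOL;
}
```

The two patterns agree on the three URLs from the question. They differ on other subdomains: for http://mail.facebook.com the first reports facebook, while the alternation reports mail, because only m. and www. are skipped.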
