Regex not finding all values in Google search results?

First of all, I should stress that I'm trying to learn here, not be malicious or spam anyone.

I'm trying to learn about regex by finding email addresses in Google search results with the following code. However, sometimes it finds only some of the email addresses, and other times none at all.

If I try it with a Wikipedia URL, I don't have a problem.

$url = "https://www.google.com/search?q=hello@hotmail.com";
// $url = "http://en.wikipedia.org/wiki/Email_address"; this works fine

// Fetch the raw HTML of the page.
$string = file_get_contents($url);

$matches = array();
$pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
preg_match_all($pattern, $string, $matches);

// The pattern has no capture groups, so $matches[0] holds every full match.
foreach ($matches[0] as $email)
{
    echo $email."<br>";
}

You're missing uppercase:

'/[A-Za-z\d._%+-]+@[A-Za-z\d.-]+\.[A-Za-z]{2,4}\b/i'

I put it in everywhere in case you want to match HELLO@GMAIL.COM; you can always lowercase the result afterwards.

EDIT: I think I was trying to solve this for a different email address that wasn't being matched.
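
For what it's worth, the pattern in the question already ends with the i flag, which makes it case-insensitive, so the extra A-Z ranges should not change what gets matched. A quick check along these lines suggests as much (the sample text is made up for illustration and is not real search output):

```php
<?php
// Made-up sample text containing upper- and mixed-case addresses.
$text = 'Contact HELLO@GMAIL.COM or Hello@Hotmail.com';

// Pattern from the question: lowercase classes plus the i flag.
$original = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
// Pattern from this answer: explicit A-Z ranges plus the i flag.
$withUpper = '/[A-Za-z\d._%+-]+@[A-Za-z\d.-]+\.[A-Za-z]{2,4}\b/i';

preg_match_all($original, $text, $a);
preg_match_all($withUpper, $text, $b);

// Both patterns find the same full matches.
var_dump($a[0] === $b[0]); // bool(true)
```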

EDIT 2: Search the HTML. The addresses that aren't found have emphasis markup inside them, like example<em>@example.com</em>, so the regex can't match across the tags.
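
One possible workaround, as a rough sketch only: remove the inline emphasis tags before running the regex, so that the two halves of the address become adjacent again. The function name extractEmails and the sample HTML are hypothetical, and the tag list (em, b, strong) is an assumption; only the em case is confirmed above.

```php
<?php
// Hypothetical helper: drops inline emphasis tags, then applies the
// email pattern from the question to what is left.
function extractEmails(string $html): array
{
    // Remove opening and closing <em>, <b> and <strong> tags only
    // (assumed tag list). Other tags stay in place, so text from
    // separate elements is not glued together.
    $clean = preg_replace('~</?(?:em|b|strong)\b[^>]*>~i', '', $html);

    $pattern = '/[a-z\d._%+-]+@[a-z\d.-]+\.[a-z]{2,4}\b/i';
    preg_match_all($pattern, $clean, $matches);

    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Made-up markup imitating the emphasis described above.
$sample = '<span>example<em>@example.com</em></span> '
        . '<b>hello</b>@hotmail.com plain@example.org';

foreach (extractEmails($sample) as $email) {
    echo $email . "<br>";
}
```

Calling strip_tags() on the whole page would be shorter, but it removes every tag, so text from neighbouring elements can run together and produce false matches.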
