How to validate non-English (UTF-8) encoded email address in Javascript and PHP?

Part of a website I am currently working on contains a registration process where users have to provide their email address. Just recently I became aware that non-ASCII domains are possible (and so are non-ASCII email addresses). My backend is a UTF-8 encoded MySQL database, and I expect that users with different locales should be able to enter their email address, but I don't know how to validate this kind of address.

Currently I am testing jQuery Tools; it validates English email addresses correctly but fails to validate non-ASCII ones. I also need to do the same on the server side with PHP. Is there a regular expression that can validate this kind of email address?

I have tried this, but it fails in jQuery Tools (this is just an example for demonstration; I don't understand it either):

闪闪发光@闪闪发光.com

Also, what will happen when they type their English email address (jonesmith@somemail.com) with their own IME? Can this be validated with the regular expression we currently have for English email validation? For now I don't have to worry about whether the email address exists or not.

Thanks

Attempting to validate email addresses may not be a good idea. The specifications (RFC5321, RFC5322) allow for so much flexibility that validating them with regular expressions is literally impossible, and validating with a function is a great deal of work. The result is that most email validation schemes end up rejecting a large number of valid email addresses, much to the inconvenience of users. (By far the most common example of this is not allowing the + character.)

It is more likely that the user will (accidentally or deliberately) enter an incorrect email address than an invalid one, so actually validating is a great deal of work for very little benefit, with possible costs if you do it incorrectly.

I would recommend that you just check for the presence of an @ character on the client and then send a confirmation email to verify it; it's the most practical way to validate, and it confirms that the address is correct as well.
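A minimal sketch of that client-side check in JavaScript. The function name `looksLikeEmail` is hypothetical, and the check is slightly stricter than "contains an @": it also assumes there must be some text on both sides of the last @. Whether the address really works is left to the confirmation email.

```javascript
// Hypothetical helper: accepts any input that has text on both sides of the last "@".
// It does not inspect the characters used, so non-ASCII addresses pass unchanged.
function looksLikeEmail(value) {
  const trimmed = value.trim();
  const at = trimmed.lastIndexOf("@");
  // "@" must be present and must be neither the first nor the last character.
  return at > 0 && at < trimmed.length - 1;
}
```

Because nothing here depends on the alphabet, an address such as 闪闪发光@闪闪发光.com and one containing a + are both accepted.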

Since 5.2, PHP has built-in validation for email addresses. But I'm not sure if it works for UTF-8 encoded strings:

echo filter_var($email, FILTER_VALIDATE_EMAIL);

In the original PHP source code you will find the regular expression used for validating email; it can be used for manual validation when using PHP < 5.2.

Update

idn_to_ascii() can be used to "Convert domain name to IDNA ASCII form." The result can then be validated with filter_var($email, FILTER_VALIDATE_EMAIL);

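// International domains: convert the domain part to its ASCII (Punycode) form
if (function_exists('idn_to_ascii') && strpos($email, '@') !== false) {
    $parts = explode('@', $email);
    $ascii_domain = idn_to_ascii($parts[1]);
    // idn_to_ascii() returns false on failure; keep the original address in that case
    if ($ascii_domain !== false) {
        $email = $parts[0].'@'.$ascii_domain;
    }
}
// filter_var() returns the address if it is valid, false otherwise
$is_valid = filter_var($email, FILTER_VALIDATE_EMAIL);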
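For the client side, a rough JavaScript counterpart of the same idea could look like the sketch below. It is not part of the original answer: the helper name `toAsciiDomainEmail` is hypothetical, and it assumes an environment with the standard `URL` class (current browsers and Node.js), whose host parser converts international domain names to their ASCII form.

```javascript
// Hypothetical helper: returns the address with its domain converted to ASCII (Punycode),
// or null when there is no usable "@" or the domain cannot be parsed as a host name.
function toAsciiDomainEmail(email) {
  const at = email.lastIndexOf("@");
  if (at < 1 || at === email.length - 1) return null;

  const local = email.slice(0, at);
  const domain = email.slice(at + 1);

  // Reject characters that the URL parser would treat as path, port, query, etc.
  if (/[\/\\?#:@\s]/.test(domain)) return null;

  try {
    // The URL parser applies IDNA processing (and lowercasing) to the host name.
    const asciiDomain = new URL("http://" + domain).hostname;
    return local + "@" + asciiDomain;
  } catch (e) {
    return null;
  }
}
```

Only the domain is converted; a non-ASCII local part is passed through untouched, so a strict ASCII-only check applied afterwards would still reject it.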

As suggested by Mario, and after playing around a bit, I came up with the following regex to validate non-standard email addresses:

^([\p{L}\_\.\-\d]+)@([\p{L}\-\.\d]+)((\.(\p{L}){2,63})+)$

It would validate any proper email address with all kinds of Unicode letters, with the TLD limited to between 2 and 63 characters.

Please check it and let me know if there are any flaws.

Example Online
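To try the pattern in JavaScript, a sketch along these lines should work. Two adjustments are assumptions made here rather than part of the answer: the `u` flag, which JavaScript requires for `\p{L}`, and the unescaped `_`, because `\_` is a syntax error in Unicode mode. The names `unicodeEmailPattern` and `matchesUnicodeEmail` are hypothetical.

```javascript
// The answer's pattern, adapted for JavaScript's Unicode mode ("u" flag).
const unicodeEmailPattern =
  /^([\p{L}_.\-\d]+)@([\p{L}\-.\d]+)((\.(\p{L}){2,63})+)$/u;

// Hypothetical wrapper: true when the whole input matches the pattern.
function matchesUnicodeEmail(value) {
  return unicodeEmailPattern.test(value);
}
```

One flaw is visible right away: the local part only allows letters, digits, `_`, `.` and `-`, so a valid address containing `+` is rejected.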

The regular expression could look like this:

[^ ]+@[^ ]+\.[^ ]{2,6}

Got this idea from a JavaScript tutorial page. It is basic, but it works for me without worrying about the complexity of regular expressions and Unicode standards.

Client side validation
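A short sketch of how that pattern behaves in JavaScript. The anchored variant is an assumption added here, not part of the answer: without `^` and `$`, `test` succeeds as soon as any substring of the input matches. The names `loosePattern`, `anchoredPattern` and `isLooselyValidEmail` are hypothetical.

```javascript
// The pattern as written in the answer: it may match anywhere inside the string.
const loosePattern = /[^ ]+@[^ ]+\.[^ ]{2,6}/;

// Assumed variant: anchored so that the whole input has to match.
const anchoredPattern = /^[^ ]+@[^ ]+\.[^ ]{2,6}$/;

// Hypothetical wrapper that trims the input before applying the anchored pattern.
function isLooselyValidEmail(value) {
  return anchoredPattern.test(value.trim());
}
```

Since the character classes only exclude spaces, non-ASCII addresses pass. The `{2,6}` limit on the last label does mean that, once anchored, top-level domains longer than six characters are rejected.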

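// Reject empty or whitespace-only input
if(!$.trim(value).length) {
    return false;
}
else {

    // Declared with var so they do not leak into the global scope
    var AtPos = value.indexOf("@");
    var StopPos = value.lastIndexOf(".");

    // Both "@" and "." must be present
    if (AtPos == -1 || StopPos == -1) {
        return false;
    }

    // The last "." must come after the "@"
    if (StopPos < AtPos) {
        return false;
    }

    // There must be at least one character between "@" and the last "."
    if (StopPos - AtPos == 1) {
        return false;
    }

    return true;
}

Server side validation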

if(!isset($_POST['emailaddr']) || trim($_POST['emailaddr']) == "") {
    //Error: Email required
}
else {
    $atpos = strpos($_POST['emailaddr'],'@');
    // strrpos() finds the last dot, matching lastIndexOf() on the client side
    $stoppos = strrpos($_POST['emailaddr'],'.');

    if(($atpos === false) || ($stoppos === false)) {
        //Error: invalid email
    }
    else {
        if($stoppos < $atpos) {
            //Error: invalid email
        }
        else {
            if (($stoppos-$atpos) == 1) {
                //Error: invalid email
            }
        }
    }
}

Though it still has some loopholes, I guess users will not be fooling around with this stuff. Also, real validation is required for serious stuff, as suggested by 'Jeremy Banks'.

Hope this will be helpful for somebody else too.

Thanks and regards to all

On this subject I liked this page so much that I set up a blog exposing sites that do validation wrong (contributions gratefully received - don't let yours be on it!).

As far as using regexes goes, those who say "it's wrong" tend to be light on alternatives, and TBH validation to the last letter of the RFC isn't really that critical - for example, while noddy+!#$%&'*-/=?+_{}|~test@gmail.com is a perfectly valid address, it's not too unreasonable to reject it given that a surprisingly large proportion of users can't even type 'hotmail' correctly. Some domains are also quite restrictive on user names anyway, particularly hotmail. So I'm in favour of regexes that are demonstrably reasonable, and my favourite source for that is this page, though I don't like their current JS 'winner' and it would help if they set up a public test page.

jQuery's validate plugin uses this regex, which is interestingly constructed and quite similar in style (but smaller!) to the ex-parrot one (actually my ISP!) linked by @powtac.

What about something like this:

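mb_internal_encoding("UTF-8");
// The encoding used by mb_ereg() is set here, not passed as an argument
mb_regex_encoding("UTF-8");
// The optional third parameter of mb_ereg() receives the matches array, so it is omitted
mb_ereg('[\w]+@[\w]+\.com', $mail);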
