
How to use a regular expression to validate Chinese input?

The thing is, I need to treat this kind of Chinese input as invalid in client-side validation:

The input is invalid when its total length, counting English letters, Chinese characters, and spaces together, is 10 or more.

For example, "你的a你的a你的a你" and "你的 你的 你的 你" (length 10) are invalid, but "你的a你的a你的a" (length 9) is OK.
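A minimal sketch of the length rule on its own, before any restriction on which characters are allowed. The helper name `isTooLong` is hypothetical. `String.prototype.length` counts UTF-16 code units; every character in these examples is in the Basic Multilingual Plane, so here it equals the number of characters.

```javascript
// Hypothetical helper: "invalid when the total length is 10 or more".
// .length counts UTF-16 code units, which matches the character count
// for BMP-only strings like the examples below.
function isTooLong(text) {
  return text.length >= 10;
}

const examples = ["你的a你的a你的a你", "你的 你的 你的 你", "你的a你的a你的a"];
for (const s of examples) {
  console.log(JSON.stringify(s), s.length, isTooLong(s) ? "invalid" : "valid");
}
// → "你的a你的a你的a你" 10 invalid
// → "你的 你的 你的 你" 10 invalid
// → "你的a你的a你的a" 9 valid
```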

I am using JavaScript for client-side validation and Java for the server side, so I suppose applying the same regular expression on both would be ideal.

Can anyone give some hints on how to write this rule as a regular expression?

From the question "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:

Block                                   Range       Comment
--------------------------------------- ----------- ----------------------------------------------------
CJK Unified Ideographs                  4E00-9FFF   Common
CJK Unified Ideographs Extension A      3400-4DBF   Rare
CJK Unified Ideographs Extension B      20000-2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700-2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740-2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820-2CEAF Rare, historic
CJK Compatibility Ideographs            F900-FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800-2FA1F Unifiable variants
CJK Symbols and Punctuation             3000-303F
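A sketch of how the table maps onto code, with the ranges copied from it. The names `CJK_BLOCKS` and `cjkBlockOf` are hypothetical. It works on code points via `codePointAt`, because Extensions B-E and the Compatibility Ideographs Supplement lie above U+FFFF, where a JavaScript string stores one character as two UTF-16 code units.

```javascript
// Ranges copied from the table above (inclusive code points).
const CJK_BLOCKS = [
  ["CJK Unified Ideographs", 0x4e00, 0x9fff],
  ["CJK Unified Ideographs Extension A", 0x3400, 0x4dbf],
  ["CJK Unified Ideographs Extension B", 0x20000, 0x2a6df],
  ["CJK Unified Ideographs Extension C", 0x2a700, 0x2b73f],
  ["CJK Unified Ideographs Extension D", 0x2b740, 0x2b81f],
  ["CJK Unified Ideographs Extension E", 0x2b820, 0x2ceaf],
  ["CJK Compatibility Ideographs", 0xf900, 0xfaff],
  ["CJK Compatibility Ideographs Supplement", 0x2f800, 0x2fa1f],
  ["CJK Symbols and Punctuation", 0x3000, 0x303f],
];

// Returns the block name of the first character of `ch`, or null.
// codePointAt(0) combines a surrogate pair into one code point.
function cjkBlockOf(ch) {
  const cp = ch.codePointAt(0);
  for (const [name, lo, hi] of CJK_BLOCKS) {
    if (cp >= lo && cp <= hi) return name;
  }
  return null;
}

console.log(cjkBlockOf("你"));        // → CJK Unified Ideographs
console.log(cjkBlockOf("\u3000"));    // → CJK Symbols and Punctuation
console.log(cjkBlockOf("\u{20000}")); // → CJK Unified Ideographs Extension B
console.log(cjkBlockOf("a"));         // → null
console.log("\u{20000}".length);      // → 2 (one character, two code units)
```

The supplementary ranges cannot be written as `\uXXXX` escapes; in a JavaScript regex they need the `u` flag and the `\u{...}` syntax.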

You probably want to allow code points from the Unicode blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A.

This regex matches a string of 0 to 9 characters, each of which is a space, an ideographic space (U+3000), an ASCII letter (A-Z or a-z), or a code point in one of those two CJK blocks:

/^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/
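A quick run of the regex above against the examples from the question (the constant name `VALID_INPUT` is only for illustration). Note that the character class also excludes digits and punctuation, so a short input such as "你的1" is rejected too; widen the class if those should be allowed.

```javascript
// The regex from the answer. No "g" flag, so test() keeps no state
// between calls.
const VALID_INPUT = /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

const cases = [
  "你的a你的a你的a你", // 10 characters
  "你的 你的 你的 你", // 10 characters
  "你的a你的a你的a",   // 9 characters
  "",                  // {0,9} allows empty input
  "你的1",             // digits are not in the character class
];
for (const s of cases) {
  console.log(JSON.stringify(s), VALID_INPUT.test(s));
}
// → "你的a你的a你的a你" false
// → "你的 你的 你的 你" false
// → "你的a你的a你的a" true
// → "" true
// → "你的1" false
```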

The ideographs are listed in:

However, you can add more blocks if you need them.
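A sketch of adding more blocks, assuming you want every ideograph range from the table (the constant names are hypothetical). Code points above U+FFFF need the `u` flag and the `\u{...}` escape syntax. With `u`, `{0,9}` also counts code points rather than UTF-16 code units, so a supplementary character counts once toward the length limit.

```javascript
// All ideograph ranges from the table, including the supplementary
// planes. The "u" flag is required for the \u{...} escapes.
const VALID_INPUT_EXTENDED =
  /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\u{20000}-\u{2A6DF}\u{2A700}-\u{2B73F}\u{2B740}-\u{2B81F}\u{2B820}-\u{2CEAF}\u{2F800}-\u{2FA1F}]{0,9}$/u;

// The BMP-only regex from the answer, for comparison.
const VALID_INPUT_BMP = /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/;

const ext = "\u{20000}"; // an Extension B ideograph
console.log(VALID_INPUT_EXTENDED.test(ext.repeat(9)));  // → true
console.log(VALID_INPUT_EXTENDED.test(ext.repeat(10))); // → false
console.log(VALID_INPUT_BMP.test(ext)); // → false (sees two lone surrogates)
```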


Code:

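    // Returns true when the input has at most 9 characters (despite the
    // name) and each one is a space, an ideographic space (U+3000), an
    // ASCII letter, or a CJK Unified Ideograph (including Extension A).
    function has10OrLessCJK(text) {
      return /^[ A-Za-z\u3000\u3400-\u4DBF\u4E00-\u9FFF]{0,9}$/.test(text);
    }

    // Updates the #valid element on every keystroke.
    function checkValidation(value) {
      var valid = document.getElementById("valid");
      if (has10OrLessCJK(value)) {
        valid.innerText = "Valid";
      } else {
        valid.innerText = "Invalid";
      }
    }

    <input type="text" style="width:100%" oninput="checkValidation(this.value)" value="你的a你的a你的a">
    <div id="valid">Valid</div>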
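Since the question plans to validate on both the client (JavaScript) and the server (Java), one option is to keep the pattern as a single source string. This is a sketch under that assumption; `VALID_INPUT_SOURCE` and `isValidInput` are hypothetical names. With the backslashes doubled, the string literal is also valid Java, so the server could use it with `Pattern.compile(...).matcher(input).matches()`; both regex engines understand `\uXXXX` escapes inside the pattern.

```javascript
// Shared pattern text. The doubled backslashes leave "\u3000" etc. in
// the string for the regex engine to interpret. The same literal can be
// pasted into Java source unchanged.
const VALID_INPUT_SOURCE =
  "^[ A-Za-z\\u3000\\u3400-\\u4DBF\\u4E00-\\u9FFF]{0,9}$";

// Client side: build the RegExp from the shared source.
const VALID_INPUT = new RegExp(VALID_INPUT_SOURCE);

function isValidInput(text) {
  return VALID_INPUT.test(text);
}

console.log(isValidInput("你的a你的a你的a"));   // → true
console.log(isValidInput("你的a你的a你的a你")); // → false
```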
