Javascript Regex to check if a string contains accented characters

I am currently working on checking whether a string contains accented characters. For example:

hellohello  ---> return true
helloéèhello  ---> return false because the text contains accent characters 

Can anyone help me with the regex?

Thank you.

If you are trying to validate emails, check out the answer that alex suggested. If you just want to handle the two test cases above, here it is.

Note that this does not test for a valid email address, only for a string made of the allowed characters.

^[a-zA-Z0-9@._]*[a-zA-Z0-9]$

^ anchors the match at the start of the string.
[a-zA-Z0-9@._] accepts lowercase letters, uppercase letters, digits, @, . and _ as valid characters, meaning accented characters and other symbols are not valid.
* greedily repeats the preceding character set zero or more times.
[a-zA-Z0-9]$ requires the string to end with an alphanumeric character.

Example

hello@gmail.com   ---> return true
helloégmail  ---> return false because the text contains accent characters 
hello1 -> true
test@ -> false as it should end with alphanumeric character
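The cases above can be checked with a short script. This is a minimal sketch, assuming the pattern is meant to allow digits and the range A-Z as the explanation describes. The names `pattern` and `isValid` are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: assumes the character classes described in the answer.
const pattern = /^[a-zA-Z0-9@._]*[a-zA-Z0-9]$/;

function isValid(text) {
    // true only when every character is allowed and the last one is alphanumeric
    return pattern.test(text);
}

for (const sample of ['hello@gmail.com', 'helloégmail', 'hello1', 'test@']) {
    console.log(sample, '->', isValid(sample));
}
```

Writing the final class as A-z instead of A-Z would also admit the ASCII characters that sit between Z and a ([, \, ], ^, _ and `), which is probably not intended.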

I am not sure why you want a regex-based answer. If a regex is not absolutely necessary, here is how you can detect accented characters without one.

(Disclaimer: I am not familiar with European languages that have accented alphabets, so I may have missed some linguistic aspect here. Also, I am more familiar with Java, and the JavaScript here may not be optimal.)

ASCII

If your text is ASCII (strictly speaking an 8-bit extension such as Latin-1, since plain ASCII stops at 127 and has no accented letters), then I know of no other way than looping through the characters and comparing each character code against the codes of the accented characters. To see the candidates, you can loop from 1 to 255 and print each character.

The accented characters, as far as I can see, start at 192. However, not every character from there on is accented, so you will have to compare against the right set.

Here is some pseudocode that shows what I mean. (I am not skilled at JavaScript.)

/* This set has to be prepared by looking at all character codes up to 255. */
/* Only the first two entries are listed; the rest are elided, as in the original. */
const accented = new Set( [ String.fromCharCode( 192 ), String.fromCharCode( 193 ) /* , ... */ ] );
for( const c of Array.from( 'helloéèhello' ) ){
    if( accented.has( c ) ){
        console.log( "Accented chars present" );
        break;
    }
}
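One possible way to avoid listing every entry by hand is a range check. This is only a sketch under a stated assumption: the text uses Latin-1 code points, where 192 to 255 are all letters except 215 (×) and 247 (÷). It treats every letter in that range as accented, which is a simplification, since letters such as ß or æ carry no accent. The function names are hypothetical.

```javascript
// Assumption: the text only uses Latin-1 code points (0 to 255).
// In that range, 192 to 255 are letters except 215 (×) and 247 (÷).
function isLatin1AccentedLetter(ch) {
    const code = ch.codePointAt(0);
    return code >= 192 && code <= 255 && code !== 215 && code !== 247;
}

function hasLatin1Accent(text) {
    // Array.from splits the string into characters, as in the loop above
    return Array.from(text).some((ch) => isLatin1AccentedLetter(ch));
}

console.log(hasLatin1Accent('hellohello'));
console.log(hasLatin1Accent('helloéèhello'));
```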

Unicode

If this is Unicode text, there is an indirect way to do this using normalization of Unicode characters. In Unicode, accented characters are usually composite characters. So you can decompose each character and check whether it has a component beyond code point 255 (the combining diacritical marks occupy U+0300 to U+036F, that is, 768 to 879).

To understand it in detail, you may go through the description at https://www.unicode.org/reports/tr15/tr15-23.html .

This is not perfect, but it should be a good guide for coming up with a more complete design.

Decomposing in JavaScript:

'helloéèhello'.normalize( 'NFD' )

E.g., é decomposes into e and code point 769 (U+0301, combining acute accent), and è decomposes into e and code point 768 (U+0300, combining grave accent).

Note the difference in the characters before and after normalization.

Array.from( 'helloéèhello'.normalize( 'NFD' ) )
(14) ['h', 'e', 'l', 'l', 'o', 'e', '́', 'e', '̀', 'h', 'e', 'l', 'l', 'o']

Array.from( 'helloéèhello' )
(12) ['h', 'e', 'l', 'l', 'o', 'é', 'è', 'h', 'e', 'l', 'l', 'o']
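Putting the decomposition to use, the check can be reduced to a single regex test on the normalized string. This is a minimal sketch, assuming that "accented" means "contains a mark from the Combining Diacritical Marks block (U+0300 to U+036F) after NFD normalization". The names `combiningMark` and `hasAccent` are hypothetical. Letters that do not decompose, such as ø, are not detected by this approach.

```javascript
// Combining Diacritical Marks block: U+0300 to U+036F.
const combiningMark = /[\u0300-\u036f]/;

function hasAccent(text) {
    // NFD splits a precomposed letter into its base letter plus combining marks
    return combiningMark.test(text.normalize('NFD'));
}

console.log(hasAccent('hellohello'));
console.log(hasAccent('helloéèhello'));

// Code points of the decomposed form of é: the base letter e, then the combining acute accent
console.log(Array.from('\u00e9'.normalize('NFD'), (ch) => ch.codePointAt(0)));
```

To match the return values asked for in the question (true when no accent is present), negate the result: `!hasAccent(text)`.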
