JavaScript Regex to check if a string contains accented characters
I am currently working on checking whether a string contains accented characters. For example:
hellohello ---> return true
helloéèhello ---> return false because the text contains accented characters
Can anyone help me with the regex? Thank you.
If you are trying to validate email addresses, check out the answer that alex suggested; but if you just want to handle the two test cases above, here it is.
Note that this does not test for a valid email address, just for a string made of allowed characters.
^[a-zA-Z0-9@._]*[a-zA-Z0-9]$
^ : the string must start here
[a-zA-Z0-9@._] : lowercase letters, uppercase letters, digits, @, . and _ are the valid characters, so accented characters and other symbols are not valid
* : greedily matches zero or more characters from the previous character set
[a-zA-Z0-9]$ : the string must end with an alphanumeric character
Examples:
hello@gmail.com ---> return true
helloégmail ---> return false because the text contains accented characters
hello1 ---> return true
test@ ---> return false, because it must end with an alphanumeric character
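A quick sketch for checking a pattern against the examples above. It uses [a-zA-Z0-9@._] for the first class, as described in the breakdown, and A-Z rather than A-z in the final class, since the range A-z also admits [ \ ] ^ _ and the backtick. The helper name isValidString is hypothetical.

```javascript
// First class includes digits; last class uses A-Z (A-z would also match [ \ ] ^ _ `).
const pattern = /^[a-zA-Z0-9@._]*[a-zA-Z0-9]$/;

// Hypothetical helper: true when the string contains only the allowed
// characters and ends with a letter or digit.
function isValidString(s) {
  return pattern.test(s);
}

for (const s of ['hellohello', 'helloéèhello', 'hello@gmail.com', 'helloégmail', 'hello1', 'test@']) {
  console.log(s, '--->', isValidString(s));
}
```

Note that the pattern requires at least one character, so an empty string is also rejected.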
I am not sure why you want a regex-based answer. But if that is not absolutely necessary, here is how you can detect it.
(Disclaimer: I am not familiar with European languages that have accented letters, so I may have missed some linguistic aspect here. Also, I am more familiar with Java, so the JavaScript here may not be optimal.)
If your text is limited to 8-bit characters, I know no other way than looping through the characters and comparing each character code to see whether it is one of the accented characters. (Strictly speaking, ASCII only covers codes 0 to 127; the accented letters sit in the Latin-1 range 128 to 255, and JavaScript strings use the same code points 0 to 255 as Latin-1.) You can loop from 1 to 255 and print the characters.
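A minimal sketch of that loop, assuming the text is an ordinary JavaScript string (so codes 0 to 255 are the Latin-1 characters). The helper name charTable is made up for illustration; codes below 32 are control characters and will not print anything meaningful.

```javascript
// Hypothetical helper: build "code char" rows for a range of character codes.
function charTable(from, to) {
  const rows = [];
  for (let code = from; code <= to; code++) {
    rows.push(`${code} ${String.fromCharCode(code)}`);
  }
  return rows;
}

// Print every character from 1 to 255 so you can pick out the accented ones.
for (const row of charTable(1, 255)) {
  console.log(row);
}
```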
As far as I can see, the accented characters start from 192 onwards. However, not every character from there on is accented (215 is × and 247 is ÷, for example), so you will have to compare against the right set.
Here is some pseudocode that shows what I mean (I am not skilled at JavaScript):
// This set has to be prepared by looking at all characters up to 255;
// only 192 and 193 are listed here, the rest (...) are left for you to fill in.
const accented = new Set([192, 193 /* , ... */]);
for (const c of Array.from('helloéèhello')) {
  if (accented.has(c.codePointAt(0))) {
    console.log('Accented chars present');
    break;
  }
}
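A runnable variant of that pseudocode, offered as a sketch only: instead of typing the list by hand, it takes every code point from 192 to 255 and drops 215 (×) and 247 (÷), the two symbols in that range. Whether ß, Æ/æ, Ð/ð and Þ/þ should count as "accented" is a judgment call this sketch does not make for you. The name containsAccented is hypothetical.

```javascript
// Build the set of "accented" code points: 192..255 minus the two symbols.
// Assumption: every other character in that range is treated as accented.
const NOT_LETTERS = new Set([215, 247]); // × and ÷
const accented = new Set();
for (let code = 192; code <= 255; code++) {
  if (!NOT_LETTERS.has(code)) accented.add(code);
}

// Hypothetical helper: true if any character of str is in the accented set.
function containsAccented(str) {
  for (const c of Array.from(str)) {
    if (accented.has(c.codePointAt(0))) return true;
  }
  return false;
}

console.log(containsAccented('hellohello'));   // false
console.log(containsAccented('helloéèhello')); // true
```

This only recognizes precomposed characters: an é written as e followed by a separate combining accent is missed, which is the case the Unicode normalization approach handles.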
If this is Unicode text, there is an indirect way to do this using Unicode normalization. In Unicode, accented characters are usually precomposed characters, so you can decompose each character and check whether it has a component above code point 255. For common Latin accents, that component is a combining mark in the range 768 to 879 (U+0300 to U+036F). Note that a plain "above 255" test also flags letters from other scripts, such as Greek or Cyrillic.
To understand it in detail, you may go through the description at https://www.unicode.org/reports/tr15/tr15-23.html .
This is not perfect, but it should be a good guide for coming up with a more complete design.
Decomposing in JavaScript:
'helloéèhello'.normalize( 'NFD' )
For example, é decomposes into e and code point 769 (the combining acute accent), and è decomposes into e and code point 768 (the combining grave accent).
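You can print the code points yourself to check which combining mark each letter decomposes into. A small sketch; the helper name decomposedCodePoints is made up:

```javascript
// Hypothetical helper: decompose with NFD and list the resulting code points.
function decomposedCodePoints(str) {
  return Array.from(str.normalize('NFD'), (c) => c.codePointAt(0));
}

console.log(decomposedCodePoints('\u00e9')); // é -> [ 101, 769 ]
console.log(decomposedCodePoints('\u00e8')); // è -> [ 101, 768 ]
```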
Note the difference in the characters before and after normalization:
Array.from( 'helloéèhello'.normalize( 'NFD' ) )
(14) ['h', 'e', 'l', 'l', 'o', 'e', '́', 'e', '̀', 'h', 'e', 'l', 'l', 'o']
Array.from( 'helloéèhello' )
(12) ['h', 'e', 'l', 'l', 'o', 'é', 'è', 'h', 'e', 'l', 'l', 'o']
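Combining normalization with a small regex gives a short answer to the original question. This is a sketch under the assumption that "accented" means "has a combining mark from the Combining Diacritical Marks block (U+0300 to U+036F) after NFD decomposition"; letters such as ø or ł, which have no decomposition, are not caught. The name hasNoAccents is hypothetical and returns true for unaccented text, following the question's convention.

```javascript
// Combining Diacritical Marks block: U+0300..U+036F (768..879).
const COMBINING_MARK = /[\u0300-\u036f]/;

// Hypothetical helper: true when the text has no accented characters,
// matching "hellohello ---> true" and "helloéèhello ---> false".
function hasNoAccents(text) {
  return !COMBINING_MARK.test(text.normalize('NFD'));
}

console.log(hasNoAccents('hellohello'));   // true
console.log(hasNoAccents('helloéèhello')); // false
```

If your environment supports Unicode property escapes (ES2018 and later), /\p{M}/u matches any combining mark rather than just that one block.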