
JavaScript Unicode (Greek) regular expressions

I would like to use the regular expression new RegExp("\\b"+pat+"\\b") on Greek text, but the "\\b" metacharacter only recognizes ASCII characters.

I tried the XRegExp library, but I did not manage to solve the issue with it.

Any suggestions would be greatly appreciated.

I think this may help with your question:

<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");

    unicodeWord.test("Русский"); // true
    unicodeWord.test("日本語"); // true
    unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
    XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>

Please refer to the following location: http://xregexp.com/plugins/

So the answer is simply that you cannot use \b from JavaScript's native regex engine, or from any library that relies on it, to match words the way you want to. As you already stated, \b matches at word boundaries, and words must consist of word characters. In JavaScript (and in some other regex implementations) the word characters are a-z, A-Z, 0-9 and _. Many other languages implement the \b metacharacter differently from JavaScript.
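To make the limitation concrete, here is a small sketch of the pattern from the question applied to Greek and to English text. The sample word and sentences are made up for illustration and are not from the original question.

```javascript
// Hypothetical sample data, chosen only to illustrate the behaviour.
const pat = "και";
const text = "εγώ και εσύ";

// The pattern from the question: \b on both sides of the word.
const greekAttempt = new RegExp("\\b" + pat + "\\b");

// \b needs a word character on exactly one side of the position.
// Greek letters are not word characters for \b, and neither is the
// space around them, so no boundary is ever found next to "και".
console.log(greekAttempt.test(text)); // false

// The same construction works for a word made of ASCII letters.
const englishAttempt = new RegExp("\\b" + "and" + "\\b");
console.log(englishAttempt.test("you and me")); // true
```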

The answer "JavaScript does not support Unicode" is a bit too easy and in fact completely wrong. JavaScript just doesn't use Unicode for its character classes. If JavaScript didn't support Unicode, you couldn't even use Unicode characters in string literals, and of course that is possible in JavaScript.
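A short sketch of that point: Greek text works fine in string literals, escape sequences and literal regex matches. Only the predefined word classes ignore it. The sample word is an arbitrary choice for illustration.

```javascript
// Greek letters are ordinary characters in a JavaScript string.
const word = "και";
console.log(word.length); // 3

// The same string written with Unicode escape sequences.
console.log(word === "\u03BA\u03B1\u03B9"); // true

// A regex can match Greek letters literally...
console.log(/α/.test(word)); // true

// ...but the predefined class \w does not include them.
console.log(/\w/.test(word)); // false
```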

According to the ECMA-262 standard (ECMAScript), Section 15.10.2.6:

[...] The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:

  1. Let e be x's endIndex.
  2. Call IsWordChar(e–1) and let a be the Boolean result.
  3. Call IsWordChar(e) and let b be the Boolean result.
  4. If a is true and b is false, return true.
  5. If a is false and b is true, return true.
  6. Return false.

[...]

The abstract operation IsWordChar takes an integer parameter e and performs the following:

  1. If e == –1 or e == InputLength, return false.
  2. Let c be the character Input[e].
  3. If c is one of the sixty-three characters below, return true.
       a b c d e f g h i j k l m n o p q r s t u v w x y z
       A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
       0 1 2 3 4 5 6 7 8 9 _
  4. Return false.
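The two quoted algorithms can be sketched directly in JavaScript. This is only an illustration of the steps above, not how any engine implements them. The function names and the explicit input parameter are assumptions made for the example (the specification reads Input from the surrounding matcher state).

```javascript
// The sixty-three word characters listed in the specification text.
const WORD_CHARS =
  "abcdefghijklmnopqrstuvwxyz" +
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  "0123456789_";

// Sketch of IsWordChar(e); `input` is passed explicitly here.
function isWordChar(input, e) {
  // Step 1: positions just outside the input are never word characters.
  if (e === -1 || e === input.length) return false;
  // Step 2: take the character at index e.
  const c = input[e];
  // Steps 3 and 4: true only for one of the 63 listed characters.
  return WORD_CHARS.indexOf(c) !== -1;
}

// Sketch of the \b assertion at index e (0 <= e <= input.length).
function isWordBoundary(input, e) {
  const a = isWordChar(input, e - 1);
  const b = isWordChar(input, e);
  // Steps 4 to 6: a boundary exists when exactly one side is a word character.
  return a !== b;
}

console.log(isWordBoundary("and", 0)); // true
console.log(isWordBoundary("και", 0)); // false
```

Because no Greek letter appears in WORD_CHARS, both sides of every position in a Greek word evaluate to false, so no boundary is reported.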

This just shows that \b uses the IsWordChar algorithm to decide where a word begins and ends. In the definition of IsWordChar you can see exactly which characters return true.

In my opinion this has absolutely nothing to do with the character set being used. It is neither ASCII- nor Unicode-compliant here. It is just these 63 characters.
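Since \b is fixed to those 63 characters, one possible workaround is to spell out the boundary yourself with lookarounds. The sketch below assumes an engine newer than the ECMAScript edition quoted above, namely one with ES2018 Unicode property escapes (\p{L}, \p{N}) and lookbehind. The helper names escapeRegExp and wholeWordRegExp are hypothetical, and treating "letter, number or underscore" as a word character is my own choice, not something the specification defines.

```javascript
// Escape regex syntax characters so `pat` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match `pat` only when it is not directly
// preceded or followed by a Unicode letter, a number or an underscore.
function wholeWordRegExp(pat) {
  const wordChar = "[\\p{L}\\p{N}_]";
  return new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(pat) + "(?!" + wordChar + ")",
    "u" // the u flag is required for \p{...}
  );
}

const re = wholeWordRegExp("και");
console.log(re.test("εγώ και εσύ")); // true
console.log(re.test("καιρός"));      // false, "και" is only a prefix here
```

On engines without these features, the same idea can be approximated by listing the relevant Greek code point ranges explicitly in the character class.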
