Matching words with special characters at the beginning or end
It's a recurring topic, but I haven't been able to find a good solution. I have words that I need to match against the content of my page with a regex in JavaScript, and they absolutely have to be whole words, not parts of words. However, some of them start or end with a letter from this set: [zżźćńółęąśŻŹĆĄŚĘŁÓŃA].
Word boundaries (\b) obviously do not work when one of these letters is at the beginning or the end of the word. Replacing the letters with their Unicode counterparts doesn't seem to work either.
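For readers wondering why \b fails here: in JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_], so letters such as ń and ó count as non-word characters. A minimal sketch of the effect follows. The sample strings are assumptions chosen for illustration, and the variable names are made up for this example.

```javascript
// \w only covers [A-Za-z0-9_], so "ń" and "ó" are non-word characters to \b.
const isAsciiWordChar = (ch) => /^\w$/.test(ch);
const nIsWord = isAsciiWordChar("ń"); // false
const aIsWord = isAsciiWordChar("a"); // true

// Word ending in a special letter: \b after "ń" only exists when an ASCII
// word character follows, so the standalone word is skipped and the embedded
// occurrence is matched instead.
const text1 = "budyń budyńasda";
const hits1 = [...text1.matchAll(/\bbudyń\b/g)].map((m) => m.index);
console.log(hits1); // [6], i.e. the "budyń" inside "budyńasda"

// Word starting with a special letter: the same problem, mirrored.
const text2 = "ósemka asdaósemka";
const hits2 = [...text2.matchAll(/\bósemka\b/g)].map((m) => m.index);
console.log(hits2); // [11], i.e. the "ósemka" inside "asdaósemka"
```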
Right now I'm using a hack: I assigned numbers from 1 to 9 to the lowercase letters from the list, and I check whether any letter in a word matches a key in that character dictionary. If it does, the letter gets replaced with its number, and then I replace it the same way in the content I need to match against.
It kinda works, but it's a half-measure, and it means the regex is no longer case-sensitive, which I would really like it to be.
Surely there has to be a clean solution?
EDIT, as asked in a comment:
/\bbudyń\b/g
budyń budyńasda
Matches the "budyń" inside "budyńasda" (shown in bold in the original post); it should match the first word and leave the other one intact.
/\bósemka\b/g
ósemka asdaósemka
Likewise: it matches the "ósemka" inside "asdaósemka" instead of the standalone word.
Try this:
/(?:\s|^)(ósemka)(?=\s|$)/g
The pattern above assumes that the word is followed only by a whitespace character (or the end of the string). If other characters may follow the word, such as a period or a question mark, then this should work:
/(?:\s|^)(ósemka)(?=[\s\.\?!;]|$)/g
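A short usage sketch for the pattern above: the full match can include one leading whitespace character, so the word itself is best read from capture group 1. The second half shows a different technique that is not part of this answer, namely Unicode property escapes with lookarounds. It assumes an engine that supports lookbehind and the `u` flag (current Node.js and modern browsers do). `escapeRegExp` and `wholeWordRegex` are hypothetical helper names, and the sample strings are made up for illustration.

```javascript
// The answer's pattern: group 1 holds the word itself.
const answerPattern = /(?:\s|^)(ósemka)(?=[\s\.\?!;]|$)/g;
const sample = "ósemka asdaósemka, ósemka!";
const found = [...sample.matchAll(answerPattern)].map((m) => m[1]);
console.log(found); // ["ósemka", "ósemka"], the embedded occurrence is skipped

// Alternative sketch (an assumption, not the answer's method): treat any
// Unicode letter, digit, or underscore as a word character.
function escapeRegExp(s) {
  // Escape regex metacharacters so the word is matched literally.
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function wholeWordRegex(word) {
  const edge = "[\\p{L}\\p{N}_]";
  // No word character directly before or after the word; case-sensitive.
  return new RegExp(`(?<!${edge})${escapeRegExp(word)}(?!${edge})`, "gu");
}

const text = "budyń budyńasda Budyń, (budyń)";
const indices = [...text.matchAll(wholeWordRegex("budyń"))].map((m) => m.index);
console.log(indices); // [0, 24]
```

The lookaround version stays case-sensitive and does not need a list of allowed punctuation, at the cost of requiring lookbehind support.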