Remove all non-ASCII characters from a string except smart quotes
I have this regex that removes all non-ASCII characters from a string, including all smart quotes:
str.replace(/[\u{0080}-\u{FFFF}]/gu,"");
But I need to keep the smart quotes.
The regex for removing smart single quotes is: [\‘\’\‚\‛\′\‵]
and for smart double quotes it is: [\“\”\„\‟\″\‶]
I need a combined regex that removes all non-ASCII characters ([\u{0080}-\u{FFFF}]) except smart quotes ([\‘\’\‚\‛\′\‵] or [\“\”\„\‟\″\‶]).
Note that you need to use the \u{XXXX} notation in a regex with the u modifier. To build the regex you need, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:
/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
Note that some of these characters follow one another in the Unicode table, so we may shorten the pattern using ranges:
/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
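As a minimal sketch of how the shortened lookahead pattern might be applied (the helper name `stripNonAsciiKeepQuotes` and the sample string are illustrative assumptions, not part of the original answer):

```javascript
// Shortened pattern from the answer above: the negative lookahead rejects
// smart quotes and primes before the broad non-ASCII range is tried.
const NON_ASCII_EXCEPT_QUOTES =
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Hypothetical helper name, used here only for illustration.
function stripNonAsciiKeepQuotes(str) {
  return str.replace(NON_ASCII_EXCEPT_QUOTES, "");
}

// Sample: curly double quotes, e-acute, curly single quotes, i-diaeresis, en dash.
const lookaheadSample = "\u201CCaf\u00E9\u201D \u2018na\u00EFve\u2019 \u2013 ok";
console.log(stripNonAsciiKeepQuotes(lookaheadSample));
// The accented letters and the en dash are dropped; the curly quotes remain.
```

The lookahead consumes no input, so it only vetoes positions where the next character is one of the listed quotes; everything else in the range is still removed.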
Alternatively, instead of matching the non-ASCII characters, match ASCII plus the characters you need, and negate the expression. Example:
str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
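The two approaches are not quite equivalent, which a small comparison may make clearer. With the u flag, the range [\u{0080}-\u{FFFF}] does not cover code points above U+FFFF (most emoji, for example), whereas a negated class removes everything that is not explicitly allowed. The sketch below assumes illustrative constant names and a made-up sample string:

```javascript
// Negated class from the example above: keep ASCII plus the listed quotes.
const KEEP_ASCII_AND_QUOTES =
  /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu;

// Lookahead pattern from the earlier answer, for comparison.
const LOOKAHEAD_PATTERN =
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Sample: curly double quotes, an emoji (U+1F600), and an e-acute.
const compareSample = "\u201Chi\u201D \u{1F600} caf\u00E9";

// Removes the emoji and the accented letter, keeps the quotes.
const viaNegatedClass = compareSample.replace(KEEP_ASCII_AND_QUOTES, "");

// Removes the accented letter but leaves the emoji, since U+1F600 > U+FFFF.
const viaLookahead = compareSample.replace(LOOKAHEAD_PATTERN, "");

console.log(JSON.stringify(viaNegatedClass));
console.log(JSON.stringify(viaLookahead));
```

If characters above U+FFFF should also be stripped with the lookahead form, widening the range to \u{10FFFF} is one option.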