
Remove all non-ASCII characters from a string except smart quotes

I have this regex that removes all non-ASCII characters from a string, including all smart quotes:

str.replace(/[\u{0080}-\u{FFFF}]/gu,"");

But I need to keep the smart quotes.

The regex for removing smart single quotes is [\‘\’\‚\‛\′\‵] and the one for smart double quotes is [\“\”\„\‟\″\‶].

I need a combined regex that removes all non-ASCII characters ( [\u{0080}-\u{FFFF}] ) except smart quotes ( [\‘\’\‚\‛\′\‵] or [\“\”\„\‟\″\‶] ).

Note that you need to use the \u{XXXX} notation in a regex with the u modifier. To build the regex you need, put the character class with the exceptions into a negative lookahead placed right before your more generic pattern:

/(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu
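A minimal usage sketch of that pattern, in case it helps to see it applied. The helper name `stripNonAsciiKeepSmartQuotes` and the sample string are made up for illustration; only the regex comes from the answer.

```javascript
// The lookahead rejects the listed smart quotes first; anything else in
// U+0080..U+FFFF is then matched and removed.
const NON_ASCII_EXCEPT_SMART_QUOTES =
  /(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Hypothetical helper name, not part of the original answer.
function stripNonAsciiKeepSmartQuotes(input) {
  return input.replace(NON_ASCII_EXCEPT_SMART_QUOTES, "");
}

// Sample: “Café” – it’s ‘ok’  (U+00E9 and the en dash U+2013 get removed)
const sample = "\u201CCaf\u00E9\u201D \u2013 it\u2019s \u2018ok\u2019";
console.log(stripNonAsciiKeepSmartQuotes(sample));
// → “Caf”  it’s ‘ok’
```

The two spaces in the output are what is left around the removed en dash.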


Note that some of these characters are adjacent in the Unicode table, so we can shorten the pattern using ranges:

/(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu
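If you want to convince yourself that the shortened pattern matches exactly the same characters as the long one, a brute-force check over every UTF-16 code unit is cheap. This is only a sketch; `findMismatches` is a name invented here, and the regexes are used without the g flag so that `test` carries no state between calls.

```javascript
const longForm =
  /(?![\u{2018}\u{2019}\u{201A}\u{201B}\u{2032}\u{2035}\u{201C}\u{201D}\u{201E}\u{201F}\u{2033}\u{2036}])[\u{0080}-\u{FFFF}]/u;
const shortForm =
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/u;

// Hypothetical helper: collects every code unit on which the two patterns disagree.
function findMismatches() {
  const mismatches = [];
  for (let cp = 0; cp <= 0xFFFF; cp++) {
    const ch = String.fromCharCode(cp);
    if (longForm.test(ch) !== shortForm.test(ch)) {
      mismatches.push(cp);
    }
  }
  return mismatches;
}

console.log(findMismatches().length); // 0, the two patterns agree everywhere
```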


Instead of matching the non-ASCII characters, match ASCII plus the characters you need, and negate the expression. Example:

str.replace(/[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu,"");
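One difference worth knowing about, shown as a small sketch with a made-up sample string: with the u flag the negated class also removes characters above U+FFFF (emoji, for instance), while the lookahead pattern from the other answer stops at U+FFFF and leaves them in place. Which behavior you want depends on your data.

```javascript
// Negated class: keep ASCII and the listed smart quotes, drop everything else.
const negatedClass =
  /[^\x00-\x7F\u2018\u2019\u201A\u201B\u2032\u2035\u201C\u201D\u201E\u201F\u2033\u2036]/gu;
// Lookahead pattern from the other answer, limited to U+0080..U+FFFF.
const lookahead =
  /(?![\u{2018}-\u{201F}\u{2032}\u{2033}\u{2035}\u{2036}])[\u{0080}-\u{FFFF}]/gu;

// Sample: “hi” 😀 naïve  (U+1F600 is outside the Basic Multilingual Plane)
const text = "\u201Chi\u201D \u{1F600} na\u00EFve";

const viaNegatedClass = text.replace(negatedClass, "");
const viaLookahead = text.replace(lookahead, "");

console.log(viaNegatedClass); // → “hi”  nave   (emoji removed)
console.log(viaLookahead);    // → “hi” 😀 nave (emoji kept)
```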
