Regular expression to remove anything but alphabets and ' [single quote]

How can I change this regular expression to remove everything from a string except alphabets and a ' (single quote)?

pattern = /\b(ma?c)?([a-z]+)/ig;
  1. This pattern removes unwanted spaces, capitalizes the first letter, and turns the rest into lower case.
  2. By alphabets I mean the English letters a-z.

To remove characters, you need to use something that actually does that, like the string replace function (which can accept a regular expression as the "from" parameter).

Then you're just dealing with a normal application of a character class, which in JavaScript (and most other regular expression variants) is written as [...], where ... is what should be in the class. You'd use ^ at the beginning to invert the meaning of the class.

In your case, it might be:

str = str.replace(/[^A-Za-z']/g, "");

...which will replace anything except the English letters A-Z (ABCDEFGHIJKLMNOPQRSTUVWXYZ), a-z (abcdefghijklmnopqrstuvwxyz), and the single quote with nothing (i.e., remove it).

let str = "This is a test with the numbers 123 and a '.";
console.log("before:", str);
str = str.replace(/[^A-Za-z']/g, "");
console.log("after: ", str);
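As a small sketch of what the ^ changes, the following compares the plain class with its negated form. The sample string and the variable names are made up for illustration and do not come from the question.

```javascript
// Hypothetical sample input, chosen only for illustration.
const sample = "O'Neil 42!";

// Without ^ the class matches the characters we want to keep.
const kept = sample.match(/[A-Za-z']/g).join("");

// With ^ the class matches everything else, which replace() then removes.
const stripped = sample.replace(/[^A-Za-z']/g, "");

console.log(kept);     // O'Neil
console.log(stripped); // O'Neil
```

Both approaches give the same result here; the negated class is simply the more direct way to express "remove everything else".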

However, note that alphabetic characters not used in English will not be excepted, so they get removed too, and there are a lot of those in the various languages used on the web (and even, perversely, in English, in "borrowed" words like "voilà" and "naïve").
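To see that limitation concretely, here is a quick sketch (the sample text is an assumption) in which the English-only class drops the accented letters along with the space:

```javascript
// Hypothetical sample: "voilà naïve", written with escapes so the
// accented letters are unambiguous single code points.
const borrowed = "voil\u00E0 na\u00EFve";

// à and ï fall outside A-Z and a-z, so they are removed as well.
const asciiOnly = borrowed.replace(/[^A-Za-z']/g, "");

console.log(asciiOnly); // voilnave
```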

You've said you're okay with just English A-Z, but for others coming to this: In environments supporting ES2018's Unicode property escapes, you could handle anything considered "alphabetic" by Unicode instead of just A-Z by using the \p{Alpha} property. The \p means "matching this Unicode property" (as usual, the lowercase version \p means "matching" and the uppercase version \P means "not matching") and the {Alpha} means "alphabetic":

str = str.replace(/[^\p{Alpha}']/gu, "");

(Note that, again, \p{Alpha} means "alphabetic", but because it's in a negated character class, the pattern matches everything that is not alphabetic, so alphabetic characters are excluded from the match and kept.)

Note the u flag on that, which enables Unicode mode (required for \p{...}). That handles the "voilà" and "naïve" examples too:

let str = "This is a test with the numbers 123 and a ' and voilà and naïve.";
console.log("before:", str);
// Single backslash: in a regex literal, \p{Alpha} is the property escape
str = str.replace(/[^\p{Alpha}']/gu, "");
console.log("after: ", str);
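If the code may run on engines older than ES2018, one possible approach is to detect support at runtime and fall back to the English-only class from earlier. This is only a sketch: supportsUnicodeProperties, nonLetters, and stripNonLetters are hypothetical names, not part of any library.

```javascript
// Returns true if this engine accepts Unicode property escapes (ES2018+).
// Building the pattern from a string keeps the file parseable on older
// engines. Note the doubled backslash: it is needed in a string literal,
// but not in a regex literal.
function supportsUnicodeProperties() {
  try {
    new RegExp("\\p{Alpha}", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper pattern: Unicode-aware when available,
// otherwise the English-only class.
const nonLetters = supportsUnicodeProperties()
  ? new RegExp("[^\\p{Alpha}']", "gu")
  : /[^A-Za-z']/g;

function stripNonLetters(text) {
  return text.replace(nonLetters, "");
}

// Sample input is "voilà 123 naïve's", written with escapes.
console.log(stripNonLetters("voil\u00E0 123 na\u00EFve's"));
```

On an engine with property-escape support, the accented letters survive; on the fallback path they are stripped like any other non-English character.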


 