[英]How to filter out characters that aren't letters, numbers or punctuation
I have a string that will have a lot of formatting things like bullet points or arrows or whatever.我有一个字符串,它将包含很多格式内容,例如项目符号或箭头或其他任何内容。 I want to clean this string so that it only contains letters, numbers and punctuation.
我想清理这个字符串,使其只包含字母、数字和标点符号。 Multiple spaces should be replaced by a single space too.
多个空格也应替换为单个空格。
Allowed punctuation: , . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
允许的标点符号:
, . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
, . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
Basically anything allowed in this ASCII table.基本上这个ASCII 表中允许的任何内容。
This is what I have so far:这是我到目前为止:
let asciiOnly = y.replace(/[^a-zA-Z0-9\s]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
Regex101: https://regex101.com/r/0DC1tz/2 Regex101: https ://regex101.com/r/0DC1tz/2
I also tried the [:punct:]
tag but apparently it's not supported by javascript.我也试过
[:punct:]
标签,但显然它不受 javascript 支持。 Is there a better way I can clean this string other than regex?除了正则表达式,有没有更好的方法可以清理这个字符串? A library or something maybe (I didn't find any).
也许是图书馆或其他东西(我没有找到)。 If not, how would I do this with regex?
如果没有,我将如何用正则表达式做到这一点? Would I have to edit the first regex to add every single character of punctuation?
我是否必须编辑第一个正则表达式以添加标点符号的每个字符?
EDIT: I'm trying to paste an example string in the question but SO just removes characters it doesn't recognize so it looks like a normal string.编辑:我试图在问题中粘贴一个示例字符串,但 SO 只是删除了它无法识别的字符,因此它看起来像一个普通字符串。 Heres a paste .
这是一个粘贴。
EDIT2: I think this is what I needed: EDIT2:我认为这就是我需要的:
let asciiOnly = x.replace(/[^\x20-\x7E]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
I'm testing it with different cases to make sure.我正在用不同的情况对其进行测试以确保。
You can achieve this using below regex, which finds any non-ascii characters (also excludes non-printable ascii characters and excluding extended ascii too) and removes it with empty string.您可以使用下面的正则表达式来实现这一点,它可以找到任何非 ascii 字符(也排除不可打印的 ascii 字符并排除扩展的 ascii)并用空字符串将其删除。
[^ -~]+
This is assuming you want to retain all printable ASCII characters only, which range from space (ascii value 32) to tilde ~
hence usage of this char set [^ !-~]
这是假设您只想保留所有可打印的 ASCII 字符,范围从空格(ascii 值 32)到波浪号
~
因此使用此字符集[^ !-~]
And then replaces all one or more white space with a single space然后用一个空格替换所有一个或多个空格
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ -~]+/g,'').replace(/\\s+/g, ' ')); <!-- begin snippet: js hide: false console: true babel: false -->
console.log(str.replace(/[^ !-~]+/g,'').replace(/\s+/g, ' '));
Also, if you just want to allow all alphanumeric characters and mentioned special characters, then you can use this regex to first retain all needed characters using this regex ,此外,如果您只想允许所有字母数字字符和提到的特殊字符,那么您可以使用此正则表达式首先使用此正则表达式保留所有需要的字符,
[^ a-zA-Z0-9,.:;[\]()/\!@#$%^&*+_{}<>=?~|"-]+
Replace this with empty string and then replace one or more white spaces with just a single space.用空字符串替换它,然后用一个空格替换一个或多个空格。
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ a-zA-Z0-9,.:;[\\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g,'').replace(/\\s+/g, ' '));
This is how i will do.这就是我要做的。 I will remove the all the non allowed character first and than replace the multiple spaces with a single space.
我将首先删除所有不允许的字符,然后用一个空格替换多个空格。
let str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:!!!23 b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd ------------------------- ` const op = str.replace(/[^\\w,.:;\\[\\]()/\\!@#$%^&*+{}<>=?~|" -]/g, '').replace(/\\s+/g, " ") console.log(op)
EDIT : In case you want to keep \\n
or \\t
as it is use (\\s)\\1+, "$1"
in second regex.编辑:如果您想在第二个正则表达式中使用
(\\s)\\1+, "$1"
保留\\n
或\\t
。

) that you almost certainly don't want. 
) 你几乎肯定不想要。 I would expect you to find similar edge cases with any pre-made list." ° "
should be replaced with ""
, " "
, or " "
." ° "
是否应该替换为""
、 " "
或" "
。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.