How to filter out characters that aren't letters, numbers or punctuation

I have a string that will have a lot of formatting things like bullet points or arrows or whatever. I want to clean this string so that it only contains letters, numbers and punctuation. Multiple spaces should be replaced by a single space too.

Allowed punctuation: , . : ; [ ] ( ) / \ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "

Basically anything allowed in this ASCII table.

This is what I have so far:

let asciiOnly = y.replace(/[^a-zA-Z0-9\s]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')

Regex101: https://regex101.com/r/0DC1tz/2

I also tried the [:punct:] class, but apparently it's not supported by JavaScript. Is there a better way to clean this string than regex? A library or something, maybe (I didn't find any)? If not, how would I do this with regex? Would I have to edit the first regex to add every single punctuation character?

EDIT: I'm trying to paste an example string in the question, but SO just removes characters it doesn't recognize, so it looks like a normal string. Here's a paste.

EDIT2: I think this is what I needed:

let asciiOnly = x.replace(/[^\x20-\x7E]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')

I'm testing it with different cases to make sure.

You can do this with the regex below. It matches any run of characters outside printable ASCII (non-ASCII characters, non-printable ASCII control characters, and extended ASCII), so each match can be replaced with an empty string.

[^ -~]+

This assumes you want to keep only the printable ASCII characters, which run from space (code 32) to tilde ~ (code 126), hence the character class [^ -~]. Writing it as [^ !-~] is equivalent.

Then replace each run of one or more whitespace characters with a single space:

var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`;
console.log(str.replace(/[^ -~]+/g, '').replace(/\s+/g, ' '));
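The sample string above has lost most of its special characters, so here is a minimal sketch of the same two-step replacement on a short input that still contains some. The function name `keepPrintableAscii` and the sample text are made up for illustration; the two regexes are the ones described above.

```javascript
// Hypothetical helper wrapping the two replacements described above.
function keepPrintableAscii(input) {
  return input
    .replace(/[^ -~]+/g, '') // drop everything outside U+0020..U+007E
    .replace(/\s+/g, ' ');   // collapse runs of whitespace into one space
}

// Made-up sample with an infinity sign, an arrow, a degree sign, a bullet and a tab.
const sample = 'P∞ and E∞ → 90° • done\t!';
console.log(keepPrintableAscii(sample)); // P and E 90 done!
```

Note that tabs and newlines fall outside the space-to-tilde range, so the first step deletes them rather than turning them into spaces: `'a\nb'` becomes `'ab'`. If words can be separated only by a newline or tab, it may be safer to collapse whitespace first and strip afterwards.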

If you instead want to allow only alphanumeric characters and the punctuation listed in the question, match everything outside that set with this regex (the backslash has to be written as \\ inside the class, otherwise \! just means !):

[^ a-zA-Z0-9,.:;[\]()/\\!@#$%^&*+_{}<>=?~|"-]+

Replace each match with an empty string, then replace each run of one or more whitespace characters with a single space.

var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`;
console.log(str.replace(/[^ a-zA-Z0-9,.:;[\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g, '').replace(/\s+/g, ' '));
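To make the difference between the two approaches concrete, the sketch below runs both patterns on the same made-up input. The variable names and the sample string are illustrative only, and the allow-list pattern here writes the backslash as `\\` on the assumption that the `\` in the question's list is meant to be kept.

```javascript
// Both patterns match what should be REMOVED (note the leading ^ in each class).
const notPrintableAscii = /[^ -~]+/g;
// Assumption: `\\` is used so the backslash from the question's list is kept.
const notInAllowList = /[^ a-zA-Z0-9,.:;[\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g;

// Made-up sample: apostrophe, backticks, a backslash, a bullet.
const sample = "it's `code` \\ 50% • ok";

const viaRange = sample.replace(notPrintableAscii, '').replace(/\s+/g, ' ');
const viaList = sample.replace(notInAllowList, '').replace(/\s+/g, ' ');

console.log(viaRange); // it's `code` \ 50% ok
console.log(viaList);  // its code \ 50% ok
```

The apostrophe and the backtick are printable ASCII but are not in the question's punctuation list, so only the allow-list version removes them.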

This is how I would do it: remove all the disallowed characters first, then replace runs of whitespace with a single space.

let str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:!!!23 b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd ------------------------- `
// \w covers letters, digits and underscore; \\ keeps the backslash itself
const op = str.replace(/[^\w,.:;\[\]()/\\!@#$%^&*+{}<>=?~|" -]/g, '').replace(/\s+/g, " ")
console.log(op)

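EDIT: If you want to keep \n or \t as they are, use (\s)\1+ as the pattern and "$1" as the replacement in the second replace. The first regex must then also allow \s, otherwise those characters have already been removed.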
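A minimal sketch of that back-reference variant, assuming the newlines and tabs are still present when it runs. The input string is made up for illustration.

```javascript
// (\s)\1+ matches a whitespace character followed by one or more copies of
// that SAME character, and "$1" puts a single copy back.
const text = 'a  b\n\n\nc\t\td \n e';
const collapsed = text.replace(/(\s)\1+/g, '$1');

console.log(JSON.stringify(collapsed)); // "a b\nc\td \n e"
```

Runs of the same character are collapsed, but a mixed run such as space, newline, space is left untouched, because the back-reference only matches a repeat of the captured character.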

  • There probably isn't a better solution than a regex. The under-the-hood implementation of regex actions is usually well optimized by virtue of age and ubiquity.
  • You may be able to explicitly tell the regex handler to "compile" the regex. This is usually a good idea if you know the regex is going to be used a lot within a program, and may help with performance here. But I don't know if JavaScript exposes such an option.
  • The idea of "normal punctuation" doesn't have an excellent foundation. There are some common marks like "90°" that aren't ASCII, and some ASCII characters like DEL (code 127) that you almost certainly don't want. I would expect you to find similar edge cases with any pre-made list. In any case, just explicitly listing all the punctuation you want to allow is better in general, because then no one will ever have to look up what's in the list you chose.
  • You may be able to perform both substitutions in a single pass, but it's unclear if that will perform better and it almost certainly won't be clearer to any co-workers (including yourself-from-the-future). There will be a lot of finicky details to work out, such as whether " ° " should be replaced with "", " ", or "  " (two spaces).
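As one way to follow the "list the punctuation explicitly" advice, the sketch below builds the regex once from a plain string of allowed characters, so the list stays readable and the pattern is not rebuilt on every call. All names (`ALLOWED_PUNCTUATION`, `escapeForCharClass`, `clean`) are hypothetical; the character list is the one from the question.

```javascript
// The punctuation list from the question, as a plain string ('\\' is one backslash).
const ALLOWED_PUNCTUATION = ',.:;[]()/\\!@#$%^&*+-_{}<>=?~|"';

// Escape the characters that are special inside a character class: \ ] ^ -
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]^-]/g, '\\$&');
}

// Built once and reused; matches every run of characters that is NOT allowed.
const disallowed = new RegExp(
  `[^a-zA-Z0-9 ${escapeForCharClass(ALLOWED_PUNCTUATION)}]+`,
  'g'
);

function clean(input) {
  // Two passes, as in the answers: strip first, then collapse whitespace.
  return input.replace(disallowed, '').replace(/\s+/g, ' ');
}

console.log(clean('90° is "hot" • yes\\no')); // 90 is "hot" yes\no
console.log(clean('a\x7Fb'));                 // ab  (DEL is not in the list)
```

Keeping the two passes separate sidesteps the question of what a stripped character between two spaces should become: the first pass leaves two adjacent spaces and the second pass reduces them to one.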
