[英]How to filter out characters that aren't letters, numbers or punctuation
我有一个字符串,它将包含很多格式内容,例如项目符号或箭头或其他任何内容。 我想清理这个字符串,使其只包含字母、数字和标点符号。 多个空格也应替换为单个空格。
允许的标点符号: , . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
, . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
基本上这个ASCII 表中允许的任何内容。
这是我到目前为止:
let asciiOnly = y.replace(/[^a-zA-Z0-9\s]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
Regex101: https ://regex101.com/r/0DC1tz/2
我也试过[:punct:]
标签,但显然它不受 javascript 支持。 除了正则表达式,有没有更好的方法可以清理这个字符串? 也许是图书馆或其他东西(我没有找到)。 如果没有,我将如何用正则表达式做到这一点? 我是否必须编辑第一个正则表达式以添加标点符号的每个字符?
编辑:我试图在问题中粘贴一个示例字符串,但 SO 只是删除了它无法识别的字符,因此它看起来像一个普通字符串。 这是一个粘贴。
EDIT2:我认为这就是我需要的:
let asciiOnly = x.replace(/[^\x20-\x7E]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
我正在用不同的情况对其进行测试以确保。
您可以使用下面的正则表达式来实现这一点,它可以找到任何非 ascii 字符(也排除不可打印的 ascii 字符并排除扩展的 ascii)并用空字符串将其删除。
[^ -~]+
这是假设您只想保留所有可打印的 ASCII 字符,范围从空格(ascii 值 32)到波浪号~
因此使用此字符集[^ !-~]
然后用一个空格替换所有一个或多个空格
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ -~]+/g,'').replace(/\\s+/g, ' ')); <!-- begin snippet: js hide: false console: true babel: false -->
console.log(str.replace(/[^ !-~]+/g,'').replace(/\s+/g, ' '));
此外,如果您只想允许所有字母数字字符和提到的特殊字符,那么您可以使用此正则表达式首先使用此正则表达式保留所有需要的字符,
[^ a-zA-Z0-9,.:;[\]()/\!@#$%^&*+_{}<>=?~|"-]+
用空字符串替换它,然后用一个空格替换一个或多个空格。
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ a-zA-Z0-9,.:;[\\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g,'').replace(/\\s+/g, ' '));
这就是我要做的。 我将首先删除所有不允许的字符,然后用一个空格替换多个空格。
let str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:!!!23 b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd ------------------------- ` const op = str.replace(/[^\\w,.:;\\[\\]()/\\!@#$%^&*+{}<>=?~|" -]/g, '').replace(/\\s+/g, " ") console.log(op)
编辑:如果您想在第二个正则表达式中使用(\\s)\\1+, "$1"
保留\\n
或\\t
。

) 你几乎肯定不想要。 我希望您能在任何预制列表中找到类似的边缘情况。 在任何情况下,通常只明确列出您想要允许的所有标点符号会更好,因为这样就没有人需要查找您选择的列表中的内容。" ° "
是否应该替换为""
、 " "
或" "
。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.