[英]How to filter out characters that aren't letters, numbers or punctuation
我有一個字符串,它將包含很多格式內容,例如項目符號或箭頭或其他任何內容。 我想清理這個字符串,使其只包含字母、數字和標點符號。 多個空格也應替換為單個空格。
允許的標點符號: , . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
, . : ; [ ] ( ) / \\ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
基本上這個ASCII 表中允許的任何內容。
這是我到目前為止:
let asciiOnly = y.replace(/[^a-zA-Z0-9\s]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
Regex101: https ://regex101.com/r/0DC1tz/2
我也試過[:punct:]
標簽,但顯然它不受 javascript 支持。 除了正則表達式,有沒有更好的方法可以清理這個字符串? 也許是圖書館或其他東西(我沒有找到)。 如果沒有,我將如何用正則表達式做到這一點? 我是否必須編輯第一個正則表達式以添加標點符號的每個字符?
編輯:我試圖在問題中粘貼一個示例字符串,但 SO 只是刪除了它無法識別的字符,因此它看起來像一個普通字符串。 這是一個粘貼。
EDIT2:我認為這就是我需要的:
let asciiOnly = x.replace(/[^\x20-\x7E]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
我正在用不同的情況對其進行測試以確保。
您可以使用下面的正則表達式來實現這一點,它可以找到任何非 ascii 字符(也排除不可打印的 ascii 字符並排除擴展的 ascii)並用空字符串將其刪除。
[^ -~]+
這是假設您只想保留所有可打印的 ASCII 字符,范圍從空格(ascii 值 32)到波浪號~
因此使用此字符集[^ !-~]
然后用一個空格替換所有一個或多個空格
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ -~]+/g,'').replace(/\\s+/g, ' ')); <!-- begin snippet: js hide: false console: true babel: false -->
console.log(str.replace(/[^ !-~]+/g,'').replace(/\s+/g, ' '));
此外,如果您只想允許所有字母數字字符和提到的特殊字符,那么您可以使用此正則表達式首先使用此正則表達式保留所有需要的字符,
[^ a-zA-Z0-9,.:;[\]()/\!@#$%^&*+_{}<>=?~|"-]+
用空字符串替換它,然后用一個空格替換一個或多個空格。
var str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic: b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd -------------------------`; console.log(str.replace(/[^ a-zA-Z0-9,.:;[\\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g,'').replace(/\\s+/g, ' '));
這就是我要做的。 我將首先刪除所有不允許的字符,然后用一個空格替換多個空格。
let str = `Determine the values of P∞ and E∞ for each of the following signals: bdf Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:!!!23 b. Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period. bd Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals: bcdef Figure 1: Problem Set 1.4 Even and Odd Signals For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero. bd ------------------------- ` const op = str.replace(/[^\\w,.:;\\[\\]()/\\!@#$%^&*+{}<>=?~|" -]/g, '').replace(/\\s+/g, " ") console.log(op)
編輯:如果您想在第二個正則表達式中使用(\\s)\\1+, "$1"
保留\\n
或\\t
。

) 你幾乎肯定不想要。 我希望您能在任何預制列表中找到類似的邊緣情況。 在任何情況下,通常只明確列出您想要允許的所有標點符號會更好,因為這樣就沒有人需要查找您選擇的列表中的內容。" ° "
是否應該替換為""
、 " "
或" "
。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.