Meaning of a Regular Expression in JavaScript and PHP
Can anybody explain the use of this regular expression to me?
I want to strip out characters whose ASCII code is less than 32, except horizontal tab, line feed and carriage return.
Will the code below work that way, or do I need to change it?
JavaScript code:
var text = text.replace(/[\x00-\x09\x0A\x0D-\x2F]+/, "");
PHP code:
$val = preg_replace('/[\x00-\x09\x0A\x0D-\x2F]/', '',$val);
Edit:
I want to preserve LF, HT and CR, and I do not want them removed from the string if they are present. All other characters below ASCII 32 should be removed.
Well, given that the characters you want to keep are HT (0x09), LF (0x0A) and CR (0x0D), anything other than those (and still < 32) would look something like:
/[\x00-\x08\x0B\x0C\x0E-\x1F]/
And I assume you meant an exclusive match (up to but not including 32); otherwise the last hex code should be \x20.
$orig = "This is a sample document. It contains:\r\n"
. "\t* horizontal tabs,\r\n"
. "\t* line feeds, and\r\n"
. "\t* carriage returns\r\n"
. "\r\n"
. "These characters are not to be removed. However, other characters, such as:\r\n"
. "\r\n"
. "\t'\x06' (ACK),\r\n"
. "\t'\x07' (BEL),\r\n"
. "\t'\x1B' (ESC)\r\n"
. "\t(others)\r\n"
. "\r\n"
. "And other characters < ordinal 32 should be removed.";
$modif = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $orig); // strip 0x00-0x1F except \t, \n, \r
echo str_repeat('=', 50) . PHP_EOL;
echo (strlen($orig) == strlen($modif) ? "Failed" : "Success") . PHP_EOL;
echo str_repeat('=', 50) . PHP_EOL;
echo PHP_EOL;
echo $modif;
Since $modif is shorter than $orig by 3 characters (\x06, \x07 and \x1B), while the whitespace characters (\x09, \x0A and \x0D) were preserved, I would say this is what you're after.
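A sample string only exercises a few characters. To check a class like this exhaustively, you can test every code point from 0 to 31 and list the ones that survive. The following is a minimal JavaScript sketch. It assumes the intended class is [\x00-\x08\x0B\x0C\x0E-\x1F], meaning everything below 32 except HT, LF and CR. The function name survivingControlCodes is hypothetical.

```javascript
// Assumed class: every code point below 0x20 except HT (0x09), LF (0x0A), CR (0x0D).
// Non-global, so .test() has no lastIndex state between calls.
const CONTROL_EXCEPT_WS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/;

// Hypothetical helper: return the codes in [0, 31] the class does NOT match,
// i.e. the characters a replacement with this class would leave in place.
function survivingControlCodes() {
  const survivors = [];
  for (let code = 0; code < 32; code++) {
    if (!CONTROL_EXCEPT_WS.test(String.fromCharCode(code))) {
      survivors.push(code);
    }
  }
  return survivors;
}

console.log(survivingControlCodes()); // [ 9, 10, 13 ]
```

Only 9 (HT), 10 (LF) and 13 (CR) should come back. Any other number in the output would mean the class lets an unwanted control character through.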
Your first question (explanation of the regex)
Since all your hex codes correspond to decimal values below 128, you can use an ASCII table to check what will be matched. Your regex replaces these characters:
Dec  Oct  Hex  Binary    Char  Description
0    000  00   00000000  NUL   Null char
1    001  01   00000001  SOH   Start of Heading
2    002  02   00000010  STX   Start of Text
3    003  03   00000011  ETX   End of Text
4    004  04   00000100  EOT   End of Transmission
5    005  05   00000101  ENQ   Enquiry
6    006  06   00000110  ACK   Acknowledgment
7    007  07   00000111  BEL   Bell
8    010  08   00001000  BS    Backspace
9    011  09   00001001  HT    Horizontal Tab
10   012  0A   00001010  LF    Line Feed
and these:
Dec  Oct  Hex  Binary    Char  Description
13   015  0D   00001101  CR    Carriage Return
14   016  0E   00001110  SO    Shift Out
15   017  0F   00001111  SI    Shift In
16   020  10   00010000  DLE   Data Link Escape
17   021  11   00010001  DC1   Device Control 1 (often XON)
18   022  12   00010010  DC2   Device Control 2
19   023  13   00010011  DC3   Device Control 3 (often XOFF)
20   024  14   00010100  DC4   Device Control 4
21   025  15   00010101  NAK   Negative Acknowledgement
22   026  16   00010110  SYN   Synchronous Idle
23   027  17   00010111  ETB   End of Transmission Block
24   030  18   00011000  CAN   Cancel
25   031  19   00011001  EM    End of Medium
26   032  1A   00011010  SUB   Substitute
27   033  1B   00011011  ESC   Escape
28   034  1C   00011100  FS    File Separator
29   035  1D   00011101  GS    Group Separator
30   036  1E   00011110  RS    Record Separator
31   037  1F   00011111  US    Unit Separator
32   040  20   00100000  SP    Space
33   041  21   00100001  !     Exclamation mark
34   042  22   00100010  "     Double quotes (or speech marks)
35   043  23   00100011  #     Number sign
36   044  24   00100100  $     Dollar
37   045  25   00100101  %     Percent sign
38   046  26   00100110  &     Ampersand
39   047  27   00100111  '     Single quote
40   050  28   00101000  (     Open parenthesis (or open bracket)
41   051  29   00101001  )     Close parenthesis (or close bracket)
42   052  2A   00101010  *     Asterisk
43   053  2B   00101011  +     Plus
44   054  2C   00101100  ,     Comma
45   055  2D   00101101  -     Hyphen
46   056  2E   00101110  .     Period, dot or full stop
47   057  2F   00101111  /     Slash or divide
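with an empty string. In other words, it removes HT (9), LF (10) and CR (13), which you want to keep, and leaves VT (11) and FF (12) in place. It also strips the space and the punctuation characters 32-47, so it does not do what you want. The JavaScript version has one more problem: without the g flag, replace() removes only the first run of matching characters.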
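The same list can be derived mechanically instead of by reading the ASCII table. The following minimal JavaScript sketch runs the question's character class against every ASCII code point (0-127) and prints the matched decimal ranges. The helper name matchedRanges is hypothetical.

```javascript
// Character class from the question. The + quantifier is dropped because it
// only affects run length, not which individual characters match.
const QUESTION_CLASS = /[\x00-\x09\x0A\x0D-\x2F]/;

// Hypothetical helper: test each ASCII code point and collapse consecutive
// matches into "start-end" strings (decimal).
function matchedRanges(re) {
  const ranges = [];
  let start = null;
  // Iterate one past 127 so a run that reaches the end is still flushed.
  for (let code = 0; code <= 128; code++) {
    const hit = code < 128 && re.test(String.fromCharCode(code));
    if (hit && start === null) {
      start = code;
    } else if (!hit && start !== null) {
      ranges.push(start === code - 1 ? `${start}` : `${start}-${code - 1}`);
      start = null;
    }
  }
  return ranges;
}

console.log(matchedRanges(QUESTION_CLASS)); // [ '0-10', '13-47' ]
```

Decimal 0-10 and 13-47 are exactly the rows of the two tables above.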
Your second question (removing non-printables, i.e. decimal 0-31, i.e. 0x00-0x1F)
If you want to remove all characters below 32 decimal (the non-printable ones, it seems) except HT, LF and CR, then:
$val = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $val); // 0x00-0x1F except HT (0x09), LF (0x0A), CR (0x0D)
(updated to preserve HT, LF and CR)
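PHP's preg_replace replaces every match by default. JavaScript's String.prototype.replace with a regex does so only when the g flag is set. The following minimal JavaScript sketch performs the same cleanup and contrasts the two behaviours. It assumes the class [\x00-\x08\x0B\x0C\x0E-\x1F]. The helper name stripControlChars and the sample string are hypothetical.

```javascript
// Assumed class: everything below 0x20 except HT (0x09), LF (0x0A), CR (0x0D).
// The g flag makes replace() remove every match, like PHP's preg_replace.
const CONTROL_EXCEPT_WS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/g;

// Hypothetical helper: strip control characters, keep tab, newline, carriage return.
function stripControlChars(text) {
  return text.replace(CONTROL_EXCEPT_WS, "");
}

const sample = "a\x06b\tc\x07\r\nd\x1B";

// Same class without g: only the first match (\x06) is removed.
const firstOnly = sample.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/, "");

console.log(JSON.stringify(firstOnly));                 // "ab\tc\u0007\r\nd\u001b"
console.log(JSON.stringify(stripControlChars(sample))); // "ab\tc\r\nd"
```

The asker's original JavaScript line has the same single-match limitation, in addition to its character-class problems.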
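Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.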