Meaning of Regular Expression with JavaScript and PHP

Can anybody explain the use of this regular expression?

I want to remove characters whose ASCII code is less than 32, except horizontal tab, line feed and carriage return.

Will the code below work as intended, or do I need to change it?

JavaScript code:

var text = text.replace(/[\x00-\x09\x0A\x0D-\x2F]+/, "");

PHP code:

$val = preg_replace('/[\x00-\x09\x0A\x0D-\x2F]/', '',$val);

Edit

I want to preserve LF, HT and CR and not remove them from the string if present. Other characters below ASCII 32 should be removed.

Well, given:

  • 0x09 = horizontal tab
  • 0x0A = line feed
  • 0x0D = carriage return

Then anything but the above (and still < 32) would look something like:

/[\x00-\x08\x0B\x0C\x0E-\x1F]/

And I assume you meant an exclusive match (up to but not including 32); otherwise the last hex code should be \x20.

$orig   = "This is a sample document. It contains:\r\n"
        . "\t* horizontal tabs,\r\n"
        . "\t* line feeds, and\r\n"
        . "\t* carriage returns\r\n"
        . "\r\n"
        . "These characters are not to be removed. However, other characters, such as:\r\n"
        . "\r\n"
        . "\t'\x06' (ACK),\r\n"
        . "\t'\x07' (BEL),\r\n"
        . "\t'\x1B' (ESC)\r\n"
        . "\t(others)\r\n"
        . "\r\n"
        . "And other characters < ordinal 32 should be removed.";

// Remove every control character below 0x20 except \x09, \x0A and \x0D.
$modif  = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $orig);

echo str_repeat('=', 50) . PHP_EOL;
echo (strlen($orig) == strlen($modif) ? "Failed" : "Success") . PHP_EOL;
echo str_repeat('=', 50) . PHP_EOL;
echo PHP_EOL;
echo $modif;

Since $modif is shorter than $orig (by 3 characters: \x06, \x07, \x1B) but the whitespace characters (\x09, \x0A, \x0D) were preserved, I would say this is what you're after.
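On the JavaScript side of the question, one more detail matters: String.prototype.replace with a regex that has no g flag replaces only the first match, so the snippet in the question would stop after the first run of matched characters. Below is a minimal sketch of a JavaScript port, assuming the goal is to drop every code point below 0x20 except tab (0x09), line feed (0x0A) and carriage return (0x0D). The function name stripControlChars is hypothetical, not something from the question.

```javascript
// Hypothetical helper: removes ASCII control characters below 0x20,
// keeping tab (\x09), line feed (\x0A) and carriage return (\x0D).
function stripControlChars(text) {
  // The g flag makes replace() act on every match, not just the first one.
  return text.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/g, "");
}

const sample = "a\x06b\x07c\x1Bd\tE\r\n";
console.log(JSON.stringify(stripControlChars(sample))); // prints "abcd\tE\r\n"
```

JSON.stringify is used only so that the surviving tab, CR and LF show up as visible escapes in the console.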

Your first question (explanation of regex)

Since your hex codes all correspond to code points below 128 decimal, you can look them up in an ASCII table to see what will be matched. Your regex replaces these characters:

Dec Oct Hex Binary      Char      Description
0   000 00  00000000    NUL       Null char
1   001 01  00000001    SOH       Start of Heading
2   002 02  00000010    STX       Start of Text
3   003 03  00000011    ETX       End of Text
4   004 04  00000100    EOT       End of Transmission
5   005 05  00000101    ENQ       Enquiry
6   006 06  00000110    ACK       Acknowledgment
7   007 07  00000111    BEL       Bell
8   010 08  00001000    BS        Back Space
9   011 09  00001001    HT        Horizontal Tab
10  012 0A  00001010    LF        Line Feed

and these:

Dec Oct Hex Binary      Char      Description
13  015 0D  00001101    CR        Carriage Return
14  016 0E  00001110    SO        Shift Out / X-On
15  017 0F  00001111    SI        Shift In / X-Off
16  020 10  00010000    DLE       Data Line Escape
17  021 11  00010001    DC1       Device Control 1 (oft. XON)
18  022 12  00010010    DC2       Device Control 2
19  023 13  00010011    DC3       Device Control 3 (oft. XOFF)
20  024 14  00010100    DC4       Device Control 4
21  025 15  00010101    NAK       Negative Acknowledgement
22  026 16  00010110    SYN       Synchronous Idle
23  027 17  00010111    ETB       End of Transmit Block
24  030 18  00011000    CAN       Cancel
25  031 19  00011001    EM        End of Medium
26  032 1A  00011010    SUB       Substitute
27  033 1B  00011011    ESC       Escape
28  034 1C  00011100    FS        File Separator
29  035 1D  00011101    GS        Group Separator
30  036 1E  00011110    RS        Record Separator
31  037 1F  00011111    US        Unit Separator
32  040 20  00100000    SP        Space
33  041 21  00100001    !         Exclamation mark
34  042 22  00100010    "         Double quotes (or speech marks)
35  043 23  00100011    #         Number
36  044 24  00100100    $         Dollar
37  045 25  00100101    %         Percent sign
38  046 26  00100110    &         Ampersand
39  047 27  00100111    '         Single quote
40  050 28  00101000    (         Open parenthesis (or open bracket)
41  051 29  00101001    )         Close parenthesis (or close bracket)
42  052 2A  00101010    *         Asterisk
43  053 2B  00101011    +         Plus
44  054 2C  00101100    ,         Comma
45  055 2D  00101101    -         Hyphen
46  056 2E  00101110    .         Period, dot or full stop
47  057 2F  00101111    /         Slash or divide

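with an empty string. Note that this includes HT, LF and CR, which you want to keep, as well as the space and the punctuation characters 0x20-0x2F, while 0x0B and 0x0C are left untouched.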
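The listing can be double-checked mechanically. Here is a small sketch that tests every ASCII code point against the character class from the question; the helper name matchedCodes is made up for this illustration.

```javascript
// The character class from the question, unchanged.
const questionClass = /[\x00-\x09\x0A\x0D-\x2F]/;

// Collect every ASCII code point (0-127) that the given regex matches.
function matchedCodes(re) {
  const codes = [];
  for (let code = 0; code < 128; code++) {
    if (re.test(String.fromCharCode(code))) codes.push(code);
  }
  return codes;
}

const codes = matchedCodes(questionClass);
// HT, LF and CR are all matched, so they would be removed.
console.log(codes.includes(0x09), codes.includes(0x0A), codes.includes(0x0D)); // true true true
// Vertical tab and form feed are not matched, so they would stay.
console.log(codes.includes(0x0B), codes.includes(0x0C)); // false false
// Space and "/" are matched, the digit "0" is not.
console.log(codes.includes(0x20), codes.includes(0x2F), codes.includes(0x30)); // true true false
```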

Your second question (replace non-printables, i.e. 0-31, i.e. 0x00-0x1F)

If you want to remove all characters below 32 decimal (the non-printable ones) while keeping HT, LF and CR, then:

$val = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $val); // keeps \x09 (HT), \x0A (LF), \x0D (CR)

(updated, preserving HT, LF, CR)
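A quick way to gain confidence in a pattern like this is to check which code points below 32 it leaves alone. The sketch below does that in JavaScript; it assumes the intended pattern is "everything below 0x20 except HT (0x09), LF (0x0A) and CR (0x0D)", and the names controlChars and survivorsBelow32 are made up for the example.

```javascript
// Pattern assumed here: every code point below 0x20 except HT, LF and CR.
const controlChars = /[\x00-\x08\x0B\x0C\x0E-\x1F]/;

// Return the code points in 0..31 that the regex does NOT match,
// i.e. the ones a replace() with this pattern would keep.
function survivorsBelow32(re) {
  const kept = [];
  for (let code = 0; code < 32; code++) {
    if (!re.test(String.fromCharCode(code))) kept.push(code);
  }
  return kept;
}

console.log(survivorsBelow32(controlChars)); // [ 9, 10, 13 ]
```

If the list contains anything other than 9, 10 and 13, the ranges in the character class are off, which is easy to get wrong when decimal, octal and hex columns are mixed up.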
