
Ruby gsub / regex modifiers?

Where can I find the documentation on the modifiers for gsub? \a \b \c \1 \2 \3 %a %b %c $1 $2 %3 etc.?

Specifically, I'm looking at this code: something.gsub(/%u/, unit). What is the %u?

First off, %u is nothing special in a Ruby regex:

mixonic@pandora ~ $ irb
irb(main):001:0> '%u'.gsub(/%u/,'heyhey')
=> "heyhey"
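
Since %u matches literally, a call like something.gsub(/%u/, unit) most likely treats %u as a placeholder inside a template string. A minimal sketch of that reading, assuming a made-up template and unit value (neither appears in the original code):

```ruby
# Hypothetical template; "%u" is just a literal two-character marker here.
template = "Speed: 42 %u (max 90 %u)"
unit = "km/h"

# gsub replaces every literal occurrence of "%u" with the unit string.
result = template.gsub(/%u/, unit)
puts result   # prints: Speed: 42 km/h (max 90 km/h)
```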

The definitive documentation for Ruby 1.8 regex is in the Ruby Doc Bundle:

Strings delimited by slashes are regular expressions. The characters right after the latter slash denote the options to the regular expression. Option i means that the regular expression is case insensitive. Option o means that the regular expression does expression substitution only once, the first time it is evaluated. Option x means extended regular expression, which means whitespace and comments are allowed in the expression. Option p denotes POSIX mode, in which newlines are treated as normal characters (matched by dots).
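
The options described above can be tried out directly. The following is only a sketch with made-up sample strings; it uses /m for the "dot matches newline" behaviour, because that is the option current Ruby versions provide for it:

```ruby
# /i: case-insensitive matching
i_match = "HELLO world" =~ /hello/i

# /x: whitespace and comments inside the pattern are ignored
quantity = /
  (\d+)      # one or more digits
  \s*        # optional whitespace
  (kg|lb)    # a unit
/x
x_match = quantity.match("weight: 12 kg")

# /o: interpolation happens only the first time the literal is evaluated
word = "cat"
o_results = []
2.times do
  o_results << !!("cat" =~ /#{word}/o)
  word = "dog"   # changing the variable does not change the cached pattern
end

# /m: the dot also matches a newline
m_match = "a\nb" =~ /a.b/m

puts i_match, x_match[1], x_match[2], o_results.inspect, m_match
```

Without /o, the second iteration would build the pattern from "dog" and fail to match "cat".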

%r/STRING/ is another form of the regular expression.
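
The %r form is mostly useful when the pattern itself contains slashes, since a different delimiter avoids escaping them. A small sketch, with an illustrative path that is not from the original question:

```ruby
# With %r{...}, forward slashes in the pattern need no escaping.
path_rx = %r{^/usr/(\w+)/}

# The equivalent slash-delimited literal has to escape every slash.
same_rx = /^\/usr\/(\w+)\//

dir_a = "/usr/local/bin"[path_rx, 1]
dir_b = "/usr/local/bin"[same_rx, 1]
puts dir_a, dir_b
```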

    ^           beginning of a line or string
    $           end of a line or string
    .           any character except newline
    \w          word character [0-9A-Za-z_]
    \W          non-word character
    \s          whitespace character [ \t\n\r\f]
    \S          non-whitespace character
    \d          digit, same as [0-9]
    \D          non-digit
    \A          beginning of a string
    \Z          end of a string, or before newline at the end
    \z          end of a string
    \b          word boundary (outside [] only)
    \B          non-word boundary
    \b          backspace (0x08) (inside [] only)
    [ ]         any single character of set
    *           0 or more previous regular expression
    *?          0 or more previous regular expression (non greedy)
    +           1 or more previous regular expression
    +?          1 or more previous regular expression (non greedy)
    {m,n}       at least m but at most n previous regular expression
    {m,n}?      at least m but at most n previous regular expression (non greedy)
    ?           0 or 1 previous regular expression
    |           alternation
    ( )         grouping regular expressions
    (?# )       comment
    (?: )       grouping without backreferences
    (?= )       zero-width positive look-ahead assertion
    (?! )       zero-width negative look-ahead assertion
    (?ix-ix)    turns on (or off) `i' and `x' options within the regular expression. These modifiers are localized inside an enclosing group (if any).
    (?ix-ix: )  turns on (or off) `i' and `x' options within this non-capturing group.
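
A few of the constructs listed above are easier to tell apart with an example. This is only an illustrative sketch; the sample text is made up:

```ruby
text = "foo123 <b>bold</b> <b>more</b>\nbar"

# Greedy vs. non-greedy quantifiers
greedy     = text[/<b>.*<\/b>/]     # runs to the last </b> on the line
non_greedy = text[/<b>.*?<\/b>/]    # stops at the first </b>

# ^ matches at the start of every line, \A only at the start of the string
line_starts   = text.scan(/^\w+/)
string_starts = text.scan(/\A\w+/)

# (?= ) look-ahead: letters that are followed by a digit; the digit is not consumed
before_digit = text[/[a-z]+(?=\d)/]

# (?i: ) switches on case-insensitivity for that group only
scoped     = "FOObar" =~ /(?i:foo)bar/
not_scoped = "FOOBAR" =~ /(?i:foo)bar/

puts greedy, non_greedy, line_starts.inspect, string_starts.inspect, before_digit
```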

Backslash notation and expression substitution are available in regular expressions.

Good luck!

Zenspider's Quickref contains a section explaining which escape sequences can be used in regexen, and one listing the pseudo variables that get set by a regexp match. In the second argument to gsub you simply write the name of the variable with a backslash instead of a $, and it will be replaced with the value of that variable after applying the regexp. If you use a double-quoted string, you need to use two backslashes.
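
To make the quoting rule concrete, here is a short sketch with a made-up name string: \1 and \2 in the replacement correspond to $1 and $2 after the match.

```ruby
name = "John Smith"

# Single-quoted replacement: one backslash is enough.
single = name.gsub(/(\w+) (\w+)/, '\2, \1')

# Double-quoted replacement: the backslash itself has to be escaped.
double = name.gsub(/(\w+) (\w+)/, "\\2, \\1")

# The corresponding pseudo variables are set by a successful match.
name =~ /(\w+) (\w+)/
first, last = $1, $2

puts single, double, first, last
```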

When using the block form of gsub you can simply use the variables directly. If you return a string containing e.g. \1 from the block, that will not be replaced with $1. That only happens when using the two-argument form.
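
The difference between the two forms can be seen side by side. A minimal sketch, using an invented input string:

```ruby
s = "price: 42"

# Block form: $1 is available inside the block.
doubled = s.gsub(/(\d+)/) { ($1.to_i * 2).to_s }

# A backslash sequence returned from the block is kept literally.
literal = s.gsub(/(\d+)/) { '<\1>' }

# The same sequence in the two-argument form is expanded.
expanded = s.gsub(/(\d+)/, '<\1>')

puts doubled, literal, expanded
```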

If you use a block with sub/gsub, you can access the groups like this:

>> rx = /(ab(cd)ef)/
>> s = "-abcdef-abcdef"
>> s.gsub(rx) { $2 }
=> "-cd-cd"

For Ruby 1.9's Oniguruma, there is good documentation of its regular expressions here.

gsub is also a string substitution function in the Lua language.

In Lua's pattern language, %u represents the upper-case character class, i.e. it matches any upper-case letter. Similarly, %l matches lower-case letters.
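
For comparison, Ruby has no %u class, but a similar effect is available through the POSIX bracket classes [[:upper:]] and [[:lower:]]. A small Ruby sketch with a made-up sample string:

```ruby
text = "Hello World"

# [[:upper:]] matches upper-case letters, much like %u in a Lua pattern.
masked_upper = text.gsub(/[[:upper:]]/, "*")

# [[:lower:]] matches lower-case letters, much like %l in a Lua pattern.
masked_lower = text.gsub(/[[:lower:]]/, "-")

puts masked_upper, masked_lower
```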

Lua Regex Class Patterns
