
Capture a group in a multi-line string with multiple matches using regular expression in Ruby

I'm trying to capture the string '1611650547*42' in the multi-line string below.

myString = "'/absfucate/wait.do;cohrAwdSessID=jbreW9yA8R0xh9b?
obfuscateId=jbreW9yA8R0xh9b&checksum=1611650547*42&tObfuscate=null&
tSession_1DS=null&obsfuscate3=DeptNLI8261138&
dispatchMethod=obfuscate'+ '&poll= 
8R0xh9b&checksum=1611650547*42&tSession=null'"

I'm using the code below, and it captures two groups.

/checksum=(?<checksum>\d*\*\d*)/m.match(myString)['checksum']

The capturing group checksum works for a string with one match, but when multiple matches are found it throws the following error:

undefined method `[]' for nil:NilClass (NoMethodError)

It's hard to be 100% sure about your input and the criteria revolving around the * . How about trying something a bit more specific (Ruby 2):

if myString =~ /(?m)checksum=\K\d*\*\d*/
    checksum = $&   # $& holds the text of the last match, here only the part after \K
end

What does the regex mean?

  • Use these options for the whole regular expression: (?m)
    • Dot matches line breaks: m
  • Match the character string "checksum=" literally (case sensitive): checksum=
  • Keep the text matched so far out of the overall regex match: \K
  • Match a single character that is a "digit" (ASCII 0–9 only): \d*
    • Between zero and unlimited times, as many times as possible, giving back as needed (greedy): *
  • Match the character "*" literally: \*
  • Match a single character that is a "digit" (ASCII 0–9 only): \d*
    • Between zero and unlimited times, as many times as possible, giving back as needed (greedy): *
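
To make the effect of \K concrete, here is a minimal sketch of the same pattern used with String#[] and scan. The variable names and the shortened sample string are my own stand-ins, not the asker's exact input; I also left out (?m) because the pattern contains no dot.

```ruby
# Shortened stand-in for the string in the question (an assumption for brevity).
sample = "id=abc&checksum=1611650547*42&t=null&\npoll=x&checksum=1611650547*42&s=null"

# \K drops "checksum=" from the reported match, so only the value remains.
pattern = /checksum=\K\d*\*\d*/

first = sample[pattern]        # String#[] returns the matched text, or nil if nothing matches
all   = sample.scan(pattern)   # scan collects every match in the string

puts first        # 1611650547*42
puts all.inspect  # ["1611650547*42", "1611650547*42"]
```

Because \K already trims the prefix, no capture group or further string handling is needed to get at the value.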
myString = "'/absfucate/wait.do;cohrAwdSessID=jbreW9yA8R0xh9b?
obfuscateId=jbreW9yA8R0xh9b&checksum=1611650547*42&tObfuscate=null&
tSession_1DS=null&obsfuscate3=DeptNLI8261138&
dispatchMethod=obfuscate'+ '&poll= 
8R0xh9b&checksum=1611650547*42&tSession=null'"

myString.scan(/checksum=[^&]+/) # => ["checksum=1611650547*42", "checksum=1611650547*42"]

Since your string contains two, and you don't say which one you want, pick one or the other, then continue processing:

myString.scan(/checksum=[^&]+/).first.split('=').last # => "1611650547*42"

Basically /checksum=[^&]+/ means: find "checksum=", then the text following it until the next & . Once I have those strings it's easy to split them on = .
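
If you'd rather skip the split step, a capture group inside scan is one possible variation. This is only a sketch: the sample string is a shortened stand-in for the one in the question, and the variable names are mine.

```ruby
# Shortened stand-in for the string in the question (an assumption for brevity).
sample = "id=abc&checksum=1611650547*42&t=null&\npoll=x&checksum=1611650547*42&s=null"

# With one capture group, scan returns an array of one-element arrays,
# so flatten turns it into a plain list of the captured values.
values = sample.scan(/checksum=([^&]+)/).flatten

puts values.inspect  # ["1611650547*42", "1611650547*42"]
puts values.first    # 1611650547*42
```

Whether this reads better than scan plus split is a matter of taste; both rely on the same small pattern.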

Regexes aren't magic bullets, and they will make your life more miserable the longer and more complex they become, so use them carefully and sparingly. Rather than trying to process the entire line in one pattern, scan lets me use a small pattern to locate only what I want, and it handles the task of looping through the entire string.

If I were only after one of the occurrences, I'd use a pattern and match . These are equivalent to what you were after, only they're more succinct:

myString.match(/checksum=(?<checksum>[^&]+)/m)[:checksum] # => "1611650547*42"
myString.match(/checksum=(?<checksum>[\d*]+)/m)[:checksum] # => "1611650547*42"

For readability I'd pass the pattern as the parameter to match , rather than chaining match onto the pattern's m flag.
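
As for the NoMethodError in the question: match returns nil when nothing matches, and calling [] on nil raises exactly that error, so it may be worth guarding the lookup. A minimal sketch follows, with made-up sample strings and variable names; I can't tell from the question which input actually failed to match.

```ruby
pattern = /checksum=(?<checksum>[^&]+)/

with_checksum    = "a=1&checksum=1611650547*42&b=2"  # hypothetical input that matches
without_checksum = "a=1&b=2"                         # hypothetical input that does not

# match returns nil on failure, so check the result before indexing into it.
m = without_checksum.match(pattern)
missing = m && m[:checksum]

# String#[] with a pattern and a capture name returns nil instead of raising.
found = with_checksum[pattern, "checksum"]

puts missing.inspect  # nil
puts found            # 1611650547*42
```

Either form lets the caller decide what to do when the checksum is absent, rather than crashing on nil.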
