Regex - split a message into groups

Question

I want to split this message into groups:

[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com

Expected result:

Group1: [Rule] 'Server - update repository'
Group2: [Source] 10.10.10.10
Group3: [User] _Server
Group4: [Content] HTTP GET http://example.com

There will not always be 4 groups; sometimes there are fewer or more. The pattern I tried to build:

(\(^\[\w+\].*\)){0,}

Answer 1

I would do it like this:

string = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

regexp = /\[?[^\[]+/
string.scan(regexp)
#=> ["[Rule] 'Server - update repository' ", "[Source] 10.10.10.10 ", "[User] _Server ", "[Content] HTTP GET http://example.com"]

Or, if you would like a hash returned:

regexp = /\[(\w+)\]\s+([^\[]+)/
string.scan(regexp).to_h
#=> { "Rule" => "'Server - update repository' ", "Source" => "10.10.10.10 ", "User" => "_Server ", "Content" => "HTTP GET http://example.com" }
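Note that every value except the last keeps the space that preceded the next "[". If that matters, one possible follow-up (a sketch, assuming Ruby 2.4+ for Hash#transform_values; the variable name fields is only for illustration) is to strip the values afterwards:

```ruby
string = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

# Same pattern as above: group 1 is the key, group 2 is the value.
regexp = /\[(\w+)\]\s+([^\[]+)/

# scan returns [key, value] pairs; to_h builds the hash;
# transform_values(&:strip) removes the trailing space from each value.
fields = string.scan(regexp).to_h.transform_values(&:strip)
```

This leaves the regexp itself untouched and only cleans up the captured values.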

Answer 2

This may work if there is no [ in the group text.

str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

str.split("[").each_with_index {|c, i| puts "Group #{i}: [#{c}" if i > 0}
# Group 1: [Rule] 'Server - update repository'
# Group 2: [Source] 10.10.10.10
# Group 3: [User] _Server
# Group 4: [Content] HTTP GET http://example.com
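To illustrate the caveat, here is a small sketch with a made-up message (not from the question) whose content contains a "[" of its own. The split then cuts the value in the wrong place:

```ruby
# Hypothetical message: the [Content] value itself contains a "[".
str = "[User] _Server [Content] GET /items[0]"

# Same idea as above: split on "[", drop the empty first piece, put the "[" back.
parts = str.split("[").drop(1).map { |c| "[#{c}" }

# parts now has three elements instead of two,
# because "GET /items[0]" was cut at its own "[".
puts parts.size
```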

Answer 3

You could also use String#split.

str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

str.split(/ +(?=\[)/)
  #=> ["[Rule] 'Server - update repository'",
  #    "[Source] 10.10.10.10",
  #    "[User] _Server",
  #    "[Content] HTTP GET http://example.com"]

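The string is split wherever one or more spaces are followed by a left square bracket. (?=\[) is a positive lookahead, so the bracket is checked for but is not part of the separator.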
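To see why the lookahead matters, compare it with a pattern that consumes the bracket. This is only a sketch on a shortened version of the message; the variable names are for illustration:

```ruby
str = "[Rule] 'Server - update repository' [Source] 10.10.10.10"

# Lookahead: "[" is asserted but not consumed, so it stays at the start of the next piece.
with_lookahead = str.split(/ +(?=\[)/)

# No lookahead: "[" belongs to the separator and is dropped from the result.
without_lookahead = str.split(/ +\[/)

p with_lookahead
p without_lookahead
```

With the lookahead the second piece still begins with "[Source]"; without it the piece begins with "Source]".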

If you want to create a hash with keys :Group1, :Group2, and so on, you could write

arr = str.split(/ +(?=\[)/)

arr.each_index.with_object({}) do |i,h|
  h.update("Group#{i+1}".to_sym => arr[i])
end
  #=> {:Group1=>"[Rule] 'Server - update repository'",
  #    :Group2=>"[Source] 10.10.10.10",
  #    :Group3=>"[User] _Server",
  #    :Group4=>"[Content] HTTP GET http://example.com"}

Depending on requirements, here is another option.

RGX = /\[([A-Z][a-z]+)\] +([^\[\]]+[^ \[\]])/

str.gsub(RGX).with_object({}) { |_,h| h[$1] = $2 }
  #=> {"Rule"=>"'Server - update repository'",
  #    "Source"=>"10.10.10.10",
  #    "User"=>"_Server",
  #    "Content"=>"HTTP GET http://example.com"}

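This uses the form of String#gsub that takes one argument and no block, and returns an enumerator. This form is useful but odd, as it has nothing to do with string replacement.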
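As a rough illustration of that form, the enumerator simply yields each matched substring. For this particular pattern, String#scan with to_h appears to give the same hash; the lowercase rgx below is the same pattern as RGX, held in a local variable only to keep the sketch self-contained:

```ruby
str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

# Same pattern as RGX above.
rgx = /\[([A-Z][a-z]+)\] +([^\[\]]+[^ \[\]])/

# One argument, no block: an Enumerator over the matches, nothing is replaced.
enum = str.gsub(rgx)
matches = enum.to_a   # the full text of each match

# scan with two capture groups returns [key, value] pairs, which to_h turns into a hash.
fields = str.scan(rgx).to_h
```

Which of the two reads better is a matter of taste; scan avoids the $1 and $2 globals.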

We can write the regular expression in free-spacing mode to make it self-documenting.

/
\[          # match '['
(           # begin capture group 1
  [A-Z]     # match an uppercase letter
  [a-z]+    # match one or more lowercase letters
)           # end capture group 1
\]\ +       # match ']' followed by one or more spaces
(           # begin capture group 2
  [^\[\]]+  # match one or more chars other than '[' and ']'
  [^ \[\]]  # match one char other than ' ', '[' and ']'
)           # end capture group 2
/x          # invoke free-spacing regex definition mode
