
Regex - Split message into groups

I want to split this message into groups:

[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com

Expected result:

Group1: [Rule] 'Server - update repository'
Group2: [Source] 10.10.10.10
Group3: [User] _Server
Group4: [Content] HTTP GET http://example.com

There need not be exactly 4 groups; sometimes there are fewer or more. The pattern I tried to build:

(\(^\[\w+\].*\)){0,}

I would do it like this:

string = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

regexp = /\[?[^\[]+/
string.scan(regexp)
#=> ["[Rule] 'Server - update repository' ", "[Source] 10.10.10.10 ", "[User] _Server ", "[Content] HTTP GET http://example.com"]

Or when you prefer a hash to be returned:

regexp = /\[(\w+)\]\s+([^\[]+)/
string.scan(regexp).to_h
#=> { "Rule" => "'Server - update repository' ", "Source" => "10.10.10.10 ", "User" => "_Server ", "Content" => "HTTP GET http://example.com" }
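The values captured this way keep the space that precedes the next bracket. A minimal sketch of one way to trim them, assuming Ruby 2.4 or later for Hash#transform_values; the names `message` and `fields` are only illustrative:

```ruby
message = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

# Same pattern as above: group 1 is the tag name, group 2 is everything up to the next "[".
fields = message.scan(/\[(\w+)\]\s+([^\[]+)/).to_h

# Strip the whitespace left over before each following "[".
fields = fields.transform_values(&:strip)
fields["Rule"]   #=> "'Server - update repository'"
fields["Source"] #=> "10.10.10.10"
```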

If the group text never contains a [ character, this might work.

str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"
str.split("[").each_with_index {|c, i| puts "Group #{i}: [#{c}" if i > 0}
# Group 1: [Rule] 'Server - update repository'
# Group 2: [Source] 10.10.10.10
# Group 3: [User] _Server
# Group 4: [Content] HTTP GET http://example.com
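If a value can contain a bracket, splitting on every [ breaks it apart. One possible workaround, sketched under the assumption that every tag looks like [Word] followed by a space, is to split only in front of such tags. The message below is a hypothetical variant of the original with a bracketed IPv6 host in the URL:

```ruby
message = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [Content] HTTP GET http://[::1]:8080/repo"

# Split only on whitespace that is followed by a "[Word]" tag and a space,
# so the "[" in "http://[::1]" does not start a new group.
groups = message.split(/\s+(?=\[\w+\]\s)/)
groups.each.with_index(1) { |group, i| puts "Group #{i}: #{group}" }
```

A value that itself contains something like " [note] " would still be split, so this only narrows the problem.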

You can also use String#split.

str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"
str.split(/ +(?=\[)/)
  #=> ["[Rule] 'Server - update repository'",
  #    "[Source] 10.10.10.10",
  #    "[User] _Server",
  #    "[Content] HTTP GET http://example.com"]

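The string is split on one or more spaces that are followed by a left bracket. (?=\[) is a positive lookahead: it requires the bracket to be there but does not consume it, so the bracket stays at the start of the next element.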
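To see what the lookahead buys, here is a small comparison on a made-up string (`sample` is just an illustration, not part of the question):

```ruby
sample = "a [b] c [d]"

# Without the lookahead, the "[" is part of the separator and is discarded.
without_lookahead = sample.split(/ +\[/)   #=> ["a", "b] c", "d]"]

# With the lookahead, only the spaces are consumed; the "[" stays in the next piece.
with_lookahead = sample.split(/ +(?=\[)/)  #=> ["a", "[b] c", "[d]"]
```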


If you wish to create a hash with keys :Group1, :Group2, and so on, you could write

arr = str.split(/ +(?=\[)/)
arr.each_index.with_object({}) do |i,h|
  h.update("Group#{i+1}".to_sym => arr[i])
end
  #=> {:Group1=>"[Rule] 'Server - update repository'",
  #    :Group2=>"[Source] 10.10.10.10",
  #    :Group3=>"[User] _Server",
  #    :Group4=>"[Content] HTTP GET http://example.com"}
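The same hash can probably be built a little more directly by mapping each element to a [key, value] pair and calling to_h. This is only a sketch of an equivalent, and `groups` is an illustrative name:

```ruby
str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"
arr = str.split(/ +(?=\[)/)

# Pair each piece with a symbol key built from its 1-based position.
groups = arr.each_with_index.map { |text, i| [:"Group#{i + 1}", text] }.to_h
groups[:Group2] #=> "[Source] 10.10.10.10"
```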

Depending on requirements, here is another option.

RGX = /\[([A-Z][a-z]+)\] +([^\[\]]+[^ \[\]])/
str.gsub(RGX).with_object({}) { |_,h| h[$1] = $2 }
  #=> {"Rule"=>"'Server - update repository'",
  #    "Source"=>"10.10.10.10",
  #    "User"=>"_Server",
  #    "Content"=>"HTTP GET http://example.com"}

This uses the form of String#gsub that takes a single argument and has no block, returning an enumerator. This form is useful but odd, as it has nothing to do with string replacement.
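To make the enumerator form less mysterious, the sketch below shows what it yields, and that String#scan followed by to_h should give the same hash. A lowercase local `rgx` stands in for the constant so the snippet can be pasted anywhere:

```ruby
rgx = /\[([A-Z][a-z]+)\] +([^\[\]]+[^ \[\]])/
str = "[Rule] 'Server - update repository' [Source] 10.10.10.10 [User] _Server [Content] HTTP GET http://example.com"

# With a pattern and no block, gsub returns an Enumerator over the matched substrings.
matches = str.gsub(rgx)
matches.class #=> Enumerator
matched = matches.to_a
#=> ["[Rule] 'Server - update repository'", "[Source] 10.10.10.10",
#    "[User] _Server", "[Content] HTTP GET http://example.com"]

# scan with the same pattern returns [key, value] pairs, which to_h turns into a hash.
pairs = str.scan(rgx).to_h
```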

We can write the regular expression in free-spacing mode to make it self-documenting.

/
\[          # match '['
(           # begin capture group 1
  [A-Z]     # match an uppercase letter
  [a-z]+    # match one or more lowercase letters
)           # end capture group 1
\]\ +       # match ']' followed by one or more spaces
(           # begin capture group 2
  [^\[\]]+  # match one or more chars other than '[' and ']'
  [^ \[\]]  # match one char other than ' ', '[' and ']'
)           # end capture group 2
/x          # invoke free-spacing regex definition mode
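Free-spacing mode also combines well with named captures, which let the match be read by name instead of by $1 and $2. The following is a sketch of that variation, not part of the answer above; the names `field`, `tag` and `value` are assumptions chosen for illustration:

```ruby
field = /
  \[ (?<tag> [A-Z][a-z]+ ) \]    # tag name between brackets
  \ +                            # one or more spaces, escaped because x mode skips bare spaces
  (?<value> [^\[\]]+ [^ \[\]] )  # value with no brackets that does not end in a space
/x

m = field.match("[Source] 10.10.10.10 [User] _Server")
m[:tag]   #=> "Source"
m[:value] #=> "10.10.10.10"
```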


 