Split a string delimited by a list of substrings

I have data like:

str = "CODEA text for first item CODEB text for next item CODEB2 some " \
      "more text CODEC yet more text"

and a list:

arr = ["CODEA", "CODEB", "CODEB2", "CODEC", ... ]

I want to divide this string into a hash. The keys of the hash will be CODEA, CODEB, etc. The values of the hash will be the text that follows each code, up to the next code. The output should look like this:

"CODEA" => "text for first item",
"CODEB" => "text for next item",
"CODEB2" => "some more text",
"CODEC" => "yet more text"

We are given a string and an array.

str = "CODEA text for first item CODEB text for next item " + 
      "CODEB2 some more text CODEC yet more text"

arr = %w|CODEC CODEB2 CODEA CODEB|
  #=> ["CODEC", "CODEB2", "CODEA", "CODEB"]

This is one way to obtain the desired hash.

 str.split.
     slice_before { |word| arr.include?(word) }.
     map { |word, *rest| [word, rest.join(' ')] }.
     to_h
  #=> {"CODEA" =>"text for first item",
  #    "CODEB" =>"text for next item",
  #    "CODEB2"=>"some more text",
  #    "CODEC" =>"yet more text"}

See Enumerable#slice_before.
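For readers who have not used slice_before, here is a small sketch on toy data (the array and the "multiple of 10" rule are made up for illustration). A new slice starts just before every element for which the block returns a truthy value.

```ruby
# Toy data: start a new slice before every multiple of 10.
numbers = [10, 1, 2, 20, 3, 30, 4, 5]

# slice_before returns an enumerator; to_a materializes the slices.
slices = numbers.slice_before { |n| (n % 10).zero? }.to_a

p slices
```

Each slice begins with the element that triggered the split, which is what lets the answer above treat the first word of every slice as the hash key.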

The steps are as follows.

a = str.split
  #=> ["CODEA", "text", "for", "first", "item", "CODEB",
  #    "text", "for", "next", "item", "CODEB2", "some",
  #    "more", "text", "CODEC", "yet", "more", "text"] 
b = a.slice_before { |word| arr.include?(word) }
  #=> #<Enumerator:
  #     #<Enumerator::Generator:0x00005cbdec2b5eb0>:each> 

By converting this enumerator to an array we can see the four elements (arrays) that it will generate and pass to map.

b.to_a
  #=> [["CODEA", "text", "for", "first", "item"],
  #    ["CODEB", "text", "for", "next", "item"],
  #    ["CODEB2", "some", "more", "text"],
  #    ["CODEC", "yet", "more", "text"]] 

Continuing,

c = b.map { |word, *rest| [word, rest.join(' ')] }
  #=> [["CODEA", "text for first item"],
  #    ["CODEB", "text for next item"],
  #    ["CODEB2", "some more text"],
  #    ["CODEC", "yet more text"]]
c.to_h
  #=> {"CODEA"=>"text for first item",
  #    "CODEB"=>"text for next item",
  #    "CODEB2"=>"some more text",
  #    "CODEC"=>"yet more text"} 

The following is perhaps a better way of doing this, as it builds the hash directly rather than going through an intermediate array of pairs.

 str.split.
     slice_before { |word| arr.include?(word) }.
     each_with_object({}) { |(word, *rest),h|
       h[word] = rest.join(' ') }
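As a self-contained sketch, the same chain can be wrapped in a method. The method name split_by_codes is hypothetical, and using a Set for the membership test is my own substitution for Array#include?, which only matters when the list of codes is long.

```ruby
require 'set'

# Hypothetical helper; the name and the Set lookup are not from the answer above.
def split_by_codes(str, codes)
  code_set = codes.to_set # constant-time membership test
  str.split
     .slice_before { |word| code_set.include?(word) }
     .each_with_object({}) { |(word, *rest), h| h[word] = rest.join(' ') }
end

str = "CODEA text for first item CODEB text for next item " \
      "CODEB2 some more text CODEC yet more text"
arr = %w[CODEC CODEB2 CODEA CODEB]

result = split_by_codes(str, arr)
p result
```

Note that if the string starts with text that precedes the first code, the first word of that text becomes a key as well, so the input is assumed to begin with a code.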

When I was a kid this might have been done as follows.

last_word = ''
str.split.each_with_object({}) do |word,h|
  if arr.include?(word)
    h[word]=''
    last_word = word
  else
    h[last_word] << ' ' unless h[last_word].empty?
    h[last_word] << word
  end     
end

last_word must be initialized (to any value) outside the block; otherwise it would be local to the block and would not keep its value from one iteration to the next.
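The remark about last_word comes down to Ruby's block scoping. A minimal sketch, with made-up variable names (last_seen, only_inside):

```ruby
# Defined before the block, so every block call shares this one variable.
last_seen = nil
%w[a b c].each { |letter| last_seen = letter }

# First assigned inside the block, so it is local to each block call
# and does not exist once the block has returned.
%w[a b c].each { |letter| only_inside = letter }

p last_seen              # the value from the final iteration
p defined?(only_inside)  # nil, because no such variable exists out here
```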

Code:

str = 'CODEA text for first item CODEB text for next item ' + 
      'CODEB2 some more text CODEC yet more text'

puts Hash[str.scan(/(CODE\S*) (.*?(?= CODE|$))/)]

Result:

{"CODEA"=>"text for first item", "CODEB"=>"text for next item", "CODEB2"=>"some more text", "CODEC"=>"yet more text"}
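The pattern above hard-codes the CODE prefix. If the delimiters are known only from the array, one possibility is to build the pattern with Regexp.union. This is a sketch, assuming codes and text are separated by single spaces and that the string contains no newlines; the variable names codes and pattern are mine.

```ruby
str = 'CODEA text for first item CODEB text for next item ' \
      'CODEB2 some more text CODEC yet more text'
arr = %w[CODEC CODEB2 CODEA CODEB]

# Regexp.union escapes each string and joins them with "|".
codes = Regexp.union(arr)

# \b keeps "CODEB" from matching the front of "CODEB2".
# The lookahead stops each value just before the next code or at the end.
pattern = /\b(#{codes})\b (.*?)(?= #{codes}\b|$)/

result = str.scan(pattern).to_h
p result
```

Because the pattern has two capture groups, scan returns an array of [code, text] pairs, which to_h turns into the hash.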

Another option.

str.split.reverse
      .slice_when { |word| word.start_with? 'CODE' }
      .map{ |(*v, k)| [k, v.reverse.join(' ')] }.to_h

Enumerable#slice_when returns an enumerator which, converted to an array, in this case gives:

[["text", "more", "yet", "CODEC"], ["text", "more", "some", "CODEB2"], ["item", "next", "for", "text", "CODEB"], ["item", "first", "for", "text", "CODEA"]]

Then the array is mapped to build the required hash (I did not reverse the order of the Hash, so the keys appear last to first):

#=> {"CODEC"=>"yet more text", "CODEB2"=>"some more text", "CODEB"=>"text for next item", "CODEA"=>"text for first item"}
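For comparison, slice_when hands the block each pair of adjacent elements and splits between them when the block returns a truthy value. A small sketch on made-up data; the second call mirrors the one-parameter block used above, which receives only the first element of each pair.

```ruby
# Split wherever two neighbours are not consecutive integers.
numbers = [1, 2, 4, 9, 10, 11, 15]
runs = numbers.slice_when { |prev, curr| curr != prev + 1 }.to_a
p runs

# With a single block parameter only the element before the gap is inspected,
# so each slice ends right after a word that starts with "CODE".
words = %w[t2 t1 CODEB t0 CODEA]
groups = words.slice_when { |before| before.start_with?('CODE') }.to_a
p groups
```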

Adding parentheses to the pattern in String#split lets you get both the separators and the fields.

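# With str and arr as defined above. Longer codes go first so that "CODEB2"
# is not matched as "CODEB" followed by "2", and each field is stripped of
# its surrounding whitespace.
pattern = Regexp.union(arr.sort_by { |code| -code.size })
str.split(/(#{pattern})/).drop(1).each_slice(2).
    map { |code, text| [code, text.strip] }.to_h
# =>
# {
#   "CODEA"=>"text for first item",
#   "CODEB"=>"text for next item",
#   "CODEB2"=>"some more text",
#   "CODEC"=>"yet more text"
# }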
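To see what the parentheses change, and why the first element is dropped, here is a minimal sketch on toy strings (the inputs are made up for illustration):

```ruby
text = "a1b2c"

# Without a capture group the separators are discarded.
plain = text.split(/\d/)
p plain

# With a capture group the separators are kept between the fields.
kept = text.split(/(\d)/)
p kept

# When the string starts with a separator, the first field is an empty string.
leading = "1a2b".split(/(\d)/)
p leading
```

In the answer above the string starts with a code, so the first field is empty and drop(1) discards it before the remaining elements are paired up.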
