Split a string delimited by a list of substrings
I have data like this:
str = "CODEA text for first item CODEB text for next item CODEB2 some"\
"more text CODEC yet more text"
and a list:
arr = ["CODEA", "CODEB", "CODEB2", "CODEC", ... ]
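I want to split this string into a hash. The keys of the hash will be CODEA, CODEB, etc., and each value will be the text that follows the code, up to the next code. The output should look like this: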
"CODEA" => "text for first item",
"CODEB" => "text for next item",
"CODEB2" => "some more text",
"CODEC" => "yet more text"
We are given a string and an array.
str = "CODEA text for first item CODEB text for next item " +
"CODEB2 some more text CODEC yet more text"
arr= %w|CODEC CODEB2 CODEA CODEB|
#=> ["CODEC", "CODEB2", "CODEA", "CODEB"]
Here is one way to obtain the desired hash.
str.split.
slice_before { |word| arr.include?(word) }.
map { |word, *rest| [word, rest.join(' ')] }.
to_h
#=> {"CODEA" =>"text for first item",
# "CODEB" =>"text for next item",
# "CODEB2"=>"some more text",
# "CODEC" =>"yet more text"}
The steps are as follows.
a = str.split
#=> ["CODEA", "text", "for", "first", "item", "CODEB",
# "text", "for", "next", "item", "CODEB2", "some",
# "more", "text", "CODEC", "yet", "more", "text"]
b = a.slice_before { |word| arr.include?(word) }
#=> #<Enumerator:
# #<Enumerator::Generator:0x00005cbdec2b5eb0>:each>
By converting this enumerator to an array, we can see the four elements (arrays) that it will generate and pass to map.
b.to_a
#=> [["CODEA", "text", "for", "first", "item"],
# ["CODEB", "text", "for", "next", "item"],
# ["CODEB2", "some", "more", "text"],
# ["CODEC", "yet", "more", "text"]]
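As a small illustration of how slice_before groups elements, here is a sketch on made-up data (the words and markers arrays are hypothetical, not from the question). It also shows one edge case worth knowing: anything that precedes the first marker ends up in its own leading slice.

```ruby
# Hypothetical data, chosen only to illustrate Enumerable#slice_before.
words   = %w[A x y B z C]
markers = %w[A B C]

# A new slice begins at every element for which the block returns a truthy value.
slices = words.slice_before { |w| markers.include?(w) }.to_a
p slices   # => [["A", "x", "y"], ["B", "z"], ["C"]]

# Elements that come before the first marker form a slice of their own.
leading = %w[pre A x].slice_before { |w| markers.include?(w) }.to_a
p leading  # => [["pre"], ["A", "x"]]
```

With the approach above, text that appears before the first code would therefore become a key in the resulting hash, so it may need to be dropped first if the input can start that way.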
Continuing,
c = b.map { |word, *rest| [word, rest.join(' ')] }
#=> [["CODEA", "text for first item"],
# ["CODEB", "text for next item"],
# ["CODEB2", "some more text"],
# ["CODEC", "yet more text"]]
c.to_h
#=> {"CODEA"=>"text for first item",
# "CODEB"=>"text for next item",
# "CODEB2"=>"some more text",
# "CODEC"=>"yet more text"}
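The block parameters word, *rest rely on Ruby's array destructuring: the first element of each slice is bound to the first parameter and the remainder is collected into an array. A minimal sketch with made-up keys (K1 and K2 are hypothetical) shows what happens when a code has no text after it:

```ruby
# Hypothetical slices; the second one has a key but no following words.
slices = [["K1", "a", "b"], ["K2"]]

# `key` takes the first element, `rest` collects whatever is left (possibly nothing).
pairs = slices.map { |key, *rest| [key, rest.join(" ")] }
p pairs   # => [["K1", "a b"], ["K2", ""]]

# Array#to_h turns the [key, value] pairs into a hash.
hash = pairs.to_h
```

A code with no following text simply maps to an empty string.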
The following is perhaps a better way.
str.split.
slice_before { |word| arr.include?(word) }.
each_with_object({}) { |(word, *rest),h|
h[word] = rest.join(' ') }
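If the list of codes is long, the repeated arr.include? call scans the array for every word. One possible variation, sketched here under the assumption that this matters for your data, wraps the same pipeline in a method and looks codes up in a Set. The method name split_by_codes is hypothetical and not part of the original answer.

```ruby
require "set"

# Hypothetical helper: same pipeline as above, with Set-based membership tests.
def split_by_codes(str, codes)
  code_set = codes.to_set
  str.split
     .slice_before { |word| code_set.include?(word) }
     .each_with_object({}) { |(word, *rest), h| h[word] = rest.join(" ") }
end

str = "CODEA text for first item CODEB text for next item " \
      "CODEB2 some more text CODEC yet more text"
arr = %w[CODEC CODEB2 CODEA CODEB]

result = split_by_codes(str, arr)
p result
```

For a handful of codes the plain array is likely just as fast, so this is only worth it for larger lists.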
When I was a kid, this might have been done as follows.
last_word = ''
str.split.each_with_object({}) do |word,h|
  if arr.include?(word)
    h[word]=''
    last_word = word
  else
    h[last_word] << ' ' unless h[last_word].empty?
    h[last_word] << word
  end
end
last_word must be initialized (to any value) outside the block.
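The reason is block scoping: a local variable first assigned inside a block is created afresh on every call of the block, so it cannot carry a value from one iteration to the next. A small sketch with made-up variable names (current, last, seen and trace are all hypothetical) contrasts the two cases:

```ruby
# Case 1: `current` is first assigned inside the block, so it is block-local
# and starts out as nil on every iteration.
seen = []
%w[a b c].each do |w|
  current ||= "unset"   # always takes the default, because current is nil here
  seen << current
  current = w           # this value is lost when the block call ends
end
p seen    # => ["unset", "unset", "unset"]

# Case 2: `last` exists before the block, so the block closes over it
# and assignments persist across iterations.
last  = "unset"
trace = []
%w[a b c].each do |w|
  trace << last
  last = w
end
p trace   # => ["unset", "a", "b"]
```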
Code:
str = 'CODEA text for first item CODEB text for next item ' +
'CODEB2 some more text CODEC yet more text'
puts Hash[str.scan(/(CODE\S*) (.*?(?= CODE|$))/)]
Result:
{"CODEA"=>"text for first item", "CODEB"=>"text for next item", "CODEB2"=>"some more text", "CODEC"=>"yet more text"}
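The pattern above hard-codes the CODE prefix. If the delimiters come from the array instead, one way to adapt it is to build the alternation with Regexp.union. This is a sketch rather than the answer's own code, and it assumes every code begins and ends with a word character so that \b can anchor it.

```ruby
str = "CODEA text for first item CODEB text for next item " \
      "CODEB2 some more text CODEC yet more text"
arr = %w[CODEA CODEB CODEB2 CODEC]

codes = Regexp.union(arr)

# Group 1: a code from arr, as a whole word (so CODEB cannot match inside CODEB2).
# Group 2: the text after it, matched lazily up to the next code or end of string.
pattern = /\b(#{codes})\b\s*(.*?)\s*(?=\b(?:#{codes})\b|\z)/

result = str.scan(pattern).to_h
p result
```

Because . does not match a newline by default, this sketch only handles single-line input; the m flag would be needed otherwise.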
Another option.
str.split.reverse
  .slice_when { |word| word.start_with? 'CODE' }
  .map { |(*v, k)| [k, v.reverse.join(' ')] }.to_h
Enumerable#slice_when, in this case, yields the following arrays:
[["text", "more", "yet", "CODEC"], ["text", "more", "some", "CODEB2"], ["item", "next", "for", "text", "CODEB"], ["item", "first", "for", "text", "CODEA"]]
Then each array is mapped to a key-value pair to build the required hash (I did not restore the original key order):
#=> {"CODEC"=>"yet more text", "CODEB2"=>"some more text", "CODEB"=>"text for next item", "CODEA"=>"text for first item"}
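The block given to slice_when receives two adjacent elements, the one before and the one after a potential cut. Using the second argument makes it possible to cut just before each code, which avoids reversing the words and keeps the keys in their original order. The following is a sketch of that variation, matching against the array of codes rather than the CODE prefix:

```ruby
str = "CODEA text for first item CODEB text for next item " \
      "CODEB2 some more text CODEC yet more text"
arr = %w[CODEC CODEB2 CODEA CODEB]

# Cut between two neighbouring words whenever the later one is a code.
result = str.split
            .slice_when { |_before, after| arr.include?(after) }
            .map { |key, *rest| [key, rest.join(" ")] }
            .to_h
p result.keys   # => ["CODEA", "CODEB", "CODEB2", "CODEC"]
```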
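Adding parentheses (a capture group) to the pattern given to String#split lets you get the delimiters as well as the fields. Note that the output below, produced with the question's str and arr, is not quite the desired hash: CODEB is tried before CODEB2, so CODEB2 is split as CODEB followed by "2" and that second CODEB entry overwrites the first; the values keep their surrounding whitespace; and "somemore" appears because the question's two string literals are joined without a space.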
str.split(/(#{Regexp.union(*arr)})/).drop(1).each_slice(2).to_h
# =>
# {
# "CODEA"=>" text for first item ",
# "CODEB"=>"2 somemore text ",
# "CODEC"=>" yet more text"
# }
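One way to get the intended four keys from this approach, offered as a sketch and not as the answer's own code, is to try longer codes first so that CODEB2 wins over CODEB, and to strip the whitespace around each value:

```ruby
str = "CODEA text for first item CODEB text for next item " \
      "CODEB2 some more text CODEC yet more text"
arr = %w[CODEA CODEB CODEB2 CODEC]

# Alternation is tried left to right, so put longer codes first.
longest_first = arr.sort_by { |code| -code.length }
delimiter     = /(#{Regexp.union(longest_first)})/

result = str.split(delimiter)   # ["", "CODEA", " text for first item ", "CODEB", ...]
            .drop(1)            # discard the empty string before the first code
            .each_slice(2)      # pair each code with the text that follows it
            .map { |code, text| [code, text.to_s.strip] }
            .to_h
p result
```

The to_s call covers a trailing code with no text after it, in which case the last slice has only one element.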