How can I search for a word using Ruby?
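I have the name of a show, like "oferson of interest".
In my code I'm trying to split it into individual words, capitalize the first letter of each word, and then join them back together with a space between each word, so that it becomes "Oferson Of Interest". Then I want to search for the word "Of" and replace it with lowercase.
The problem I can't figure out is that at the end of the program I get "oferson of Interest", which I don't want. I only want the word "of" to be lowercase, not the first letter of the word "Oferson". Simply put, I want the output to be "Oferson of Interest" instead of "oferson of Interest".
How can I search for the word "of" without matching every occurrence of the letters "o" and "f" in the sentence?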
mine = 'oferson of interest'.split(' ').map { |w| w.capitalize }.join(' ')
if mine.include? "Of"
  # /Of/ also matches the "Of" at the start of "Oferson"
  mine.gsub!(/Of/, 'of')
else
  puts 'nothing'
end
puts mine
The simplest answer is to use word boundaries in the regular expression:
str = "oferson of interest".split.collect(&:capitalize).join(" ")
str.gsub!(/\bOf\b/i, 'of')
# => Oferson of Interest
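To make the effect of the word boundary concrete, here is a small comparison of the asker's bare pattern with the bounded one on the same capitalized title. It uses the non-bang gsub so both results can be printed side by side; the variable names are just for illustration:

```ruby
# Build the capitalized title the same way as above.
title = "oferson of interest".split.map(&:capitalize).join(" ")
# title is now "Oferson Of Interest"

# Without boundaries, /Of/ also matches the "Of" at the start of "Oferson".
bare = title.gsub(/Of/, "of")

# \b matches only between a word character and a non-word character
# (or the string edge), so /\bOf\b/ matches "Of" only as a whole word.
bounded = title.gsub(/\bOf\b/, "of")

puts bare     # prints "oferson of Interest"
puts bounded  # prints "Oferson of Interest"
```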
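You're dealing with "stop words": words that, for whatever reason, you don't want to process. Build a list of the stop words you want to ignore, and compare each word against it to decide whether to process it further: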
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
  puts text.split.map { |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
end
# >> a Stitch In Time Saves Nine
# >> the Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
This is a simple example, but it shows the basics. In real life you'll need to handle punctuation and things like hyphenated words.
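One possible way to cope with punctuation and hyphens, sketched under the assumption that a "word" is any run of letters and apostrophes, is to let String#gsub with a block rewrite only those runs and leave everything else (spaces, commas, hyphens) untouched. The title_case method name and the sample title are hypothetical, not part of the original answer:

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper: rewrites each run of letters/apostrophes and
# leaves punctuation, hyphens and spacing exactly as they were.
def title_case(text)
  text.gsub(/[[:alpha:]']+/) do |word|
    STOPWORDS.include?(word.downcase) ? word.downcase : word.capitalize
  end
end

puts title_case('jack-of-all-trades, master of none')
# prints "Jack-of-All-Trades, Master of None"
```

Because the regex only selects the words themselves, "of" inside "jack-of-all-trades" is still recognized as a stop word, and the comma survives unchanged.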
I used a Set because it stays fast as the stop-word list grows. It works like a hash, so checking membership with include? is faster than using include? on an array:
require 'set'
require 'fruity'
LETTER_ARRAY = ('a' .. 'z').to_a
LETTER_SET = LETTER_ARRAY.to_set
compare do
  array { LETTER_ARRAY.include?('0') }
  set   { LETTER_SET.include?('0') }
end
# >> Running each test 16384 times. Test will take about 2 seconds.
# >> set is faster than array by 10x ± 0.1
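fruity is a third-party gem. If you only want the standard library, a rougher version of the same comparison can be sketched with Benchmark.realtime. The iteration count is an arbitrary assumption, and the timings it prints depend on your machine and Ruby version, so no expected numbers are shown:

```ruby
require 'set'
require 'benchmark'

LETTER_ARRAY = ('a'..'z').to_a
LETTER_SET   = LETTER_ARRAY.to_set
ITERATIONS   = 100_000 # arbitrary; raise it for steadier numbers

# Array#include? scans the elements one by one (linear time);
# Set#include? is a hash lookup (constant time on average).
array_time = Benchmark.realtime { ITERATIONS.times { LETTER_ARRAY.include?('0') } }
set_time   = Benchmark.realtime { ITERATIONS.times { LETTER_SET.include?('0') } }

puts format('array: %.4fs  set: %.4fs', array_time, set_time)
```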
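It gets more interesting when you want to protect the first letter of the resulting string, but a simple trick, if it matters, is to force just that letter to uppercase: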
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
  str = text.split.map { |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
  str[0] = str[0].upcase
  puts str
end
# >> A Stitch In Time Saves Nine
# >> The Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
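An alternative to patching str[0] afterwards, sketched here as an option rather than the answer's method, is to treat the first word specially while mapping, using each_with_index. The title_case name is hypothetical. One small design difference: on an empty string, str[0] returns nil and str[0].upcase raises NoMethodError, whereas this version simply returns an empty string:

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper: stop words stay lowercase, except in the
# first position, where every word is capitalized.
def title_case(text)
  text.split.each_with_index.map { |word, i|
    if i.zero? || !STOPWORDS.include?(word.downcase)
      word.capitalize
    else
      word.downcase
    end
  }.join(' ')
end

puts title_case('the quick brown fox jumped over the lazy dog')
# prints "The Quick Brown Fox Jumped Over the Lazy Dog"
```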
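This isn't a good job for regular expressions unless you're dealing with very consistent text patterns. Since you're working with TV show names, chances are you won't find much consistency, and the complexity of your patterns will grow quickly.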
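Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.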