
How can I search for a word using Ruby?

I have the name of a show, like "oferson of interest".

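In my code, I'm trying to split it into individual words, capitalize the first letter of each word, and join them back together with a space between words, which gives: Oferson Of Interest. Then I want to search for the word Of and replace it with its lowercase form.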

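The problem I can't figure out is that at the end of the program I get oferson of Interest, which is not what I want. I only want the word "of" to be lowercase, not the first letter of "Oferson". Simply put, I want the output Oferson of Interest instead of oferson of Interest.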

How can I search for the word "of" without matching every place the letters "o" and "f" appear together in the sentence?

mine = 'oferson of interest'.split(' ').map { |w| w.capitalize }.join(' ')
if mine.include? "Of"
  # /Of/ also matches the "Of" at the start of "Oferson"
  mine.gsub!(/Of/, 'of')
else
  puts 'nothing'
end

puts mine
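To see why the question's code also lowercases the start of "Oferson", it can help to list where the unanchored pattern /Of/ matches. A small sketch, using String#scan with a block so that Regexp.last_match holds each match in turn:

```ruby
title = 'Oferson Of Interest'

# Record the starting index of every match of the unanchored pattern /Of/.
offsets = []
title.scan(/Of/) { offsets << Regexp.last_match.begin(0) }

p offsets # => [0, 8]
```

Index 0 is the "Of" at the front of "Oferson", which is why gsub!(/Of/, 'of') changes it as well as the standalone word at index 8.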

The simplest answer is to use word boundaries in the regular expression:

str = "oferson of interest".split.collect(&:capitalize).join(" ")
str.gsub!(/\bOf\b/i, 'of')
# => Oferson of Interest
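To make the word-boundary behaviour concrete, here is a small sketch applying the pattern to a couple of titles. The lookbehind variant at the end is an extra suggestion, not part of the original answer, for titles that begin with "Of":

```ruby
# \b matches only where a word character meets a non-word character (or
# the edge of the string), so the "Of" at the start of "Oferson", which is
# followed by the word character "e", no longer matches.
fixed = 'Oferson Of Interest'.gsub(/\bOf\b/, 'of')
p fixed   # => "Oferson of Interest"

# Caveat: a leading "Of" is a whole word too, so it gets lowercased.
leading = 'Of Mice And Men'.gsub(/\bOf\b/, 'of')
p leading # => "of Mice And Men"

# Possible variant: require whitespace before the word with a lookbehind,
# so the first word of the title is never touched.
kept = 'Of Mice And Men'.gsub(/(?<=\s)Of\b/, 'of')
also = 'Oferson Of Interest'.gsub(/(?<=\s)Of\b/, 'of')
p kept    # => "Of Mice And Men"
p also    # => "Oferson of Interest"
```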

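You're dealing with "stop words": words that, for one reason or another, you don't want to process. Build a list of the stop words you want to ignore, and compare each word against it to decide whether to process it further: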

require 'set'

STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]

TEXT.each do |text|
  puts text.split.map{ |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
end
# >> a Stitch In Time Saves Nine
# >> the Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country

This is a simple example, but it shows the basics. In real life you'd need to handle punctuation and cases such as hyphenated words.
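As a rough illustration of that point, here is one way the stop-word loop might be extended so each hyphen-separated part is treated as a word and trailing punctuation is ignored when checking the list. The helper names title_part and title_case are hypothetical, and keeping only a-z for the lookup key is a simplifying assumption that only suits plain English text:

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper: case one hyphen-separated part. The lookup key keeps
# only the letters a-z, so "of," still matches the stop word "of".
def title_part(part)
  key = part.downcase.delete('^a-z')
  STOPWORDS.include?(key) ? part.downcase : part.capitalize
end

# Hypothetical helper: split on whitespace, then on hyphens inside each word.
# The -1 limit keeps empty trailing parts, so "well-" stays "Well-".
def title_case(text)
  text.split.map { |word|
    word.split('-', -1).map { |part| title_part(part) }.join('-')
  }.join(' ')
end

puts title_case('state-of-the-art, of course') # => State-of-the-Art, of Course
```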

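I used a Set because it stays fast as the stop-word list grows. It's similar to a Hash, so a membership check is a hash lookup rather than the element-by-element scan that include? does on an Array: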

require 'set'
require 'fruity'

LETTER_ARRAY = ('a' .. 'z').to_a
LETTER_SET = LETTER_ARRAY.to_set

compare do

  array {LETTER_ARRAY.include?('0') }
  set { LETTER_SET.include?('0') }
end
# >> Running each test 16384 times. Test will take about 2 seconds.
# >> set is faster than array by 10x ± 0.1
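If installing the fruity gem isn't an option, a rough equivalent using the standard library's Benchmark module might look like the sketch below. The iteration count is an arbitrary assumption, and timings depend on the machine and Ruby version, so no figures are shown here:

```ruby
require 'benchmark'
require 'set'

LETTER_ARRAY = ('a'..'z').to_a
LETTER_SET   = LETTER_ARRAY.to_set
ITERATIONS   = 200_000 # arbitrary; raise it for steadier numbers

# '0' is never present, so Array#include? has to scan all 26 elements,
# while Set#include? does a single hash-based lookup.
results = Benchmark.bm(6) do |x|
  x.report('array') { ITERATIONS.times { LETTER_ARRAY.include?('0') } }
  x.report('set')   { ITERATIONS.times { LETTER_SET.include?('0') } }
end
```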

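It gets more interesting when you want to protect the first letter of the resulting string, but a simple trick, if it matters, is to force just that letter to uppercase: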

require 'set'

STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]

TEXT.each do |text|
  str = text.split.map{ |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
  str[0] = str[0].upcase
  puts str
end
# >> A Stitch In Time Saves Nine
# >> The Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
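Note that the str[0] = str[0].upcase trick raises NoMethodError on an empty string, because ""[0] returns nil. A variant that might be easier to reuse, sketched under the assumption that the first word should always be capitalized even when it's a stop word, skips the stop-word check for index 0 instead of patching the string afterwards; title_case is a hypothetical name:

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper: the first word is always capitalized; later words are
# lowercased if they are stop words. Empty input simply returns "".
def title_case(text)
  text.split.each_with_index.map { |w, i|
    if i.positive? && STOPWORDS.include?(w.downcase)
      w.downcase
    else
      w.capitalize
    end
  }.join(' ')
end

puts title_case('the lord of the rings') # => The Lord of the Rings
```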

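This isn't a good job for regular expressions unless you're dealing with very consistent text patterns. Since you're dealing with TV show names, you likely won't find much consistency, and the complexity of the patterns will grow quickly.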


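Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.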

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM