
Remove &thinsp; from Ruby String

I am trying to parse some data and am having trouble stripping out a &thinsp; character. I know it is just a kind of "space", but I really cannot get it out of the string. My code:

require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get('my_page.html')
price = page.search('#product_buy .price').text.to_s.gsub(/\s+/, "").gsub("&thinsp;","").gsub(" ", "")
puts price

As a result I always get "4 162", with that space still in it. I don't know what to do. If anyone has run into this issue before, please help. Thank you.

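HTML escape codes don't mean anything to Ruby's regex engine. Looking for "&thinsp;" will look for those literal characters, not a thin space, and by the time you call .text the parser has already decoded the entity into the actual character anyway. The /\s+/ pattern does not help either, because Ruby's \s matches only ASCII whitespace. Instead, versions of Ruby >= 1.9 support Unicode escapes in strings, meaning that you can use the Unicode code point corresponding to a thin space to make your substitution. The Unicode code point for a thin space is 0x2009, meaning that you can reference it in a double-quoted Ruby string as \u2009.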
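As a rough illustration of the difference, here is a minimal sketch. The string raw is a made-up stand-in for what .text returns, assuming the page really contains a thin space (U+2009) between the digits:

```ruby
# Hypothetical input: digits separated by a real thin space (U+2009),
# which is what the HTML parser produces from the &thinsp; entity.
raw = "4\u2009162"

# Searches for the literal characters "&thinsp;", which are not in the string.
entity_attempt = raw.gsub("&thinsp;", "")

# \s only covers ASCII whitespace (space, tab, newline, ...), not U+2009.
ascii_ws_attempt = raw.gsub(/\s+/, "")

# Targets the thin space itself via its code point.
code_point_attempt = raw.gsub("\u2009", "")

puts entity_attempt == raw      # the string is unchanged
puts ascii_ws_attempt == raw    # the string is unchanged
puts code_point_attempt         # digits only
```

Only the last call touches the thin space, which matches the behaviour described in the question.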

Additionally, instead of calling some_string.gsub('some_string', '') , you can just call some_string.delete('some_string') .

Note that this isn't appropriate for all situations, because delete treats its argument as a set of characters and removes every instance of each of them (with several arguments, the intersection of the sets), while gsub removes only segments matching the pattern provided. For example, 'hellohi'.gsub('hello', '') == 'hi', while 'hellohi'.delete('hello') == 'i'.
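A small sketch of that difference, using the example above plus a made-up price string (the variable names are only for illustration):

```ruby
word = "hellohi"

# gsub removes the substring "hello" as a whole.
by_gsub = word.gsub("hello", "")

# delete removes every h, e, l and o, wherever they appear.
by_delete = word.delete("hello")

# For stripping individual whitespace characters the set semantics are
# exactly what is wanted: every thin space and every plain space goes.
price = "4\u2009162 ".delete("\u2009 ")

puts by_gsub
puts by_delete
puts price
```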

In your specific case, I'd use something like:

# Double quotes are required: single quotes would not interpret \u2009 or \s as escapes.
price = page.search('#product_buy .price').text.delete("\u2009\s")
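One thing to watch with this call is the quoting. Ruby only interprets \u2009 (and \s, which means a plain space in a double-quoted string) inside double quotes. In single quotes the argument is just a run of ordinary characters, including some digits, so delete would remove the wrong things. A quick sketch with a made-up input string:

```ruby
# Hypothetical input with a thin space between the digits.
raw = "4\u2009162"

single = '\u2009\s'   # eight ordinary characters: \ u 2 0 0 9 \ s
double = "\u2009\s"   # two characters: U+2009 and a plain space

puts single.length
puts double.length

# With the double-quoted set, only whitespace is removed.
good = raw.delete(double)

# With the single-quoted set, the digits 2, 0 and 9 are part of the set,
# so the price itself gets damaged and the thin space stays.
bad = raw.delete(single)

puts good
puts bad == good
```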

