I am trying to parse some data and am having trouble removing a character. I know it is just a "space", but I really can't strip it from the string. My code:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('my_page.hmtl')
price = page.search('#product_buy .price').text.to_s.gsub(/\s+/, "").gsub("&thinsp;","").gsub("&#8201;", "")
puts price
As a result I always get "4 162", with the space still there. I don't know what to do. Could anyone who has run into this issue before help? Thank you.
HTML escape codes don't mean anything to Ruby's regex engine. Looking for "&thinsp;" will look for those literal characters, not a thin space. Instead, Ruby versions >= 1.9 support Unicode escapes in strings, so you can use the Unicode code point corresponding to a thin space to make your substitution. The Unicode code point for a thin space is 0x2009, meaning that you can reference it in a double-quoted Ruby string as \u2009.
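As a small sketch of the difference (the sample string is made up for illustration and stands in for the scraped text):

```ruby
# Hypothetical sample: digits separated by a thin space (U+2009).
raw = "4\u2009162"

# Searching for the HTML entity looks for the literal characters "&thinsp;",
# which do not occur in the already-decoded text, so nothing changes.
unchanged = raw.gsub("&thinsp;", "")

# Inside double quotes, "\u2009" is the thin space character itself.
cleaned = raw.gsub("\u2009", "")

puts unchanged == raw   # true
puts cleaned            # 4162
```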
Additionally, instead of calling some_string.gsub('some_string', ''), you can just call some_string.delete('some_string').
Note that this isn't appropriate for all situations, because delete removes all instances of all characters appearing in the intersection of its arguments, while gsub will remove only segments matching the pattern provided. For example, 'hellohi'.gsub('hello', '') == 'hi', while 'hellohi'.delete('hello') == 'i'.
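The same comparison as a runnable snippet, in case it helps to see the two calls side by side:

```ruby
s = "hellohi"

# gsub removes only the substring that matches the pattern as a whole.
with_gsub = s.gsub("hello", "")
puts with_gsub     # hi

# delete treats its argument as a set of characters (h, e, l, o) and removes
# every occurrence of each one, including the "h" in "hi".
with_delete = s.delete("hello")
puts with_delete   # i
```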
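In your specific case, I'd use something like the following (the double quotes are required, since Ruby does not interpret \u2009 or \s inside single quotes):
price = page.search('#product_buy .price').text.delete("\u2009\s")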
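One way to check this without fetching the page is to run the same call on a hard-coded string. This is only a sketch: the sample value is an assumption standing in for whatever page.search(...).text returns. It also shows why the quoting style matters, because the escapes are interpreted only in a double-quoted string.

```ruby
# Hypothetical stand-in for page.search('#product_buy .price').text
text = " 4\u2009162 "

# In double quotes, "\u2009\s" is a two-character set: thin space and plain space.
price = text.delete("\u2009\s")
puts price   # 4162

# In single quotes the escapes are not interpreted, so delete receives the
# literal characters u, 2, 0, 9 and s. The spaces stay and digits get stripped.
wrong = text.delete('\u2009\s')
puts wrong.inspect
```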