I am making a GET request in Ruby like this:
require 'net/http'
require 'uri'
uri = URI.parse("https://www.test.com")
request = Net::HTTP::Get.new(uri)
request.content_type = "application/json"
request["Accept"] = "application/json"
req_options = {
use_ssl: uri.scheme == "https",
}
response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
http.request(request)
end
# response.code
response.body
The response body is the HTML source of the page, returned as plain text. I would like to search it for an element with a particular id and get its value. This looks like what a crawler does, but I have never written one.
For instance, there is an element like this:
<div id='price'>1000€</div>
I would like to search for <div id='price'> and get 1000€.
I can only get its index in the string, and I do not know what to do after that.
Is this possible, or is there another way?
Thank you.
You probably want to use the Nokogiri gem: https://github.com/sparklemotion/nokogiri
Nokogiri (鋸) is a Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.
require 'nokogiri'
html = <<HTML
<div id="block1">
<a href="http://google.com">link1</a>
</div>
<div id="block2">
<a href="http://stackoverflow.com">link2</a>
<a id="tips">just a bookmark</a>
</div>
HTML
doc = Nokogiri::HTML(html)
doc.css('#block1 a[href]').text
#=>link1
To modify your example:
require 'net/http'
require 'uri'
require 'nokogiri'
uri = URI.parse("https://www.example.com")
request = Net::HTTP::Get.new(uri)
request.content_type = "application/json"
request["Accept"] = "application/json"
req_options = {
use_ssl: uri.scheme == "https",
}
response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
http.request(request)
end
# Parse the returned HTML and collect the text of every <p> element.
doc = Nokogiri::HTML.parse(response.body)
doc.css('p').text
#=> "This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission.More information..."
In Ruby we have Nokogiri, which lets you search documents via XPath or CSS3 selectors:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("https://www.test.com"))
doc.at_css('div#price').text
or:
doc = Nokogiri::HTML response.body
doc.at_css('div#price').text