I am using Nokogiri in my Rails 4 app to scrape images from websites, and some of them give me "unexpected '$' after ''" errors.
For instance, here is one sample image url output:
<img src="http://x.example.com/images/detail/ln9502/1_ln-9502---
grh_375.jpg" alt="" style="display: block;">
I suspect it is the line break that is giving me trouble?
Here is another:
<img class="abc" src="http://xxx.example.com/is/image/Sample/503508739_1?$sample_size$">
I suspect it is the dollar signs giving me issues here.
Here is what I have in one of my controllers that is saving the image:
item_imageurl = page.search(library.image_selector).first.attribute('src').value(/(.|\n|\r)*/).to_s
Items belong to a library, and I set the CSS selector for each library. Any ideas on what regex I could use to ignore line breaks and dollar signs, or is there a simpler solution?
You can remove newlines and spaces from a string with .gsub.
item_imageurl = page.search(library.image_selector).first.attribute('src').value().to_s.gsub(/[\n ]/, "")
I'm assuming ...attribute('src').value() returns the contents of the src attribute.
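As a rough sketch of the same idea, here is the cleanup applied to the two sample URLs from the question, using plain strings so it runs without Nokogiri. The helper name clean_src is hypothetical, and adding \r to the character class is my assumption for pages with Windows-style line endings. Note that it leaves the dollar signs untouched; whether they are the real cause of the error is not confirmed here.

```ruby
# Hypothetical helper (not part of Nokogiri): strip line breaks and spaces
# from a raw src attribute value before saving it.
def clean_src(raw)
  # \r is included as an assumption, to cover Windows-style line endings.
  raw.to_s.gsub(/[\r\n ]/, "")
end

# The first sample from the question, with the line break inside the URL.
broken = "http://x.example.com/images/detail/ln9502/1_ln-9502---\ngrh_375.jpg"

# The second sample; single quotes keep the dollar signs literal.
with_dollars = 'http://xxx.example.com/is/image/Sample/503508739_1?$sample_size$'

puts clean_src(broken)
puts clean_src(with_dollars)
```

The first URL comes out on a single line, and the second is returned unchanged, so any problem caused by the dollar signs would have to be handled separately.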
For the record, your regex matches the entire string (its capture group ends up holding only the last character), so it does not filter anything out. You might want to check out http://regex101.com/ for testing your regular expressions.