
Scrape text and media from URL

I am looking for Ruby gems that help with scraping. Basically, I want to scrape only the main body of a page and its media (images), without the sidebar, footer, navbar, and similar elements.

I know scraping usually requires page-specific details such as class names and ids, so I am wondering whether there is a tool that does this generically.

A good example is the "Reader View Available" option in Safari on iOS, which shows just the raw content of the page, with the necessary headings and paragraphs.

Use Nokogiri

You can also use the CSS SelectorGadget to find your classes. It should help you find the right classes or ids for the header and body elements.

Reader View doesn't save bandwidth
