
Scrape text and media from URL

Original post: 2015-01-10 00:25:55 | Tags: javascript, html, css, ruby, web-scraping

I am looking for some helpful Ruby gems for scraping. Basically, I want to be able to scrape the main body of a page: only the main content and its media (images), with no sidebar, footer, or navbar.

I know scraping usually requires page-specific details, such as knowing the classes and IDs. So I am wondering whether there is a tool that does something like this?

A good example is the "Reader View Available" option in Safari on iOS, which shows just the raw content of the page, with the required headings and paragraphs.

1 answer

Use Nokogiri.

You can also use the CSS Selector Gadget to find your classes. It should help you find the proper header and body classes or IDs.

Reader View doesn't save bandwidth.


