
How to scrape a JavaScript table in R?

Originally posted 2016-05-23 23:08:56 · tags: javascript, r, web-scraping

I want to scrape a table from the Citi Bike trip-data page: https://s3.amazonaws.com/tripdata/index.html

My goal is to get the URLs of all the zip files at once, instead of manually typing each date and downloading the files one at a time. Since the page is updated monthly, I want every run of the function to pick up all the current data files.

I first tried the rvest and XML packages, then realized that the page contains both static HTML and a table generated by a JavaScript function. Those packages only see the HTML as the server sends it, not what the JavaScript builds afterwards. That's where the problem was.

I'd really appreciate any help, and please let me know if I can provide further information.

1 answer

If I go to https://s3.amazonaws.com/tripdata/ (just the root, without index.html), I get a simple XML file. If you want to parse the XML, the relevant element is Key (uppercase K, lowercase e and y), but I would just search the plain text: ignore the XML structure, treat it like a simple text file, take every string between <Key> and </Key>, treat it as the filename it is, and prefix it with https://s3.amazonaws.com/tripdata/ to download it.
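A minimal sketch of that plain-text approach follows. The question asks about R, but Python's standard library is swapped in here purely for illustration; the same idea (fetch the root URL, regex out every Key, prefix the bucket URL) carries over directly. It assumes the root URL still returns the S3 bucket listing and that the data files are the keys ending in .zip. The function names extract_keys, zip_urls and fetch_listing are hypothetical.

```python
from __future__ import annotations

import re
import urllib.request

# Bucket root from the answer; requesting it (without index.html) returns the XML listing.
BASE_URL = "https://s3.amazonaws.com/tripdata/"


def extract_keys(listing_xml: str) -> list[str]:
    """Treat the listing as plain text and return every string between <Key> and </Key>."""
    return re.findall(r"<Key>(.*?)</Key>", listing_xml)


def zip_urls(listing_xml: str, base_url: str = BASE_URL) -> list[str]:
    """Keep only keys ending in .zip (assumed to be the data files) and prefix the bucket URL."""
    return [base_url + key for key in extract_keys(listing_xml) if key.endswith(".zip")]


def fetch_listing(url: str = BASE_URL) -> str:
    """Download the raw listing text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Calling zip_urls(fetch_listing()) gives the full list of download links, so re-running it each month picks up newly added files. One caveat from how S3 bucket listings work in general: a single listing response returns at most 1,000 keys, and if it contains <IsTruncated>true</IsTruncated>, the remaining keys must be requested with a further call (for example, by passing the last key seen as the marker query parameter).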

The first entry appears to contain all the data in a single file (170 MB), so that alone might be enough for you.

Related questions:

How to scrape a table that is populated by javascript?

How to scrape website with table build in Javascript

How to scrape JavaScript table from website to dataframe?

Using R to scrape data from a table populated possibly with javascript

Scrape a page with JavaScript from R

How to use Python (preferably pandas) to scrape data from Javascript table?

Using R to add field to online form and scrape resulting javascript created table

How to scrape a table by manipulating Javascript in inspector? Page only reveals current day's data but I want to go back in time and scrape

JS Puppeteer - How to scrape a table

How to scrape a table using Puppeteer?
