
Reading data from "Export to Excel" URL into R

I am viewing consensus fantasy football projection data for individual players at this URL:

Sub-optimal projection data format

However, in the top right corner there is an "Export" link to the same data in a cleaner format. This link prompts the user to save the data to an .xls file. This is the data that I would like to read into R. I am not sure whether I can read it directly from the URL, or whether R should save it to an Excel file and then read it in. I have no preference, as long as I do not have to save each Excel file manually, since I plan on doing this frequently.

My question is: how can I create an automated process in R where I enter the export-to-Excel URL and then read that data into R?

Preferred projection data format (this is a prompt to save to .xls)

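library("gdata")  # provides read.xls()

# URL behind the "Export" link for the week-4 QB projections
qb_url <- "http://www.fantasypros.com/nfl/projections/qb.php?export=xls&week=4&min-yes=true&max-yes=true"
# Save the export locally without a manual "Save as" step
download.file(qb_url, "qb.xls")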

I am now trying the read.table function and skipping the first 5-6 rows. However, since the first column contains full names, sometimes with suffixes, it splits the names into three separate columns, while I want them in a single character column. I've tried stringsAsFactors = FALSE and other pieces of code, but to no avail. I am reading through the read.table documentation, but I cannot see what I am doing wrong or missing.
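The splitting most likely comes from read.table()'s default sep = "", which treats any run of spaces or tabs as a field separator, so "Aaron Rodgers" becomes two fields. A minimal offline sketch, assuming the export is tab-delimited (as the read.table() call in the answer below assumes); the sample lines are illustrative only and "Hypothetical Player Jr." is a placeholder name:

```r
# Illustrative tab-delimited lines; "Hypothetical Player Jr." is a placeholder
sample_lines <- c(
  "Player Name\tTeam\tfpts",
  "Aaron Rodgers\tGB\t22.1",
  "Cam Newton\tCAR\t19.8",
  "Hypothetical Player Jr.\tXYZ\t10.0"
)
tmp <- tempfile(fileext = ".tsv")
writeLines(sample_lines, tmp)

# Default sep = "" splits on any whitespace, so names break apart
count.fields(tmp)              # 4 4 4 5
# Splitting on tabs only keeps each name in one field
count.fields(tmp, sep = "\t")  # 3 3 3 3

proj <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
proj$Player.Name  # "Aaron Rodgers" "Cam Newton" "Hypothetical Player Jr."
```

stringsAsFactors = FALSE only changes the type of the resulting columns, not how each line is split into fields, which is why it did not help here.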

Take a look at rvest. Those pages have a link tag for the download:

<a id="export-xls" href="?export=xls&amp;week=4&amp;min-yes=true&amp;max-yes=true" rel="nofollow"><i class="fa fa-file-excel-o"></i> Export</a>

You can grab that with:

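library(rvest)

# Projection page that contains the "Export" link
URL <- "http://www.fantasypros.com/nfl/projections/qb.php?max-yes=true&min-yes=true"
pg <- read_html(URL)

# Pull the href attribute of the <a id="export-xls"> element
html_attr(html_nodes(pg, "a#export-xls"), "href")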

Which produces:

## [1] "?export=xls&week=4&min-yes=true&max-yes=true"

Just append it to the page's base URL (domain + path):

http://www.fantasypros.com/nfl/projections/qb.php

and download it in any one of at least five ways in R. Here's a download.file() example:

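# Relative query string taken from the export link
dl_query <- html_attr(html_nodes(pg, "a#export-xls"), "href")

# Prepend the base URL and save the export locally
download.file(sprintf("http://www.fantasypros.com/nfl/projections/qb.php%s", dl_query),
              "filenametosaveitas.csv")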
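To reuse this step for other pages, a small helper can build the export URL from any projection page URL by dropping its existing query string and appending the href that rvest returned. This is a sketch: export_url() is a hypothetical name, and it assumes the href is always a query-only relative link like the one above.

```r
# Hypothetical helper: combine a page URL with a query-only href
# (e.g. "?export=xls&...") to get an absolute download URL.
export_url <- function(page_url, href) {
  base <- sub("\\?.*$", "", page_url)  # strip any existing "?..." query
  paste0(base, href)
}

export_url(
  "http://www.fantasypros.com/nfl/projections/qb.php?max-yes=true&min-yes=true",
  "?export=xls&week=4&min-yes=true&max-yes=true"
)
# "http://www.fantasypros.com/nfl/projections/qb.php?export=xls&week=4&min-yes=true&max-yes=true"
```

The result is the same URL as qb_url in the question. For hrefs that are relative paths rather than bare queries, xml2::url_absolute() does general relative-URL resolution.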

I used ".csv" since it's not really an Excel file. But it's a fugly "CSV" (tab-delimited, with a few junk lines at the top), so you'll have to massage it a bit first with read.table:

# Skip the 4 lines above the header row; fields are tab-separated
dat <- read.table("filenametosaveitas.csv", skip=4, header=TRUE, sep="\t")
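To check what a download actually is before picking a reader, you can look at its first bytes. Legacy .xls workbooks are OLE2 compound documents that begin with the signature D0 CF 11 E0 A1 B1 1A E1, while a text export like this one starts with plain characters. A minimal sketch; is_legacy_xls() is a hypothetical helper, and it does not detect .xlsx files (those are ZIP archives):

```r
# Hypothetical helper: TRUE if the file starts with the OLE2
# compound-document signature used by legacy .xls workbooks.
is_legacy_xls <- function(path) {
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  head_bytes <- readBin(path, what = "raw", n = length(ole2_magic))
  identical(head_bytes, ole2_magic)
}

# A plain-text file, like the tab-delimited export, is not a real .xls
txt_file <- tempfile(fileext = ".csv")
writeLines(c("some preamble", "Player Name\tTeam"), txt_file)
is_legacy_xls(txt_file)
```

If it ever returns TRUE, switch to a real Excel reader such as gdata::read.xls() (loaded in the question) or readxl::read_excel().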

And then, voilà, you have your data:

dplyr::glimpse(dat)

## Observations: 90
## Variables: 33
## $ Player.Name    (fctr) Aaron Rodgers, Cam Newton, Russell Wilson, Andrew Luck, Carson Palmer, P...
## $ Team           (fctr) GB, CAR, SEA, IND, ARI, DEN, ATL, BUF, NO, OAK, CIN, BAL, SD, SF, PHI, N...
## $ pass_att       (dbl) 34.7, 32.1, 29.6, 34.1, 35.6, 38.8, 37.9, 32.2, 38.8, 38.2, 34.6, 37.8, 3...
## $ pass_att.High  (dbl) 39.6, 33.0, 33.8, 39.0, 38.0, 45.5, 41.2, 36.3, 43.6, 39.5, 40.0, 43.0, 3...
## $ pass_att.Low   (dbl) 32.0, 31.4, 27.0, 22.1, 32.2, 35.0, 35.0, 27.2, 35.0, 37.0, 31.0, 35.0, 3...
## $ pass_cmp       (dbl) 22.8, 20.0, 18.3, 21.2, 23.3, 25.4, 24.4, 19.8, 25.4, 23.7, 21.2, 23.5, 2...
## $ pass_cmp.High  (dbl) 25.3, 21.9, 19.3, 24.0, 26.6, 28.2, 25.8, 22.4, 29.2, 26.3, 25.0, 27.0, 2...
## $ pass_cmp.Low   (dbl) 21.0, 19.1, 17.0, 13.2, 21.4, 23.0, 23.0, 18.3, 23.0, 22.0, 18.4, 21.9, 2...
## $ pass_yds       (dbl) 283.2, 240.2, 228.5, 254.6, 279.5, 295.9, 282.8, 232.7, 290.4, 265.8, 245...
## $ pass_yds.High  (dbl) 317.7, 251.0, 235.2, 298.0, 330.0, 325.5, 306.0, 266.3, 324.1, 275.0, 275...
## $ pass_yds.Low   (dbl) 262.0, 231.0, 220.0, 153.9, 258.0, 273.0, 270.0, 208.4, 260.0, 249.7, 227...
## $ pass_tds       (dbl) 2.4, 1.5, 1.5, 1.9, 2.0, 2.0, 1.8, 1.7, 1.8, 1.8, 1.8, 1.7, 1.9, 1.1, 1.7...
## $ pass_tds.High  (dbl) 3.0, 2.0, 1.8, 2.4, 2.1, 2.4, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 1.2, 2.0...
## $ pass_tds.Low   (dbl) 1.9, 1.3, 1.0, 1.0, 1.9, 1.8, 1.7, 1.5, 1.6, 1.6, 1.5, 1.5, 1.8, 1.0, 1.4...
## $ pass_ints      (dbl) 0.5, 0.8, 0.6, 0.6, 0.9, 0.9, 1.0, 1.0, 1.0, 0.9, 1.0, 0.8, 0.9, 0.7, 0.8...
## $ pass_ints.High (dbl) 0.8, 1.0, 1.0, 1.0, 1.0, 1.0, 1.3, 1.1, 1.3, 1.0, 1.1, 1.0, 1.6, 1.2, 1.0...
## $ pass_ints.Low  (dbl) 0.0, 0.7, 0.0, 0.0, 0.7, 0.7, 0.6, 0.8, 0.7, 0.8, 0.9, 0.7, 0.0, 0.0, 0.6...
## $ rush_att       (dbl) 3.8, 7.8, 7.4, 3.9, 1.9, 1.2, 1.9, 5.2, 1.4, 2.3, 3.5, 2.3, 1.3, 6.9, 1.9...
## $ rush_att.High  (dbl) 5.0, 8.5, 9.0, 5.5, 3.1, 1.7, 2.5, 6.0, 2.8, 3.0, 4.2, 3.7, 2.0, 9.0, 2.4...
## $ rush_att.Low   (dbl) 3.0, 7.0, 6.4, 1.5, 1.0, 0.0, 1.0, 3.5, 0.0, 1.7, 3.0, 1.5, 0.0, 6.1, 1.0...
## $ rush_yds       (dbl) 19.4, 41.6, 38.4, 16.9, 5.5, 0.1, 6.8, 26.6, 3.8, 7.5, 11.9, 6.9, 3.1, 40...
## $ rush_yds.High  (dbl) 24.0, 46.8, 40.8, 23.0, 14.3, 2.3, 8.8, 35.1, 10.4, 10.0, 18.3, 16.9, 4.7...
## $ rush_yds.Low   (dbl) 17.3, 39.2, 33.8, 6.2, 2.6, -1.0, 5.0, 14.3, 0.0, 6.0, 8.0, 3.0, 0.0, 35....
## $ rush_tds       (dbl) 0.1, 0.3, 0.4, 0.1, 0.1, 0.0, 0.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.2, 0.0...
## $ rush_tds.High  (dbl) 0.1, 0.6, 1.0, 0.2, 0.2, 0.0, 0.1, 0.2, 0.1, 0.1, 0.3, 0.2, 0.0, 0.3, 0.1...
## $ rush_tds.Low   (dbl) 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0...
## $ fumbles        (dbl) 0.1, 0.2, 0.1, 0.3, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.2, 0.2, 0.2, 0.3, 0.3...
## $ fumbles.High   (dbl) 0.1, 0.2, 0.1, 0.3, 0.2, 0.2, 0.2, 0.2, 0.3, 0.3, 0.2, 0.3, 0.2, 0.3, 0.3...
## $ fumbles.Low    (dbl) 0.1, 0.2, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.3, 0.2, 0.1, 0.2, 0.2, 0.2...
## $ fpts           (dbl) 22.1, 19.8, 19.8, 18.4, 17.9, 17.9, 17.3, 17.1, 17.1, 16.7, 16.6, 16.4, 1...
## $ fpts.High      (dbl) 28.8, 24.0, 25.8, 24.0, 22.0, 22.5, 22.0, 22.8, 22.0, 22.8, 22.8, 22.3, 2...
## $ fpts.Low       (dbl) 19.5, 16.9, 17.7, 9.8, 16.1, 16.8, 16.8, 14.8, 15.2, 13.7, 13.7, 14.5, 15...
## $ X              (lgl) NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
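The trailing X column of NAs is most likely caused by a tab at the end of every line: the header then ends in an empty name, which make.names() turns into "X", and the empty fields are read as logical NA. A minimal sketch that reproduces this on two illustrative lines and then drops any column that is entirely NA:

```r
# Illustrative lines, each ending in a trailing tab
lines <- c("Player Name\tTeam\tfpts\t", "Aaron Rodgers\tGB\t22.1\t")
tmp <- tempfile(fileext = ".tsv")
writeLines(lines, tmp)

dat <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
names(dat)  # "Player.Name" "Team" "fpts" "X"

# Keep only columns with at least one non-NA value
dat <- dat[, colSums(!is.na(dat)) > 0, drop = FALSE]
names(dat)  # "Player.Name" "Team" "fpts"
```

Dropping every all-NA column is a blunt rule; if a real stat column could legitimately be empty, dat$X <- NULL is the more targeted fix.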

Lather / rinse / repeat for your URLs.
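One way to automate the repeat step is to generate the export URLs directly and loop over them. A sketch, assuming (not verified) that the other position pages follow the same qb.php pattern and share the same four-line preamble; the rb/wr/te page names and both helper functions are hypothetical:

```r
# Assumed position pages; only qb.php appears in the original post
positions <- c("qb", "rb", "wr", "te")

# Build the export URL for one or more positions and a given week
projection_url <- function(pos, week) {
  sprintf(
    "http://www.fantasypros.com/nfl/projections/%s.php?export=xls&week=%d&min-yes=true&max-yes=true",
    pos, week
  )
}

# Download one export to a temp file and read it (needs network access)
read_projections <- function(pos, week) {
  tmp <- tempfile(fileext = ".tsv")
  on.exit(unlink(tmp))
  download.file(projection_url(pos, week), tmp, quiet = TRUE)
  # Same settings as above: skip the preamble, tab-separated fields
  read.table(tmp, skip = 4, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

projection_url("qb", 4)

# Not run here: fetches every position into a named list of data frames
# all_proj <- lapply(setNames(positions, positions), read_projections, week = 4)
```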
