Read Excel file from a URL using the readxl package
Consider a file on the internet, like this one (note the s in https): https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls
How can sheet 2 of the file be read into R?
The following code approximates what is desired, but it fails:
url1<-'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
p1f <- tempfile()
download.file(url1, p1f, mode="wb")
p1<-read_excel(path = p1f, sheet = 2)
This works for me on Windows:
library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’
GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 2L)
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of 8 variables:
# $ Code : chr "C115388" "C115800" "C115801" "C115802" ...
# $ Codelist Code : chr NA "C115388" "C115388" "C115388" ...
# $ Codelist Extensible (Yes/No): chr "No" NA NA NA ...
# $ Codelist Name : chr "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
# $ CDISC Submission Value : chr "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
# $ CDISC Synonym(s) : chr "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
# $ CDISC Definition : chr "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
# $ NCI Preferred Term : chr "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
From this issue on GitHub (#278):
some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.
So we should be able to pass URLs directly to read_excel() in the (hopefully near) future.
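Until read_excel() accepts URLs itself, the download-then-read pattern from the answer above can be wrapped in a small function. This is only a sketch: `url_extension()` and `read_excel_url()` are hypothetical names, not part of readxl, and the code assumes the URL path ends in the real file extension.

```r
# Hypothetical helpers (not part of readxl): download a workbook to a temp
# file that keeps the URL's extension, then read one sheet from it.
url_extension <- function(url) {
  # Drop any query string or fragment before looking for the extension.
  clean <- sub("[?#].*$", "", url)
  ext <- tools::file_ext(clean)
  if (nzchar(ext)) paste0(".", ext) else ""
}

read_excel_url <- function(url, sheet = 1) {
  tf <- tempfile(fileext = url_extension(url))
  # Remove the temp file once the sheet has been read into memory.
  on.exit(unlink(tf), add = TRUE)
  # mode = "wb" keeps the binary file intact on Windows.
  utils::download.file(url, tf, mode = "wb", quiet = TRUE)
  readxl::read_excel(tf, sheet = sheet)
}

# Usage (needs network access and the readxl package):
# p1 <- read_excel_url(url1, sheet = 2)
```

Here `url1` is the URL defined in the question. Keeping the extension on the temp file matters because, as the last answer shows, the readxl version in use there would not open the extension-less download.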
Use the rio R package. Here is a reprex:
library(tidyverse)
library(rio)
url <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
rio::import(file = url,which = 2) %>%
glimpse()
#>
#> Rows: 30,995
#> Columns: 8
#> $ Code <chr> "C141663", "C141706", "C141707"...
#> $ `Codelist Code` <chr> NA, "C141663", "C141663", "C141...
#> $ `Codelist Extensible (Yes/No)` <chr> "No", NA, NA, NA, "No", NA, NA,...
#> $ `Codelist Name` <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Submission Value` <chr> "A4STR1TC", "A4STR101", "A4STR1...
#> $ `CDISC Synonym(s)` <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Definition` <chr> "4 Stair Ascend test code.", "4...
#> $ `NCI Preferred Term` <chr> "CDISC Functional Test 4 Stair ...
A simpler solution is to use the openxlsx package, although it reads only .xlsx files, so it does not apply to the .xls file in the question. Here is an example, which can be adapted to your needs:
library(openxlsx)
df = read.xlsx("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx",sheet=1)
When I execute the first 3 lines I get three files in a temp folder, and the one with no file extension is named filed3a2827f129. If I add the extension `.xls` to that file, it can be opened with OpenOffice.org Calc, and the viewer panel shows the contents of sheet 2.
So I wondered if pasting that file path could get read_excel to open it. It won't open the file under its original name, but it will open the renamed file:
> p1<-read_excel( path ="/private/var/folders/yq/m3j1jqtj6hq6s5mq_v0jn3s80000gn/T/RtmpxfaZRt/filed3a2827f129.xls", sheet = 2)
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00
> str(p1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of 8 variables:
$ Code : chr "C115388" "C115800" "C115801" "C115802" ...
$ Codelist Code : chr NA "C115388" "C115388" "C115388" ...
$ Codelist Extensible (Yes/No): chr "No" NA NA NA ...
$ Codelist Name : chr "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
$ CDISC Submission Value : chr "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
$ CDISC Synonym(s) : chr "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
$ CDISC Definition : chr "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
$ NCI Preferred Term : chr "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
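The renaming step can also be done from R instead of by hand. The following is a minimal sketch, assuming `p1f` is the extension-less temp file created by the question's code; `add_xls_extension()` is a hypothetical helper name.

```r
# Hypothetical helper: give an already-downloaded file an .xls extension
# so that read_excel() can recognise it.
add_xls_extension <- function(path) {
  new_path <- paste0(path, ".xls")
  # file.rename() returns TRUE on success and FALSE otherwise.
  if (!file.rename(path, new_path)) {
    stop("could not rename ", path)
  }
  new_path
}

# Usage (needs the readxl package and the downloaded file p1f):
# p1 <- readxl::read_excel(add_xls_extension(p1f), sheet = 2)
```

Creating the temp file with `tempfile(fileext = ".xls")` in the first place, as the httr answer does, avoids the rename altogether.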