
Read Excel file from a URL using the readxl package

Consider a file on the internet, such as this one (note the s in https): https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls

How can sheet 2 of the file be read into R?

The following code approximates what is desired, but it fails:

url1<-'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
p1f <- tempfile()
download.file(url1, p1f, mode="wb")
p1<-read_excel(path = p1f, sheet = 2)

This works for me on Windows:

library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’

GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 2L)
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of  8 variables:
# $ Code                        : chr  "C115388" "C115800" "C115801" "C115802" ...
# $ Codelist Code               : chr  NA "C115388" "C115388" "C115388" ...
# $ Codelist Extensible (Yes/No): chr  "No" NA NA NA ...
# $ Codelist Name               : chr  "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
# $ CDISC Submission Value      : chr  "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
# $ CDISC Synonym(s)            : chr  "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
# $ CDISC Definition            : chr  "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
# $ NCI Preferred Term          : chr  "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...

From this issue on GitHub (#278):

some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.

So we should be able to pass URLs directly to read_excel() in the (hopefully near) future.
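Until read_excel() accepts URLs itself, the download step can be wrapped in a small function. The following is only a sketch: read_excel_url() and temp_path_for_url() are hypothetical names that are not part of readxl, and the fallback to ".xls" when the URL has no recognisable extension is an assumption made for this example.

```r
# Hypothetical helpers (not part of readxl): download a workbook to a
# temporary file that keeps the URL's extension, then read one sheet.
temp_path_for_url <- function(url) {
  ext <- tools::file_ext(url)        # text after the last dot in the URL
  if (!nzchar(ext)) ext <- "xls"     # assumption: fall back to .xls
  tempfile(fileext = paste0(".", ext))
}

read_excel_url <- function(url, sheet = 1, ...) {
  path <- temp_path_for_url(url)
  on.exit(unlink(path), add = TRUE)  # remove the temporary file afterwards
  # mode = "wb" keeps the binary file intact on Windows
  utils::download.file(url, path, mode = "wb", quiet = TRUE)
  readxl::read_excel(path, sheet = sheet, ...)
}

# Usage (needs network access and the readxl package):
# p1 <- read_excel_url(url1, sheet = 2)
```

The extension is taken from the end of the URL, so a URL that ends in a query string would fall through to the assumed default.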

Use the rio R package. Here is a reprex:

library(tidyverse)
library(rio)
url <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
rio::import(file = url,which = 2) %>% 
  glimpse()
#> 
#> Rows: 30,995
#> Columns: 8
#> $ Code                           <chr> "C141663", "C141706", "C141707"...
#> $ `Codelist Code`                <chr> NA, "C141663", "C141663", "C141...
#> $ `Codelist Extensible (Yes/No)` <chr> "No", NA, NA, NA, "No", NA, NA,...
#> $ `Codelist Name`                <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Submission Value`       <chr> "A4STR1TC", "A4STR101", "A4STR1...
#> $ `CDISC Synonym(s)`             <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Definition`             <chr> "4 Stair Ascend test code.", "4...
#> $ `NCI Preferred Term`           <chr> "CDISC Functional Test 4 Stair ...

A simpler solution is to use the openxlsx package. Here is an example, which can be adapted to your needs:

library(openxlsx)
df = read.xlsx("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx",sheet=1)
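Note that the example above reads an .xlsx file, whereas the file in the question is a legacy .xls workbook, which openxlsx does not read. A minimal sketch of checking the extension before choosing a package follows; reader_for_url() is a hypothetical helper written for this thread, not a function from either package.

```r
# Hypothetical helper: report which of the packages discussed here could
# read a workbook, judging only by the extension at the end of the URL.
reader_for_url <- function(url) {
  ext <- tolower(tools::file_ext(url))
  if (ext == "xlsx") {
    "openxlsx or readxl"   # both read the XML-based .xlsx format
  } else if (ext == "xls") {
    "readxl"               # openxlsx does not read legacy .xls files
  } else {
    stop("unrecognised extension: '", ext, "'")
  }
}

reader_for_url("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx")
reader_for_url("https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls")
```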

When I execute the first 3 lines, I get three files in a temp folder, and the one with no file extension is named filed3a2827f129. If I add the extension `.xls` to that file, it can be opened with OpenOffice.org's Calc, and this is the upper right corner of what the viewer panel shows for sheet 2.

[Image: screenshot of sheet 2 as shown in the viewer panel]

So I wondered whether pasting that file path could get read_excel to open it. It won't open the file under its original name, but it will open the renamed file:

> p1<-read_excel( path ="/private/var/folders/yq/m3j1jqtj6hq6s5mq_v0jn3s80000gn/T/RtmpxfaZRt/filed3a2827f129.xls", sheet = 2)
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 02 00 00 00 00 00 00 0d 3b 00 00 00 00 a3 4e 00 00 07 00 
> str(p1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   20131 obs. of  8 variables:
 $ Code                        : chr  "C115388" "C115800" "C115801" "C115802" ...
 $ Codelist Code               : chr  NA "C115388" "C115388" "C115388" ...
 $ Codelist Extensible (Yes/No): chr  "No" NA NA NA ...
 $ Codelist Name               : chr  "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
 $ CDISC Submission Value      : chr  "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
 $ CDISC Synonym(s)            : chr  "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
 $ CDISC Definition            : chr  "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
 $ NCI Preferred Term          : chr  "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
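The manual renaming step can also be done from R. This is a sketch under stated assumptions: with_extension() is a hypothetical helper, and it copies the file rather than renaming it so that the original download stays in place. Newer readxl versions may be able to detect the format without an extension, in which case the step would be unnecessary.

```r
# Hypothetical helper: copy a downloaded file to a new temporary path that
# carries the given extension, leaving the original file untouched.
with_extension <- function(path, ext = ".xls") {
  new_path <- tempfile(fileext = ext)
  if (!file.copy(path, new_path)) stop("could not copy ", path)
  new_path
}

# Usage (p1f is the extension-less tempfile from the question; reading it
# needs the readxl package):
# p1 <- readxl::read_excel(with_extension(p1f), sheet = 2)
```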
