
R: Read file or sheet name of a csv file

Is there a way to read out the file name or the sheet name of a .csv file when importing it in R? I generated a .csv by clicking on the URL: https://www.populationpyramid.net/api/pp/4/2019/?csv=true

The file has the name "Afghanistan-2019" and the sheet name is the same. Now I tried to do the same in R using

library(readr)
df <- read_csv("https://www.populationpyramid.net/api/pp/4/2019/?csv=true")

However, that only gives me access to the data; the file/sheet name information is lost. Any suggestions?

You can use the excel_sheets function from the readxl package to get a character vector of all the sheets contained in an Excel file.
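For a genuine Excel workbook, a minimal sketch of this could look like the following. It assumes readxl is installed and uses the example workbook that ships with the package (readxl_example("datasets.xlsx")) as a stand-in for your own .xlsx file. Reading each sheet into a named list keeps the sheet names attached to the data.

```r
library(readxl)

# Example workbook bundled with readxl; substitute the path to your own .xlsx
path <- readxl_example("datasets.xlsx")

# Character vector of sheet names, in workbook order
sheets <- excel_sheets(path)
sheets

# Read every sheet into a named list so each data frame keeps its sheet name
all_sheets <- lapply(setNames(sheets, sheets), function(s) read_excel(path, sheet = s))
names(all_sheets)
```

This only applies to .xls/.xlsx files; a CSV has no sheets to list.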

Edit:

Sorry, I realize now that you are downloading a CSV file. CSV files are flat files and as such don't have any sheet names, so your only option is the file name. Since you are essentially querying an API, you could use the httr package instead to send a GET request:

library(httr)     # sends the HTTP request
library(stringr)  # regex helpers, used below to parse the response header

res <- httr::GET("https://www.populationpyramid.net/api/pp/4/2019/?csv=true")

This gives you a response object which contains all kinds of interesting information, including both the actual data (duh) and the file name.

You can get the data with the content function:

httr::content(res)

#> # A tibble: 21 x 3
#>    Age         M       F
#>    <chr>   <dbl>   <dbl>
#>  1 0-4   2891330 2747452
#>  2 5-9   2765393 2636519
#>  3 10-14 2614937 2501560
#>  4 15-19 2321520 2197654
#>  5 20-24 1950650 1843985
#>  6 25-29 1551332 1433056
#>  7 30-34 1255855 1138037
#>  8 35-39 1033269  954327
#>  9 40-44  834402  758533
#> 10 45-49  649695  603870
#> # … with 11 more rows

To retrieve the file name, we need to get a bit more creative. The file name is stored in the content-disposition element in the headers section of the res object:

res$headers$`content-disposition`
#> [1] "attachment; filename=Afghanistan-2019.csv"

We can extract it with a regex that pulls out all the text after the first =:

stringr::str_extract(res$headers$`content-disposition`, "(?<=\\=).*")

#> [1] "Afghanistan-2019.csv"
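If you would rather not depend on stringr, the same lookbehind regex works in base R with perl = TRUE. The sketch below hard-codes the header value printed above instead of making a request, and then drops the extension with tools::file_path_sans_ext to get the name that showed up as the sheet name when the file was opened in Excel.

```r
# Header value copied from the output above (no network request needed)
cd <- "attachment; filename=Afghanistan-2019.csv"

# Same lookbehind regex as the stringr version; base R needs perl = TRUE for it
file_name <- regmatches(cd, regexpr("(?<==).*", cd, perl = TRUE))
file_name
#> [1] "Afghanistan-2019.csv"

# Strip the extension to recover the plain name
sheet_like_name <- tools::file_path_sans_ext(file_name)
sheet_like_name
#> [1] "Afghanistan-2019"
```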

Since response objects should always contain the same information in the same places (especially when retrieved from the same API), you could easily automate this process.
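One way to automate it is sketched below. Both function names are hypothetical. The parser also tolerates a quoted filename="..." value and returns NA when the header is missing; it does not handle the RFC 6266 filename*= form. The wrapper stores the name as a "source_name" attribute on the returned data, which is just one possible design choice; adding it as a column would work too.

```r
# Hypothetical helper: extract the file name from a Content-Disposition value.
# Handles filename=x.csv and filename="x.csv"; returns NA if no filename= part.
filename_from_disposition <- function(cd) {
  if (is.null(cd) || is.na(cd) || !grepl("filename=", cd, fixed = TRUE)) {
    return(NA_character_)
  }
  sub('.*filename="?([^";]+)"?.*', "\\1", cd)
}

# Hypothetical wrapper: download a CSV and keep its name (without extension)
# as an attribute of the parsed data. Requires httr (and readr for parsing).
read_csv_with_name <- function(url) {
  res <- httr::GET(url)
  httr::stop_for_status(res)  # fail loudly on HTTP errors
  data <- httr::content(res)  # parsed according to the response's MIME type
  cd <- httr::headers(res)[["content-disposition"]]
  attr(data, "source_name") <- tools::file_path_sans_ext(filename_from_disposition(cd))
  data
}
```

Calling read_csv_with_name("https://www.populationpyramid.net/api/pp/4/2019/?csv=true") and then attr(result, "source_name") should give "Afghanistan-2019", provided the server keeps sending the header shown above.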
