
R: Read file or sheet name of a csv file

Is there a way to read the file name or the sheet name of a .csv file when importing it in R? I generated a .csv by clicking on the URL: https://www.populationpyramid.net/api/pp/4/2019/?csv=true

The file has the name "Afghanistan-2019" and the sheet name is the same. Now I tried to do the same with R using

library(readr)
df <- read_csv("https://www.populationpyramid.net/api/pp/4/2019/?csv=true")

However, that only gives me access to the data; the file/sheet name is lost. Any suggestions?

You can use the excel_sheets function from the readxl package to get a character vector of all the sheets contained in an Excel file.

Edit:

Sorry, I realized now that you are downloading a CSV file. CSV files are flat files and as such don't have any sheet names, so your only option is the file name. Since you are essentially querying an API, you could use the httr package instead to send a GET request:

library(httr)
library(stringr)

res <- httr::GET("https://www.populationpyramid.net/api/pp/4/2019/?csv=true")

This gives you a response object which contains all kinds of interesting information, including both the actual data (duh) and the file name.

You can get the data with the content function:

httr::content(res)

#> # A tibble: 21 x 3
#>    Age         M       F
#>    <chr>   <dbl>   <dbl>
#>  1 0-4   2891330 2747452
#>  2 5-9   2765393 2636519
#>  3 10-14 2614937 2501560
#>  4 15-19 2321520 2197654
#>  5 20-24 1950650 1843985
#>  6 25-29 1551332 1433056
#>  7 30-34 1255855 1138037
#>  8 35-39 1033269  954327
#>  9 40-44  834402  758533
#> 10 45-49  649695  603870
#> # … with 11 more rows

To retrieve the file name, we need to get a bit more creative. The file name is stored in the content-disposition element in the headers section of the res object:

res$headers$`content-disposition`
#> [1] "attachment; filename=Afghanistan-2019.csv"

We can extract it with a regex which pulls out all the text after the first "=":

stringr::str_extract(res$headers$`content-disposition`, "(?<=\\=).*")
#> [1] "Afghanistan-2019.csv"
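The question mentions the name "Afghanistan-2019" without the extension. As a small sketch, assuming the extracted string looks like the one shown above, the extension can be dropped with file_path_sans_ext from the tools package that ships with R (the variable names here are illustrative only):

```r
# File name as extracted from the content-disposition header above
file_name <- "Afghanistan-2019.csv"

# Drop the extension to get the bare name
base_name <- tools::file_path_sans_ext(file_name)
base_name
```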

Since response objects from the same API generally keep the same information in the same places, you could automate this process.
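One way such automation might look is sketched below. The helper names extract_filename and read_pp_csv are hypothetical (they are not part of httr), and the sketch assumes the API keeps sending a content-disposition header of the form shown above. The header parsing uses base R only, so it can be tried without a network connection:

```r
# Hypothetical helper: pull the file name out of a content-disposition value.
# Returns NA if the header is missing or has no "filename=" part.
extract_filename <- function(disposition) {
  if (is.null(disposition) || is.na(disposition)) return(NA_character_)
  m <- regmatches(
    disposition,
    regexpr("(?<=filename=)[^;]+", disposition, perl = TRUE)
  )
  if (length(m) == 0) return(NA_character_)
  # Trim whitespace and drop optional surrounding double quotes
  gsub('^"|"$', "", trimws(m))
}

# Hypothetical wrapper: fetch a URL and return the data with its file name.
# Assumes the httr package is installed.
read_pp_csv <- function(url) {
  res <- httr::GET(url)
  httr::stop_for_status(res)  # fail early on HTTP errors
  list(
    name = extract_filename(httr::headers(res)[["content-disposition"]]),
    data = httr::content(res)
  )
}

extract_filename("attachment; filename=Afghanistan-2019.csv")
```

Calling read_pp_csv("https://www.populationpyramid.net/api/pp/4/2019/?csv=true") would then give a list with the file name in the name element and the parsed table in the data element, provided the server responds as it did above.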
