Automate zip file reading in R

Question

I need to automate R so that it reads a CSV data file contained in a zip file.

For example, I would type:

read.zip(file = "myfile.zip")

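Internally, this would:

Unzip myfile.zip to a temporary folder
Read the only file it contains using read.csv

If the zip file contains more than one file, an error is thrown.

My problem is getting the name of the file contained in the zip file, in order to pass it to the read.csv command. Does anyone know how to do this?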
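
As a minimal sketch of the missing step, assuming base R's utils::unzip() is available: calling it with list = TRUE returns a data frame describing the archive (columns Name, Length, Date) without extracting anything. The helper name zip_member_name is hypothetical and only used for illustration.

```r
# Hypothetical helper: return the name of the single file inside a zip archive.
zip_member_name <- function(zipfile) {
  # list = TRUE only inspects the archive; nothing is extracted
  info <- utils::unzip(zipfile, list = TRUE)
  if (nrow(info) != 1) {
    stop("Expected exactly one file inside zip, found ", nrow(info))
  }
  info$Name
}
```

The returned name can then be matched against the extracted files or passed to unz().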

UPDATE

Here is the function I wrote based on @Paul's answer:

read.zip <- function(zipfile, row.names=NULL, dec=".") {
    # Create a name for the dir where we'll unzip
    zipdir <- tempfile()
    # Create the dir using that name
    dir.create(zipdir)
    # Unzip the file into the dir
    unzip(zipfile, exdir=zipdir)
    # Get the files into the dir
    files <- list.files(zipdir)
    # Throw an error if there's more than one
    if(length(files)>1) stop("More than one data file inside zip")
    # Get the full name of the file
    file <- paste(zipdir, files[1], sep="/")
    # Read the file (arguments passed by name so they are not matched to header/sep)
    read.csv(file, row.names=row.names, dec=dec)
}

Since I will be using tempdir() for more files, I create a new directory inside it so that I don't get confused by the other files. I hope it is useful!

Answer 1

Another solution, using unz:

read.zip <- function(file, ...) {
  zipFileInfo <- unzip(file, list=TRUE)
  if(nrow(zipFileInfo) > 1)
    stop("More than one data file inside zip")
  else
    read.csv(unz(file, as.character(zipFileInfo$Name)), ...)
}
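
The same unz() approach can be extended to archives holding several files by naming the member to read. The following is a sketch under that assumption; read_zip_member and its member argument are hypothetical names, not part of the answer above.

```r
# Hypothetical helper: read one named CSV from a zip that may hold several files.
read_zip_member <- function(zipfile, member, ...) {
  # List the archive contents without extracting anything
  available <- utils::unzip(zipfile, list = TRUE)$Name
  if (!member %in% available) {
    stop("'", member, "' not found in zip; available: ",
         paste(available, collapse = ", "))
  }
  # unz() gives a connection to the member; read.csv opens and closes it
  read.csv(unz(zipfile, member), ...)
}
```

Like the function above, this reads straight from the archive, so no temporary directory is needed.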

Answer 2

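You can use unzip to extract the file. I mention this only because it is not clear from your question whether you know about it. As for reading the file: once you have extracted it to a temporary directory (see ?tempdir), just use list.files to find the files that were dumped there. In your case this is just one file, the one you need. Reading it with read.csv is then straightforward: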

l <- list.files(temp_path, full.names = TRUE)
read.csv(l[1])

This assumes the location of your temporary directory is stored in temp_path.

Answer 3

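I found this thread because I was trying to automate reading multiple CSV files from a zip. I adapted the solution to the broader case. I haven't tested it with weird filenames or the like, but it worked for me, so I thought I'd share: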

read.csv.zip <- function(zipfile, ...) {
    # Create a name for the dir where we'll unzip
    zipdir <- tempfile()
    # Create the dir using that name
    dir.create(zipdir)
    # Unzip the file into the dir
    unzip(zipfile, exdir=zipdir)
    # Get a list of csv files in the dir
    files <- list.files(zipdir)
    files <- files[grep("\\.csv$", files)]
    # Create a named list of the imported csv files
    # (simplify=FALSE keeps each data frame intact instead of collapsing to a matrix)
    csv.data <- sapply(files, function(f) {
        fp <- file.path(zipdir, f)
        return(read.csv(fp, ...))
    }, simplify=FALSE)
    return(csv.data)
}

Answer 4

If you have zcat installed on your system (which is the case for Linux, macOS, and Cygwin), you could also use:

zipfile<-"test.zip"
myData <- read.delim(pipe(paste("zcat", zipfile)))

This solution also has the advantage that no temporary files are created.

Answer 5

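Here is an approach I use that is based heavily on @Corned Beef Hash Map's answer. These are some of the changes I made:

My approach uses the data.table package's fread(), which can be fast (generally, if the data is zipped, it is probably large, so you stand to gain a lot of speed here!).
I also adjusted the output format so that it is a named list, where each element of the list is named after the file. For me, this was a very useful addition.
Instead of using a regular expression to filter the files grabbed by list.files, I use the pattern argument of list.files().
Finally, by relying on fread() and making pattern an argument, to which you can supply something like "" or NULL or ".", you can use this to read many types of data file; in fact, you can read multiple types at once (for example, if your .zip contains both .csv and .txt files and you want both). If you only want certain types of file, you can specify the pattern so that only those are used.

Here is the actual function: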

read.csv.zip <- function(zipfile, pattern="\\.csv$", ...){

    # Create a name for the dir where we'll unzip
    zipdir <- tempfile()

    # Create the dir using that name
    dir.create(zipdir)

    # Unzip the file into the dir
    unzip(zipfile, exdir=zipdir)

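    # Get a list of matching files in the dir (searching subdirectories too)
    files <- list.files(zipdir, recursive=TRUE, pattern=pattern)

    # Create a list of the imported files
    # (simplify=FALSE keeps each data.table intact instead of collapsing to a matrix)
    csv.data <- sapply(files,
        function(f){
            fp <- file.path(zipdir, f)
            dat <- data.table::fread(fp, ...)
            return(dat)
        },
        simplify=FALSE
    )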

    # Use csv names to name list elements
    names(csv.data) <- basename(files)

    # Return data
    return(csv.data)
}

Answer 6

The following refines the answers above. FUN can be read.csv, cat, or anything you like, provided its first argument accepts a file path. For example:

head(read.zip.url("http://www.cms.gov/Medicare/Coding/ICD9ProviderDiagnosticCodes/Downloads/ICD-9-CM-v32-master-descriptions.zip", filename = "CMS32_DESC_LONG_DX.txt"))

read.zip.url <- function(url, filename = NULL, FUN = readLines, ...) {
  zipfile <- tempfile()
  # mode = "wb" keeps the binary zip intact on Windows
  download.file(url = url, destfile = zipfile, quiet = TRUE, mode = "wb")
  zipdir <- tempfile()
  dir.create(zipdir)
  unzip(zipfile, exdir = zipdir) # files="" so extract all
  files <- list.files(zipdir)
  if (is.null(filename)) {
    if (length(files) == 1) {
      filename <- files
    } else {
      stop("multiple files in zip, but no filename specified: ", paste(files, collapse = ", "))
    }
  } else { # filename specified
    stopifnot(length(filename) ==1)
    stopifnot(filename %in% files)
  }
  # Call FUN on the extracted file, forwarding any extra arguments
  do.call(FUN, args = c(list(file.path(zipdir, filename)), list(...)))
}

Answer 7

Another approach, using fread from the data.table package:

fread.zip <- function(zipfile, ...) {
  # Function reads data from a zipped csv file
  # Uses fread from the data.table package

  ## Create the temporary directory or flush CSVs if it exists already
  if (!file.exists(tempdir())) {
    dir.create(tempdir())
  } else {
    # pattern is a regular expression, not a glob
    file.remove(list.files(tempdir(), full.names = TRUE, pattern = "\\.csv$"))
  }

  ## Unzip the file into the dir
  unzip(zipfile, exdir=tempdir())

  ## Get path to file
  file <- list.files(tempdir(), pattern = "\\.csv$", full.names = TRUE)

  ## Throw an error if there's more than one
  if(length(file)>1) stop("More than one data file inside zip")

  ## Read the file
  fread(file, 
     na.strings = c(""), # read empty strings as NA
     ...
  )
}

Based on the answer/update by @joão-daniel.

Answer 8

I just wrote a function based on the top read.zip that may help...

read.zip <- function(zipfile, internalfile=NA, read.function=read.delim, verbose=TRUE, ...) {
    # function based on http://stackoverflow.com/questions/8986818/automate-zip-file-reading-in-r

    # check the files within zip
    unzfiles <- unzip(zipfile, list=TRUE)
    if (is.na(internalfile) || is.numeric(internalfile)) {
        internalfile <- unzfiles$Name[ifelse(is.na(internalfile),1,internalfile[1])]
    }
    # Create a name for the dir where we'll unzip
    zipdir <- tempfile()
    # Create the dir using that name
    dir.create(zipdir)
    if (verbose) cat("Directory created:", zipdir, "\n")
    # Unzip the file into the dir
    if (verbose) cat("Unzipping file:", internalfile, "...")
    unzip(zipfile, files=internalfile, exdir=zipdir)
    if (verbose) cat("Done!\n")
    # Get the full name of the file
    file <- file.path(zipdir, internalfile)
    # Remove the extracted file and the temporary directory on exit
    on.exit({
        if (verbose) cat("Done!\nRemoving temporary files:", file, ".\n")
        unlink(zipdir, recursive=TRUE)
    })
    # Read the file
    if (verbose) cat("Reading File...")
    read.function(file, ...)
}

Answer 9

Location for the unzipped files:

outDir<-"~/Documents/unzipFolder"

Get all the zip files (pattern is a regular expression, not a glob):

zipF <- list.files(path = "~/Documents/", pattern = "\\.zip$", full.names = TRUE)

Unzip all the files:

purrr::map(.x = zipF, .f = unzip, exdir = outDir)
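
To go from the extracted folder to data in R, a minimal base-R sketch follows. It uses a plain for loop in place of purrr::map, and read_all_zipped_csvs together with its zip_dir and out_dir parameters are hypothetical names chosen for illustration.

```r
# Sketch: unzip every archive in a folder, then read every extracted CSV.
read_all_zipped_csvs <- function(zip_dir, out_dir = tempfile("unzipped_")) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  zip_files <- list.files(zip_dir, pattern = "\\.zip$", full.names = TRUE)
  # Extract each archive into the same output folder
  for (z in zip_files) utils::unzip(z, exdir = out_dir)
  csv_files <- list.files(out_dir, pattern = "\\.csv$",
                          full.names = TRUE, recursive = TRUE)
  # Named list: one data frame per CSV, named after the file
  stats::setNames(lapply(csv_files, read.csv), basename(csv_files))
}
```

Because every archive is extracted into the same folder, files with identical names in different archives would overwrite one another; extracting each zip into its own subfolder avoids that.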

9 answers (score, status, time posted)

Answer 1: 11, 2012-01-24 14:41:42
Answer 2: 10, accepted, 2012-01-24 12:49:25
Answer 3: 4, 2013-09-06 20:20:26
Answer 4: 2, 2013-04-10 07:52:20
Answer 5: 2, 2015-08-04 22:41:14
Answer 6: 1, 2014-07-05 12:32:25
Answer 7: 1, 2015-11-20 11:16:26
Answer 8: 0, 2015-06-25 17:09:34
Answer 9: 0, 2019-06-18 03:26:50
