
Decompress gz file using R

I have used ?unzip in the past to get at the contents of a zipped file using R. This time around, I am having a hard time extracting the files from a .gz file, which can be found here.

I have tried ?gzfile and ?gzcon but have not been able to get them to work. Any help you can provide will be greatly appreciated.

Here is a worked example that may help illustrate what gzfile() and gzcon() are for:

foo <- data.frame(a=LETTERS[1:3], b=rnorm(3))
foo
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
write.table(foo, file="/tmp/foo.csv")
system("gzip /tmp/foo.csv")             # being very explicit

Now that the file is written, instead of the implicit use of file(), use gzfile():

read.table(gzfile("/tmp/foo.csv.gz"))   
#  a        b
#1 A 0.586882
#2 B 0.218608
#3 C 1.290776
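
The example above only exercises gzfile(). As a complement, here is a minimal sketch of gzcon(), which wraps an already-open binary connection (a file here, though it could equally be a URL) and decompresses on the fly. The temporary path and the fixed values in the data frame are assumptions made for illustration only.

```r
# Write a small gzip-compressed table to a temporary file (illustrative data).
path <- tempfile(fileext = ".csv.gz")
foo <- data.frame(a = LETTERS[1:3], b = c(0.5, 1.5, 2.5))
out <- gzfile(path, "w")
write.table(foo, out)
close(out)

# gzcon() wraps an existing binary connection and decompresses as it reads.
con <- gzcon(file(path, "rb"))
lines <- readLines(con)
close(con)

# read.table() needs text input, so parse the decompressed lines.
bar <- read.table(text = lines)
bar
```

Going through readLines() first is deliberate: a gzcon() connection is binary, whereas read.table() expects a text-mode source.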

The file you point to is a compressed tar archive, and as far as I know, R itself has no interface to tar archives. These are commonly used to distribute source code, as for example for R packages and R sources.

To un-gz a file in R you can do:

library(R.utils)
gunzip("file.gz", remove=FALSE)

or

gunzip("file.gz")

But then you get the default (remove=TRUE) behavior, in which the input file is removed after the output file has been fully created and closed.
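
If installing R.utils is not an option, roughly the same effect can be sketched with base R connections. The function name gunzip_base and its arguments are hypothetical, not part of any package. It always keeps the input file, similar in spirit to remove=FALSE.

```r
# Hypothetical helper: decompress a .gz file using only base R connections.
# The source file is left in place.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk = 1024 * 1024) {
  stopifnot(dest != src)            # refuse to overwrite the input
  inp <- gzfile(src, "rb")          # reads decompressed bytes
  on.exit(close(inp), add = TRUE)
  out <- file(dest, "wb")           # plain, uncompressed output
  on.exit(close(out), add = TRUE)
  repeat {
    buf <- readBin(inp, what = "raw", n = chunk)
    if (length(buf) == 0L) break    # end of input
    writeBin(buf, out)
  }
  invisible(dest)
}

# Usage on a throwaway file created just for this demonstration.
src <- tempfile(fileext = ".txt.gz")
con <- gzfile(src, "w")
writeLines(c("alpha", "beta"), con)
close(con)

dest <- gunzip_base(src)
readLines(dest)
```

Copying in fixed-size raw chunks avoids loading the whole file into memory, which matters for large archives.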

If you really want to uncompress the file, just use the untar function, which does support gzip. E.g.:

untar('chadwick-0.5.3.tar.gz')
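
To try untar() without downloading anything, the following sketch builds a small .tar.gz with tar() and then lists and extracts it. The directory and file names are made up for the demonstration, and tar = "internal" is passed only to force R's built-in implementation so the result does not depend on a system tar program.

```r
# Build a tiny directory tree to archive (names are illustrative).
work <- tempfile("tar-demo-")
dir.create(file.path(work, "demo"), recursive = TRUE)
writeLines("hello from the archive", file.path(work, "demo", "a.txt"))

# Create a gzip-compressed tar archive with relative paths inside it.
archive <- tempfile(fileext = ".tar.gz")
old <- setwd(work)
tar(archive, files = "demo", compression = "gzip", tar = "internal")
setwd(old)

# List the contents without extracting, then extract into a fresh directory.
listing <- untar(archive, list = TRUE, tar = "internal")
out <- tempfile("extracted-")
untar(archive, exdir = out, tar = "internal")

extracted <- readLines(file.path(out, "demo", "a.txt"))
extracted
```

Calling untar() with list = TRUE first is a cheap way to check what an unfamiliar archive contains before writing anything to disk.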

http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html

R added transparent decompression for certain kinds of compressed files in version 2.10. If your files are compressed with bzip2, xz, or gzip, they can be read into R as if they were plain text files. You should have the proper filename extensions.

The command...

myData <- read.table('myFile.gz')  

# gzip compressed files have a "gz" extension

...will work just as if 'myFile.gz' were the raw text file.
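
A self-contained way to check this behavior is sketched below. The file is a temporary one created for the purpose, and its contents are arbitrary illustrative values.

```r
# Create a gzip-compressed table in a temporary file (illustrative data).
path <- tempfile(fileext = ".gz")
con <- gzfile(path, "w")
write.table(data.frame(x = 1:3, y = c("a", "b", "c")), con)
close(con)

# Pass the compressed file's name directly; no explicit gzfile() call is needed.
myData <- read.table(path)
myData
```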

library(vroom)
columns3 = c('A', 'B',...) ## define column names
Data1<- vroom(".../XXX.tsv",col_names = columns3)

This works fine with tsv.gz files.

