I'm trying to read my gzipped CSV file from S3.
Given that I already have a list of the keys for my data, like
> MyKeys
[1] "2020/07/25/21/0001_part_00.gz" "2020/07/25/22/0000_part_00.gz" "2020/07/25/22/0001_part_00.gz" "2020/07/25/23/0000_part_00.gz" "2020/07/25/23/0001_part_00.gz"
Using
x <- get_object(MyKeys[1], bucket = bucket)
I get back a raw vector:
str(x)
raw [1:42017043] 1f 8b 08 00 ...
I tried to use
rawToChar(x)
gunzip(x, remove=FALSE)
read.table(rawConnection(get_object(MyKeys[1], bucket = bucket)))
read_delim(gzfile(get_object(touse[1], bucket = bucket)), ",", escape_double = FALSE, trim_ws = TRUE)
and a few more tricks that I don't remember.
None of this worked. I'm lost here.
Well, in the end I managed to find a solution.
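library(aws.s3)    # assuming get_object() comes from the aws.s3 package
library(readr)     # read_delim()
library(magrittr)  # %>%
df <- get_object(key, bucket = bucket) %>%
  rawConnection() %>%
  gzcon() %>%
  read_delim("|", escape_double = FALSE, trim_ws = TRUE, col_names = FALSE)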
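The same rawConnection / gzcon chain can be tried without S3. The following is only a minimal sketch in base R: the raw vector x is built locally as a stand-in for what get_object returns, the sample payload is made up, and readLines plus read.table(text = ...) is swapped in for read_delim so that no extra packages are needed.

```r
# Build a small gzip-compressed, pipe-delimited payload on disk and read
# the compressed bytes back, as a stand-in for get_object()'s raw vector.
csv_text <- "a|1\nb|2\nc|3\n"   # made-up sample data
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(charToRaw(csv_text), gz)
close(gz)
x <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Wrap the raw vector in a connection, then let gzcon decompress on read.
# Keeping the connection in a variable means it can be closed afterwards.
con <- gzcon(rawConnection(x))
lines <- readLines(con)
close(con)

# Parse the decompressed text (base-R substitute for read_delim).
df <- read.table(text = lines, sep = "|", header = FALSE,
                 stringsAsFactors = FALSE)
```

With a real object, x would instead be the result of get_object(key, bucket = bucket).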
A bit of explanation for anyone who finds themselves in this kind of trouble:
get_object is the function that downloads the object from S3; it returns the file's contents as a raw vector. rawConnection wraps that raw vector in a connection, and gzcon wraps that connection so that the gzip stream is decompressed as it is read (the raw bytes are still compressed, so they can't simply be read as text). Finally, read_delim parses the decompressed text, which is no mystery to anyone.
And it is legen... wait for it... there is a catch here. When you use rawConnection, R internally allocates a vector for your file, and it STAYS there until you close the connection. Usually you create the connection object and then close it, like
x<- rawConnection(<args>)
close(x)
but in this case it's created on the fly using magrittr's '%>%', so I don't have a reference to it.
If you are doing the same as I am, reading months of data from thousands of files in a loop, you will receive the error message
All the connections are in use
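Worry not. By default R allows only 128 connections to be open at once, tops, and every leftover raw connection counts against that limit. So store the result in a variable (or a local file) and then call the "garbage collector" function closeAllConnections(), which closes all open connections and so wipes every file still held as a raw connection.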
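An alternative to closeAllConnections() is to keep a reference to each connection and close it explicitly, so nothing accumulates in the first place. This is only a sketch under assumptions: read_gz_raw is a hypothetical helper name, the payload is made up and built locally instead of coming from get_object, and base-R readLines plus read.table stands in for read_delim.

```r
# Hypothetical helper: parse a gzip-compressed, pipe-delimited raw vector
# and always release the connection, even if parsing fails.
read_gz_raw <- function(x, sep = "|") {
  con <- gzcon(rawConnection(x))
  on.exit(close(con))  # runs when the function exits, freeing the slot
  read.table(text = readLines(con), sep = sep, header = FALSE,
             stringsAsFactors = FALSE)
}

# Stand-in for get_object(): gzip a small made-up payload, read the bytes back.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(charToRaw("a|1\nb|2\n"), gz)
close(gz)
x <- readBin(tmp, what = "raw", n = file.size(tmp))
unlink(tmp)

# Loop over many more "files" than R has connection slots.
n_before <- nrow(showConnections())
parts <- lapply(seq_len(300), function(i) read_gz_raw(x))
df <- do.call(rbind, parts)
n_after <- nrow(showConnections())
```

In the real loop, x would be get_object(key, bucket = bucket) for each key, and the parsing line inside the helper could be replaced by the read_delim call. Unlike closeAllConnections(), this leaves any other connections you have open untouched.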