
Apache Spark Streaming textFileStream: reading gzip files

I am processing files placed in HDFS using Spark Streaming, specifically with the textFileStream method of the JavaStreamingContext class.

Because the method name contains "text", I assumed it would only read plain text files, but to my surprise it also reads gzipped text files.

Can anyone clarify whether this is the expected behavior, and which formats it can read?

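Yes, this is expected. Spark reads the files through Hadoop's file I/O API (textFileStream is built on Hadoop's TextInputFormat), which handles compressed input transparently: the codec is chosen from the file name extension, so gzip (.gz), bzip2 (.bz2) and any other codec registered with Hadoop are decompressed automatically before the lines reach your stream. Output works the same way: you can configure the compression codec through a property setting, and the API will apply it when writing.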
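To make the "chosen by extension" part concrete, here is a minimal plain-JDK sketch of the same idea. It is not Hadoop code: the class `SuffixCodecDemo` and its methods are hypothetical, and they only imitate what Hadoop's `CompressionCodecFactory.getCodec(Path)` does, namely pick a decompressor from the file name suffix and otherwise read the bytes as-is. The sketch assumes `.gz` maps to gzip and `.deflate` to zlib-wrapped deflate.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

class SuffixCodecDemo {

    // Hypothetical helper: choose a decompressor from the file-name suffix only,
    // imitating the idea behind Hadoop's CompressionCodecFactory.getCodec(Path).
    static InputStream open(Path path) throws IOException {
        String name = path.getFileName().toString();
        InputStream raw = Files.newInputStream(path);
        try {
            if (name.endsWith(".gz")) return new GZIPInputStream(raw);           // gzip container
            if (name.endsWith(".deflate")) return new InflaterInputStream(raw);  // zlib-wrapped deflate
            return raw;                                                          // no known suffix: plain bytes
        } catch (IOException e) {
            raw.close(); // e.g. a ".gz" name whose content is not actually gzip
            throw e;
        }
    }

    // Read a file line by line, decompressing transparently when the suffix says so.
    static List<String> readLines(Path path) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(path), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("codec-demo");

        // A plain text file.
        Path plain = dir.resolve("events.txt");
        Files.write(plain, Arrays.asList("a", "b"), StandardCharsets.UTF_8);

        // A gzipped text file with the conventional ".gz" suffix.
        Path gz = dir.resolve("events.txt.gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(gz)), StandardCharsets.UTF_8)) {
            w.write("c\nd\n");
        }

        // The same gzip bytes under a name without ".gz".
        Path mislabelled = dir.resolve("events-gzipped.txt");
        Files.copy(gz, mislabelled);

        System.out.println(readLines(plain)); // [a, b]
        System.out.println(readLines(gz));    // [c, d]
        System.out.println(readLines(mislabelled).equals(Arrays.asList("c", "d"))); // false
    }
}
```

The last line illustrates the practical consequence: detection relies on the name, not the content, so by the same logic a gzipped file dropped into the monitored directory without a `.gz` extension would not be decompressed and would come through as garbled lines.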
