
(Py)Spark: How to read a plain-text file whose name ends in ".gz"

I need to load a plain-text file as an RDD in Spark. However, for some reason the file must be named "xxx.gz". By default, sc.textFile treats such a file as gzip-compressed because of its extension. How can I tell Spark to read it as a plain-text file?

You can use Python's gzip module:

gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)
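Note that gzip.open only succeeds when the file content really is gzip-compressed; on a plain-text file that merely carries a ".gz" name it raises an error. A minimal sketch of one way around this, assuming the file is reachable from the driver and small enough to read there, is to check the gzip magic bytes instead of trusting the extension. The helper names is_really_gzip and read_text_lines are hypothetical, not part of Spark or the gzip module.

```python
import gzip

# Every gzip stream starts with these two bytes (the gzip magic number).
GZIP_MAGIC = b"\x1f\x8b"


def is_really_gzip(path):
    """Return True if the file content starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def read_text_lines(path, encoding="utf-8"):
    """Read lines from `path`, deciding by content rather than by extension.

    Plain-text content is read with the built-in open(); genuinely
    compressed content is read with gzip.open() in text mode.
    """
    opener = gzip.open if is_really_gzip(path) else open
    with opener(path, "rt", encoding=encoding) as f:
        # Strip only the trailing newline, similar to what sc.textFile yields.
        return [line.rstrip("\n") for line in f]
```

With an existing SparkContext named sc (an assumption here), the result could then be turned into an RDD with sc.parallelize(read_text_lines("xxx.gz")). This reads the whole file on the driver, so it is only reasonable for files that fit in driver memory.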

