(PY)Spark: How to read a ".txt" file with extension name ".gz"

Question

I need to load a plain-text file as an RDD in Spark. For some reason, the file to be loaded must be named "xxx.gz". By default, sc.textFile treats such a file as gzip-compressed because of its extension. How can I tell Spark to read the file as plain text?

Answer 1

You can use gzip.

gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)
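Note that gzip.open itself expects gzip-compressed data, so it will fail on a plain-text file that only carries a ".gz" name. One possible workaround, sketched below, is to read the file as raw bytes with sc.binaryFiles (which, as far as I know, does not pick a decompression codec from the extension) and split it into lines yourself. The names bytes_to_lines and load_text_rdd are hypothetical helpers made up for this example, UTF-8 is an assumed encoding, and the gzip module is used only as a fallback in case the content turns out to be real gzip after all.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every real gzip stream


def bytes_to_lines(raw, encoding="utf-8"):
    """Return the text lines in `raw`, decompressing only if it is real gzip."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # splitlines() drops the line terminators ("\n", "\r\n", and a few others)
    return raw.decode(encoding).splitlines()


def load_text_rdd(sc, path):
    """Hypothetical helper: read `path` as raw bytes so the ".gz" suffix is ignored."""
    # binaryFiles yields (file_path, file_content_bytes) pairs, one per file,
    # so each file has to fit in memory on a single executor.
    return sc.binaryFiles(path).flatMap(lambda kv: bytes_to_lines(kv[1]))
```

With an existing SparkContext sc, this would be called as rdd = load_text_rdd(sc, "xxx.gz"). Since every file is loaded whole, this only suits files of moderate size; for very large inputs, renaming or copying the file without the ".gz" suffix before calling sc.textFile is probably simpler.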
