I can read a JSON file into a DataFrame in PySpark using:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('GetDetails').getOrCreate()
df = spark.read.json("path to json file")
However, when I try to read a bz2-compressed CSV file into a DataFrame, I get an error. I am using:
spark = SparkSession.builder.appName('GetDetails').getOrCreate()
df = spark.read.load("path to bz2 file")
Could you please help me correct this?
The method spark.read.load() has an optional parameter, format, which defaults to 'parquet'. Without it, Spark tries to read your bz2 file as Parquet and fails.
So, for your code to work, pass the format explicitly. For a compressed JSON file it should look like this (use format="csv" for a CSV file):
df = spark.read.load("data.json.bz2", format="json")
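Since the question is about a compressed CSV rather than JSON, here is a minimal sketch of choosing the format from the file name before calling spark.read.load(). The helper names guess_format and load_by_name, and the two lookup tables, are hypothetical and only for illustration; the one Spark call used is spark.read.load(path, format=..., **options).

```python
import os

# Compression suffixes that Spark's text-based readers decompress transparently
# (assumption: an illustrative list, not an exhaustive one).
COMPRESSION_SUFFIXES = {".bz2", ".gz"}

# Hypothetical mapping from file extension to a Spark data source name.
FORMATS = {".json": "json", ".csv": "csv", ".parquet": "parquet"}


def guess_format(path):
    """Return the Spark format name implied by the file name."""
    root, ext = os.path.splitext(path.lower())
    if ext in COMPRESSION_SUFFIXES:
        # Drop the compression suffix and look at the extension underneath.
        root, ext = os.path.splitext(root)
    if ext not in FORMATS:
        raise ValueError(f"cannot guess format for {path!r}")
    return FORMATS[ext]


def load_by_name(spark, path, **options):
    """Call spark.read.load with an explicit format instead of the parquet default."""
    return spark.read.load(path, format=guess_format(path), **options)
```

With an existing session this could be called as load_by_name(spark, "data.csv.bz2", header=True), where header=True assumes the file starts with a row of column names.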
Also, spark.read.json works fine with compressed JSON files, e.g.:
df = spark.read.json("data.json.bz2")
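The CSV counterpart, spark.read.csv, should handle a bz2-compressed file the same way. The sketch below assumes a file with a header row; write_sample_csv_bz2 and read_csv_bz2 are hypothetical helper names, and the sample data is made up purely so the call can be tried locally.

```python
import bz2


def write_sample_csv_bz2(path):
    """Write a tiny bz2-compressed CSV so the reader call can be tried locally."""
    with bz2.open(path, "wt", encoding="utf-8", newline="") as f:
        f.write("id,name\n1,alice\n2,bob\n")


def read_csv_bz2(spark, path):
    """Read a bz2-compressed CSV; Spark picks the codec from the .bz2 suffix."""
    # header=True assumes the first line holds column names;
    # inferSchema=True asks Spark to guess column types instead of using strings.
    return spark.read.csv(path, header=True, inferSchema=True)
```

Given a session, write_sample_csv_bz2("sample.csv.bz2") followed by read_csv_bz2(spark, "sample.csv.bz2").show() is one way to check that the compressed file is read as expected.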