How to read a Parquet file with headers in PySpark

Question

I am completely new to Parquet files and Python. Can anyone please tell me how to read a Parquet file with headers in PySpark, and how to specify other options the way we can for CSV files, as below:

df = spark.read.load("examples/src/main/resources/people.csv",
                     format="csv", sep=":", inferSchema="true", header="true")

Answer 1

Parquet files store the column names in the file itself, so we don't need to specify options like header when reading them.

To read Parquet files:

# read a Parquet file
df = spark.read.parquet("<parquet_file_path>")

# or use load(); Parquet is Spark's default data source format
df = spark.read.load("<parquet_file_path>")

# display data from the DataFrame
df.show()

Solution 1
0 2020-08-12 03:15:57
