How to read a Parquet file with headers in PySpark
I am completely new to Parquet files and Python. Can anyone please tell me how to read a Parquet file with headers in PySpark, and how to specify other options the way we can for CSV files, as below:
df = spark.read.load("examples/src/main/resources/people.csv",
format="csv", sep=":", inferSchema="true", header="true")
Parquet files store the column names (and types) in the file itself, so we don't need to specify options like header when reading them.
To read Parquet files:
# read a Parquet file
df = spark.read.parquet("<parquet_file_path>")
# or use load(): Parquet is Spark's default data source format
df = spark.read.load("<parquet_file_path>")
# show data from the DataFrame
df.show()