How does schema inference work in spark.read.parquet?
I'm trying to read a Parquet file with Spark and I have a question. How is the type of each column inferred when loading a Parquet file with spark.read.parquet? Is there a dictionary for the mapping, like 1? Or is it inferred from the actual stored values, like 2?
Spark does not infer types from the stored values; it reads the schema embedded in the Parquet file and parses it into an internal representation (i.e., a StructType). This is a bit hard to find in the Spark docs, but I went through the code and found the mapping you are looking for here:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L197-L281
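To make this concrete: the converter works off the Parquet schema's declared physical and logical types, not the data itself. Below is a minimal Python sketch of the default physical-type mapping, as an illustration only; the real Scala converter linked above also consults logical-type annotations (e.g., a BINARY annotated as UTF8/STRING becomes a StringType, an INT32 annotated as DATE becomes a DateType) and decimal precision/scale metadata.

```python
# Illustrative sketch, NOT Spark's actual code: simplified default mapping
# from Parquet physical types to Spark SQL types, as applied in
# ParquetSchemaConverter.scala. The real converter also inspects
# logical-type annotations before falling back to these defaults.
PARQUET_TO_SPARK = {
    "BOOLEAN": "BooleanType",
    "INT32": "IntegerType",    # DateType/DecimalType if annotated
    "INT64": "LongType",       # TimestampType/DecimalType if annotated
    "INT96": "TimestampType",  # legacy Impala/Hive timestamp encoding
    "FLOAT": "FloatType",
    "DOUBLE": "DoubleType",
    "BINARY": "BinaryType",    # StringType if annotated as UTF8/STRING
    "FIXED_LEN_BYTE_ARRAY": "BinaryType",  # DecimalType if annotated
}

def infer_spark_type(physical_type: str) -> str:
    """Return the default Spark SQL type for a Parquet physical type."""
    return PARQUET_TO_SPARK[physical_type.upper()]

print(infer_spark_type("int96"))  # -> TimestampType
```

So the answer to the question is closer to "1": there is a fixed mapping table keyed on the schema stored in the file, and the row values are never examined.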