See information about partitions of a Spark DataFrame
One can get an array of the partitions of a Spark DataFrame as follows:
> df.rdd.partitions
Is there a way to get more information about the partitions? In particular, I would like to see the partition key and the partition boundaries (the first and last element within each partition).
This is just for a better understanding of how the data is organized.
This is what I tried:

> df.rdd.partitions.head

But this object only has the attributes and methods equals, hashCode, and index.
In case the data is not too large, one can write it to disk as follows:
df.write.option("header", "true").csv("/tmp/foobar")
The given directory must not already exist. Spark writes one part-*.csv file per partition, so the resulting files show exactly how the rows are grouped.
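Another way to see how rows are distributed, without writing anything to disk, is the built-in spark_partition_id function, which tags each row with the id of the partition it lives in. A short sketch (the column name "pid" and the example DataFrame are illustrative assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

object PartitionCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-counts")
      .getOrCreate()

    val df = spark.range(0, 100).repartition(4)

    // Attach the physical partition id to every row, then count rows
    // per partition to see how evenly the data is spread.
    df.withColumn("pid", spark_partition_id())
      .groupBy("pid")
      .count()
      .orderBy("pid")
      .show()

    spark.stop()
  }
}
```

This only reveals row counts per partition, not the boundaries, but it is a quick sanity check for skew.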