
Number of partitions of a spark dataframe?

I have a Spark DataFrame (Spark 2.3) built from a SQL query that returns zero records. When I try to get its number of partitions, no result is shown. I have tried df.rdd.getNumPartitions, df.rdd.getNumPartitions(), df.rdd.length and df.rdd.partitions.size.

How can I get the number of partitions of a Spark DataFrame, whether it has zero records or millions of records?

code:

empsql = 'Select * From Employee'  # this query returns zero records
df = spark.sql(empsql)  # spark is an already configured SparkSession
df.rdd.getNumPartitions

# Using df.rdd.partitions.size raises: AttributeError: 'RDD' object has no attribute 'partitions'

Assuming PySpark, call it as a method, with parentheses:

df.rdd.getNumPartitions()
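The parentheses matter because `getNumPartitions` is a method: without them Python evaluates to the method object itself rather than a number. The difference can be seen without Spark. This is a minimal sketch in which `FakeRDD` is a hypothetical stand-in class, not part of PySpark:

```python
class FakeRDD:
    """Hypothetical stand-in for a PySpark RDD; not part of any Spark API."""

    def __init__(self, num_partitions):
        self._num_partitions = num_partitions

    def getNumPartitions(self):
        # Same name as the real PySpark method, which also has to be called.
        return self._num_partitions


rdd = FakeRDD(0)

reference = rdd.getNumPartitions   # no parentheses: a bound method object
value = rdd.getNumPartitions()     # with parentheses: the actual count

print(callable(reference))  # True
print(value)                # 0
```

The stand-in also has no `partitions` attribute, which mirrors the AttributeError reported in the question: `partitions.size` belongs to the Scala RDD API, not the Python one.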

A simulation using an empty DataFrame, which should be equivalent to a query that returns no rows:

from pyspark.sql.types import *
field = [
    StructField("FIELDNAME_1", StringType(), True),
    StructField("FIELDNAME_2", StringType(), True),
    StructField("FIELDNAME_3", StringType(), True),
]
schema = StructType(field)
df = sqlContext.createDataFrame(sc.emptyRDD(), schema)
df.rdd.getNumPartitions()

returns:

Out[6]: 0

Moreover:

df.createOrReplaceTempView('XXX')  # registerTempTable is deprecated since Spark 2.0
yyy = spark.sql("select * from XXX")
yyy.rdd.getNumPartitions()

yields:

Out[11]: 0

The number of partitions of a DataFrame with zero records depends on how the SparkSession object is instantiated.

If I build the SparkSession object with the config given below, I end up with 4 partitions for a DataFrame even though it has zero records.

A Scala snippet to demonstrate this:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder()
    .appName(this.getClass.getName)
    .config("spark.master", "local[4]").getOrCreate()

val data = Seq(("first","row"),("second","row"))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data))

// No row has "third" in column _1, so the filter leaves zero rows
val zeroRowDF = df.filter(col("_1") === lit("third"))

zeroRowDF.count // returns 0

zeroRowDF.rdd.getNumPartitions // returns 4
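One way to see why the count stays at 4: a filter is applied inside each existing partition, so it can empty partitions but does not remove them. Below is a minimal pure-Python sketch of that behaviour. The list-of-lists model and the `split_into_slices` helper are illustrative assumptions, not Spark's actual implementation:

```python
def split_into_slices(rows, num_slices):
    """Hypothetical helper: spread rows over num_slices contiguous lists."""
    n = len(rows)
    return [rows[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]


data = [("first", "row"), ("second", "row")]

# Model of parallelizing two rows across 4 slices, as with local[4].
partitions = split_into_slices(data, 4)

# Model of the filter: applied per partition, so empty partitions remain.
filtered = [[r for r in part if r[0] == "third"] for part in partitions]

row_count = sum(len(part) for part in filtered)
num_partitions = len(filtered)

print(row_count, num_partitions)  # 0 4
```

In this model the partition count is fixed when the data is first split, which matches the observation above that a zero-row DataFrame can still report 4 partitions.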
