Where is my Spark SQL query or DataFrame executing?

I have the following code, which reads a table from a MySQL database:

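import org.apache.spark.sql.SparkSession
val sparkSession = SparkSession.builder().appName("jdbc-read").getOrCreate()
// Read a MySQL table through Spark's JDBC data source
val jdbcDF = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:mysql://<host>:<port>/<database>") // placeholder connection URL
  .option("dbtable", "tablename")
  .option("user", "root")
  .option("password", "root")
  .load()
  .where("some_column > 0") // hypothetical stand-in for "some condition"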

My questions:

  1. While loading, I filter the records based on a condition. Is that WHERE condition executed on the MySQL server, so that only the matching rows are returned?
  2. If I just load a whole table from a database, how are the table's records distributed across the cluster, and which component is responsible for doing it?
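Answer:

  1. Spark does not execute anything until you perform an action (such as count, collect, show, or a write): not the filter, and not the copy of the data into memory. This is lazy evaluation. load() and where() only build a query plan, apart from a small query that load() sends to MySQL to read the table's schema.

  2. When an action runs, simple conditions on table columns (comparisons, IN, IS NULL and similar) are normally pushed down: Spark adds them to the WHERE clause of the SQL it sends to MySQL, so only matching rows are transferred. In Spark 2.4 and later this is controlled by the pushDownPredicate option, which defaults to true. Conditions Spark cannot translate into SQL, such as ones that call a UDF, are applied in Spark after the rows have been pulled into memory. You can check which conditions were pushed by calling jdbcDF.explain() and looking at PushedFilters in the scan node. If you want to be certain the filtering happens in MySQL, pass a query instead of a table name, for example .option("dbtable", "(SELECT * FROM tablename WHERE ...) AS t"), or, in Spark 2.4 and later, .option("query", "SELECT * FROM tablename WHERE ...").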
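On the second question: with only url and dbtable set, the JDBC source reads the whole table over a single connection into a single partition, so one task on one executor does all the work. To spread the read across the cluster you can set the partitionColumn (a numeric, date, or timestamp column in recent Spark versions), lowerBound, upperBound, and numPartitions options. The driver then plans one WHERE range per partition, and each executor task runs its own query against MySQL in parallel. lowerBound and upperBound only decide the stride; they do not filter rows, because the first and last ranges are open-ended. The following is a simplified, self-contained model of how those ranges can be derived. The object and method names are hypothetical, and Spark's exact boundary arithmetic differs between versions.

```scala
// Simplified model (not Spark's actual code) of how a numeric partition
// column range is split into one WHERE clause per partition.
object PartitionSketch {
  // Assumes upperBound - lowerBound does not overflow a Long.
  def whereClauses(column: String, lowerBound: Long, upperBound: Long,
                   numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be positive")
    require(lowerBound < upperBound, "lowerBound must be below upperBound")
    // Never plan more partitions than there are values in the range.
    val n = math.min(numPartitions.toLong, upperBound - lowerBound).toInt
    val stride = (upperBound - lowerBound) / n
    (0 until n).map { i =>
      val lo = lowerBound + i * stride
      val hi = lo + stride
      // The first range has no lower bound and also picks up NULLs;
      // the last range has no upper bound, so no rows are dropped.
      val lower = if (i == 0) None else Some(s"$column >= $lo")
      val upper = if (i == n - 1) None else Some(s"$column < $hi")
      (lower, upper) match {
        case (None, Some(u))    => s"$u OR $column IS NULL"
        case (Some(l), None)    => l
        case (Some(l), Some(u)) => s"$l AND $u"
        case (None, None)       => "1 = 1" // single partition: read every row
      }
    }
  }
}
```

For example, PartitionSketch.whereClauses("id", 0L, 1000L, 4) yields "id < 250 OR id IS NULL", "id >= 250 AND id < 500", "id >= 500 AND id < 750" and "id >= 750". In the real reader the equivalent inputs come from .option("partitionColumn", "id"), .option("lowerBound", "0"), .option("upperBound", "1000") and .option("numPartitions", "4"), and MySQL would receive one query per range. Note that the query option cannot be combined with partitionColumn; use the subquery form of dbtable instead.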
