
Where is my Spark SQL query or DataFrame executing?

I have the following code, which reads a table from a MySQL database:

val jdbcDF = sparkSession.read
  .format("jdbc")
  .option("url", "jdbc:mysql location")
  .option("dbtable", "tablename")
  .option("user", "root")
  .option("password", "root")
  .load().where(some condition)

My questions:

  1. I am filtering the records on a condition while loading. Will the where condition be executed on the MySQL server, with only the result returned?
  2. If I just load a table from any database, how will my table's records be distributed across the cluster, and who is responsible for doing it?

Answer:

  1. Spark won't execute anything, not even the filter condition or copying data into memory, until you perform an action. This is lazy evaluation.

  2. With the code as written, once an action is performed the data is pulled into Spark's memory and the filter is applied there, unless Spark is able to push the predicate down to the JDBC source (which it can do for simple comparisons). If you want to be sure the filter executes in MySQL, pass a query instead of a table name in the dbtable option.
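
A minimal sketch of the dbtable suggestion, plus a way to check where a filter is applied. The connection URL, the `id` column, and the `id > 100` condition are hypothetical placeholders (the question does not give them), and the MySQL JDBC driver is assumed to be on the classpath. A subquery passed as dbtable has to be wrapped in parentheses and given an alias, because Spark places it in the FROM clause of the query it generates.

```scala
import org.apache.spark.sql.SparkSession

object JdbcFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-filter-sketch")
      .master("local[*]") // assumption: local run, for illustration only
      .getOrCreate()

    // Hypothetical connection details; replace with your own.
    val url = "jdbc:mysql://localhost:3306/testdb"

    // Variant A: the filter is part of the SQL sent to MySQL.
    // `id > 100` stands in for "some condition" from the question.
    val filteredInMySql = spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "(SELECT * FROM tablename WHERE id > 100) AS filtered")
      .option("user", "root")
      .option("password", "root")
      .load()

    // Variant B: the original shape, a table name followed by where().
    val filteredBySpark = spark.read
      .format("jdbc")
      .option("url", url)
      .option("dbtable", "tablename")
      .option("user", "root")
      .option("password", "root")
      .load()
      .where("id > 100")

    // explain() only prints the physical plan; it does not fetch the rows.
    // In the JDBC scan node, look for a PushedFilters entry to see whether
    // Spark handed the predicate to the database.
    filteredBySpark.explain()

    // count() is an action: only now are the table rows actually read.
    println(filteredInMySql.count())

    spark.stop()
  }
}
```

If the plan for variant B lists the condition under PushedFilters, the filter is evaluated by MySQL; otherwise Spark applies it after fetching the rows, and variant A is the way to force it onto the server.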
