How to convert spark dataframe into SQL query?

Now I've got data in a Spark DataFrame, and I want to convert it back to SQL to do some analysis. Does anyone have any idea how I can do that? Something like df.to_sql(...)?

Thanks!

You can use the explain operator; see this link.
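For example, a minimal sketch, assuming df is your DataFrame:

// Print the physical plan Spark generated for the DataFrame
df.explain()

// Extended output: parsed, analyzed, optimized and physical plans
df.explain(true)

Note that this shows Spark's query plan rather than a reusable SQL string.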

Try this:

df.write.option("header", "true").saveAsTable("my_sql_table")

You can then query my_sql_table using SQL.
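For example, a quick check against the saved table (a sketch, assuming the table name used above):

// Query the saved table through the SQL interface
spark.sql("SELECT * FROM my_sql_table").show()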

You can process a DataFrame with SQL using Spark SQL.

val df = Seq(("Edward", 1, 1000,"me1@example.com"),
                ("Michal",2,15000,"me1@example.com"),
                ("Steve",3,25000,"you@example.com"),
                ("Jordan",4,40000, "me1@example.com")).
      toDF("Name", "ID", "Salary","MailId")
OR
val df = spark.read.json("examples/src/main/resources/employee.json")

// Displays the content of the DataFrame to stdout
df.show()
+------+---+------+---------------+
|  Name| ID|Salary|         MailId|
+------+---+------+---------------+
|Edward|  1|  1000|me1@example.com|
|Michal|  2| 15000|me1@example.com|
| Steve|  3| 25000|you@example.com|
|Jordan|  4| 40000|me1@example.com|
+------+---+------+---------------+

This import is needed to use the $-notation:

import spark.implicits._

// Print the schema in a tree format
df.printSchema()

// Select only the "Name" column
df.select("Name").show()

// Select employees whose salary > 15000
df.filter($"Salary" > 15000).show()

The sql function on a SparkSession also enables applications to run SQL queries programmatically and returns the result as a DataFrame.

// Register the DataFrame as a SQL temporary view
df.createOrReplaceTempView("employee")

val sqlDF = spark.sql("SELECT * FROM employee")
sqlDF.show()

+------+---+------+---------------+
|  Name| ID|Salary|         MailId|
+------+---+------+---------------+
|Edward|  1|  1000|me1@example.com|
|Michal|  2| 15000|me1@example.com|
| Steve|  3| 25000|you@example.com|
|Jordan|  4| 40000|me1@example.com|
+------+---+------+---------------+
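As a sketch, the earlier DataFrame filter can be expressed as SQL against the same view:

// Equivalent of df.filter($"Salary" > 15000)
spark.sql("SELECT * FROM employee WHERE Salary > 15000").show()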

Temporary views in Spark SQL are session-scoped and will disappear when the session that created them terminates. If you want a temporary view that is shared among all sessions and kept alive until the Spark application terminates, you can create a global temporary view.

// Register the DataFrame as a global temporary view
df.createGlobalTempView("employee")

// Global temporary view is tied to a system preserved database `global_temp`
spark.sql("SELECT * FROM global_temp.employee").show()

+------+---+------+---------------+
|  Name| ID|Salary|         MailId|
+------+---+------+---------------+
|Edward|  1|  1000|me1@example.com|
|Michal|  2| 15000|me1@example.com|
| Steve|  3| 25000|you@example.com|
|Jordan|  4| 40000|me1@example.com|
+------+---+------+---------------+
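Because the view is global, it is also visible from a new session within the same application, for example:

// Global temporary views are shared across sessions
spark.newSession().sql("SELECT * FROM global_temp.employee").show()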

Please refer to the Spark documentation:

https://spark.apache.org/docs/2.3.0/sql-programming-guide.html

Hope it helps!
