
How to pass variables in Spark SQL, using Python?

I am writing Spark code in Python. How do I pass a variable in a spark.sql query?

    q25 = 500
    Q1 = spark.sql("SELECT col1 from table where col2>500 limit $q25 , 1")

Currently the above code does not work. How do we pass variables?

I have also tried:

    Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='{}' , 1".format(q25))

You need to remove the single quotes and the q25 inside the string formatting, like this:

Q1 = spark.sql("SELECT col1 from table where col2>500 limit {}, 1".format(q25))

Update:

Based on your new queries:

spark.sql("SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))

Note that Spark SQL does not support OFFSET, so the query above cannot work.
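If you do need offset-like behavior anyway, one possible workaround (not from the original answer; a minimal sketch using a row_number() window function, assuming the same table and columns as the question) is:

q25 = 500
# Emulate MySQL-style "LIMIT q25, 1" (skip q25 rows, return the next one)
# with a row_number() window, since the answer above notes Spark SQL
# does not support OFFSET. The ORDER BY inside OVER() makes the
# row numbering deterministic.
Q1 = spark.sql("""
    SELECT col1 FROM (
        SELECT col1, row_number() OVER (ORDER BY col1 DESC) AS rn
        FROM table
        WHERE col2 > 500
    ) t
    WHERE rn = {}
""".format(q25 + 1))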

If you need to add multiple variables, you can try this way:

q25 = 500
var2 = 50
Q1 = spark.sql("SELECT col1 from table where col2>{0} limit {1}".format(var2,q25))

All you need to do is add s (the Scala string interpolator) to the string, which allows variables to be used directly inside it. Note that this answer is in Scala rather than Python.

val q25 = 10
val Q1 = spark.sql(s"SELECT col1 from table where col2>500 limit $q25")

Another option, if you're doing this sort of thing often or want to make your code easier to re-use, is to use a map of configuration variables and the format option:

configs = {"q25":10,
           "TABLE_NAME":"my_table",
           "SCHEMA":"my_schema"}
Q1 = spark.sql("""SELECT col1 from {SCHEMA}.{TABLE_NAME} 
                  where col2>500 
                  limit {q25}
               """.format(**configs))

A really easy solution is to store the query as a string (using the usual Python formatting), and then pass it to the spark.sql() function:

q25 = 500

query = "SELECT col1 from table where col2>500 limit {}".format(q25)

Q1 = spark.sql(query)

Using the f-strings approach (PySpark):

table = 'my_schema.my_table'

df = spark.sql(f'select * from {table}')
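Not covered by the original answers, but worth noting: newer PySpark versions (3.4+) support parameterized SQL, so a variable can be bound as a named parameter instead of being formatted into the query string. A minimal sketch, assuming Spark 3.4 or later and the question's table:

q25 = 500
# Named parameter markers (:name) are bound from the `args` mapping
# (PySpark 3.4+), which avoids pasting raw values into the SQL text.
Q1 = spark.sql(
    "SELECT col1 FROM table WHERE col2 > :threshold LIMIT 1",
    args={"threshold": q25},
)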
