PySpark SQL in Databricks: How to extract names from tables with similar names?

Question

I have the following tables:

Table_A     Table_B    Table_C   List_A

Table_A:

Column_A             Column_B     Column_C     Column_D
1/1/2020             30           400000       Table_A
2/1/2020             35           200000       Table_A

Table_B:

Column_A             Column_B     Column_C     Column_D    
1/1/2020             50           4000         Table_B
2/1/2020             70           10000        Table_B

Table_C:

Column_A             Column_B     Column_C     Column_D
1/1/2020             3            300          Table_C
2/1/2020             5            200          Table_C

List_A:

Column_A             Line_E                    Column_D       
1/1/2020  09:30:00   30                        List_A
2/1/2020  09:31:00   28                        List_A

I want to select all columns from the tables that have similar names. In this example, they are Table_A, Table_B, and Table_C; their names all begin with "Table_".

How can I do this?

I tried spark.sql("SELECT * FROM * where Column_D like 'Table_*'"), but it does not work.

Answer 1

First, what counts as tables with similar names? Are they tables that:

all start with Table_?
all start with the same prefix of length 5?
all start with the same prefix followed by an underscore?

Consider table names like Table_A, Tablerone_B, Table_bable_C, Table_D_D: are they similar or not?

With SQL you can get the list of tables using SHOW TABLES, but it cannot be run in a subquery, so you cannot process its result any further in SQL alone. You can, however, use Scala or Python.

spark.sql("show tables").select("tableName").as[String].collect.groupBy(_.split("_")(0))

This returns Map(list -> Array(list_a), table -> Array(table_a, table_b, table_c))
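Since the question is about PySpark, the same grouping can be sketched in plain Python. This is only an illustration: group_by_prefix is a hypothetical helper name, and the table names are hard-coded so the snippet runs without a cluster. In a notebook they would presumably come from something like [row.tableName for row in spark.sql("SHOW TABLES").collect()].

```python
from collections import defaultdict


def group_by_prefix(table_names, sep="_"):
    """Group table names by the text before the first separator.

    Hypothetical helper mirroring the Scala groupBy(_.split("_")(0)).
    """
    groups = defaultdict(list)
    for name in table_names:
        prefix = name.split(sep)[0]
        groups[prefix].append(name)
    return dict(groups)


# Assumed input: the names as SHOW TABLES reported them in the example above.
names = ["list_a", "table_a", "table_b", "table_c"]
groups = group_by_prefix(names)
print(groups)
```

Grouping on the text before the first underscore is just one definition of "similar"; a name such as Tablerone_B would land in its own group under this rule.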

Then you can obtain the column names by iterating over the result above and running DESCRIBE <table_name>. Again, you need to decide what you want to get (all column names? shared column names?).
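A rough Python sketch of that last step follows, assuming "similar" means "starts with Table_" and that the matching tables share one column layout, as they do in the question. The helper names (tables_with_prefix, shared_columns, build_union_query) are hypothetical, and the inputs are hard-coded so the snippet runs without Spark.

```python
def tables_with_prefix(table_names, prefix):
    """Keep the names that start with `prefix`, ignoring case."""
    wanted = prefix.lower()
    return [n for n in table_names if n.lower().startswith(wanted)]


def shared_columns(columns_by_table):
    """Return columns present in every table, in the first table's order."""
    column_lists = list(columns_by_table.values())
    if not column_lists:
        return []
    common = set(column_lists[0]).intersection(*map(set, column_lists[1:]))
    return [c for c in column_lists[0] if c in common]


def build_union_query(table_names):
    """Build one query that stacks all tables with UNION ALL.

    Assumes every table has the same columns in the same order,
    because UNION ALL matches columns by position.
    """
    if not table_names:
        raise ValueError("no tables matched the prefix")
    return " UNION ALL ".join(f"SELECT * FROM {n}" for n in table_names)


# Assumed inputs, taken from the example tables in the question.
names = ["list_a", "table_a", "table_b", "table_c"]
matching = tables_with_prefix(names, "Table_")
query = build_union_query(matching)
print(query)
```

In a Databricks notebook the resulting string could be passed to spark.sql(query), and the column lists for shared_columns could be read from spark.table(name).columns instead of being typed by hand.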

Solution 1
0 2022-06-12 21:21:56