
PySpark SQL in Databricks: How to extract names from tables that have similar names?

I have the following tables:

Table_A     Table_B    Table_C   List_A

Table_A:

Column_A             Column_B     Column_C     Column_D
1/1/2020             30           400000       Table_A
2/1/2020             35           200000       Table_A

Table_B:

Column_A             Column_B     Column_C     Column_D    
1/1/2020             50           4000         Table_B
2/1/2020             70           10000        Table_B

Table_C:

Column_A             Column_B     Column_C     Column_D
1/1/2020             3            300          Table_C
2/1/2020             5            200          Table_C  

List_A:

Column_A             Line_E                    Column_D       
1/1/2020  09:30:00   30                        List_A
2/1/2020  09:31:00   28                        List_A

I want to select all columns from tables that have similar names. In this example, they are Table_A, Table_B and Table_C, whose names all begin with "Table_".

How can I do this?

I tried spark.sql("SELECT * FROM * where Column_D like 'Table_*'"), but it does not work.

What do you mean by tables with similar names? Are they tables that:

  • all start with Table_?
  • all start with the same prefix of length 5?
  • all start with the same prefix followed by an underscore?

Consider table names like Table_A, Tablerone_B, Table_bable_C and Table_D_D: are they similar or not?
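
To make the ambiguity concrete, here is a minimal Python sketch that applies the three candidate rules to the names above. The helper names are hypothetical and exist only for illustration.

```python
# Hypothetical helpers illustrating three possible meanings of "similar name".
names = ["Table_A", "Tablerone_B", "Table_bable_C", "Table_D_D"]

def starts_with_table_(name):
    # Rule 1: the name starts with the literal prefix "Table_".
    return name.startswith("Table_")

def first_five(name):
    # Rule 2: names are similar when their first five characters match.
    return name[:5]

def prefix_before_underscore(name):
    # Rule 3: names are similar when the text before the first underscore matches.
    return name.split("_", 1)[0]

rule1 = [n for n in names if starts_with_table_(n)]
rule2 = {first_five(n) for n in names}
rule3 = {n: prefix_before_underscore(n) for n in names}

print(rule1)  # ['Table_A', 'Table_bable_C', 'Table_D_D']
print(rule2)  # {'Table'}
print(rule3)
```

Rule 1 excludes Tablerone_B, rule 2 treats all four names as similar, and rule 3 puts Tablerone_B in a group of its own, so the choice of rule changes the result.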

With SQL you can get the list of tables using SHOW TABLES, but it cannot be run in a subquery, so you cannot process its result further in SQL alone. You can, however, use Scala or Python.
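
In Python, one way to do this is to run SHOW TABLES through spark.sql and filter the names on the driver. The sketch below assumes `spark` is the SparkSession that a Databricks notebook provides; the function name `table_names_with_prefix` is hypothetical.

```python
def table_names_with_prefix(spark, prefix):
    """Return the tables in the current database whose names start with `prefix`."""
    # SHOW TABLES yields one row per table, including a `tableName` column.
    rows = spark.sql("SHOW TABLES").select("tableName").collect()
    # Compare case-insensitively: assumption that the metastore may
    # report names in lowercase (as in the answer's sample output).
    wanted = prefix.lower()
    return [r["tableName"] for r in rows if r["tableName"].lower().startswith(wanted)]
```

With the tables from the question, calling `table_names_with_prefix(spark, "Table_")` should return the three Table_ tables and leave out List_A. Filtering in Python also sidesteps the fact that `_` is a single-character wildcard in a SQL LIKE pattern.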

spark.sql("show tables").select("tableName").as[String].collect.groupBy(_.split("_")(0))

This returns Map(list -> Array(list_a), table -> Array(table_a, table_b, table_c)).
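
Since the question is about PySpark, the same grouping step can be sketched in plain Python once the table names have been collected into a list. The function name `group_by_prefix` is hypothetical, and the input list simply mirrors the names in the result above.

```python
from collections import defaultdict

def group_by_prefix(table_names):
    """Group table names by the text before their first underscore."""
    groups = defaultdict(list)
    for name in table_names:
        groups[name.split("_")[0]].append(name)
    return dict(groups)

print(group_by_prefix(["list_a", "table_a", "table_b", "table_c"]))
# {'list': ['list_a'], 'table': ['table_a', 'table_b', 'table_c']}
```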

Then you can obtain the column names by iterating over the result above and running DESCRIBE <table_name>. Again, you need to decide what you want to get: all column names, or only the shared ones?
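
As one possible reading of "shared column names", the sketch below keeps only the columns present in every table and stacks the tables into a single DataFrame. It uses spark.table(name).columns instead of DESCRIBE, which is a substitution on my part, and it assumes `spark` is the notebook's SparkSession. Both function names are hypothetical.

```python
from functools import reduce

def shared_columns(spark, table_names):
    """Columns present in every listed table, in the order of the first table.

    Assumes `table_names` contains at least one name.
    """
    column_lists = [spark.table(name).columns for name in table_names]
    common = set(column_lists[0]).intersection(*column_lists[1:])
    return [c for c in column_lists[0] if c in common]

def union_tables(spark, table_names):
    """Stack the listed tables into one DataFrame, keeping only shared columns."""
    cols = shared_columns(spark, table_names)
    frames = [spark.table(name).select(*cols) for name in table_names]
    # unionByName matches columns by name rather than by position.
    return reduce(lambda left, right: left.unionByName(right), frames)
```

For the tables in the question, `union_tables(spark, ["table_a", "table_b", "table_c"])` would give one DataFrame with Column_A through Column_D, where Column_D still records which table each row came from.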
