Extract tables from SQL file using regex

I want to be able to extract a query's source tables using a regular expression. This will help me build a script to test dependencies without having to hit the cluster.

Example:

use datalab2;
select 
   *
from hello as table1
left join (
  select 
     *
  from ${hiveconf:varable_db}.events
  where  test=1
) as table2 on 
  table2.test = table1.test
inner join datalab3.table3 as table3 on 
  table3.test = table1.test;

Would return:

hello
${hiveconf:varable_db}.events
datalab3.table3

Solved it using the following two regular expressions:

1: "(with\\s+|,\\s*)([a-zA-Z0-9_]+)\\s+as\\s*\\("

2: "(?<=[^a-zA-Z0-9\\_]FROM|[^a-zA-Z0-9\\_]JOIN)(\\s+|\\s+[^\\(])([a-zA-Z0-9\\.\\_\\{\\}\\$\\:]+)"

The first one gathers a list of all the WITH clause names (common table expressions).
Example: https://regex101.com/r/sX5hZ2/25
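To try pattern 1 outside regex101, here is a minimal Python sketch. Python is my choice (the answer does not name a language), and the doubled backslashes in the answer are string-literal escaping, so the pattern is written once as a raw string. Case-insensitive matching, the helper name `cte_names`, and the sample query are assumptions for illustration.

```python
import re
from typing import List

# Pattern 1 from the answer as a raw string: group 2 is the name that follows
# "WITH" or a comma and precedes "AS (".
# re.IGNORECASE is an assumption so that "WITH"/"AS" also match in upper case.
CTE_NAME = re.compile(r"(with\s+|,\s*)([a-zA-Z0-9_]+)\s+as\s*\(", re.IGNORECASE)


def cte_names(sql: str) -> List[str]:
    """Hypothetical helper: WITH-clause names in order of appearance."""
    return [m.group(2) for m in CTE_NAME.finditer(sql)]


# Hypothetical query with two WITH names.
query = """with recent as (
  select * from datalab3.events where dt > '2020-01-01'
), totals as (
  select id, count(*) as n from recent group by id
)
select * from totals join hello on totals.id = hello.id;
"""

print(cte_names(query))  # ['recent', 'totals']
```

Note that `count(*) as n` is not picked up: after `, count` the pattern needs whitespace before `as`, and it finds `(` instead.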

The second one gets a list of all the tables that follow FROM or JOIN, both in the main query and inside the WITH clauses (this includes references to the WITH names themselves).
Example: https://regex101.com/r/sX5hZ2/26
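Running pattern 2 over the query from the question reproduces the expected list. This is a minimal Python sketch; the language, the `re.IGNORECASE` flag (the pattern spells FROM/JOIN in upper case while the sample uses lower case), and the helper name `table_refs` are assumptions. Python accepts this lookbehind because both alternatives have the same fixed width of five characters.

```python
import re
from typing import List

# Pattern 2 from the answer as a raw string. The lookbehind requires FROM or
# JOIN preceded by a non-word character; group 2 captures the name after it.
# re.IGNORECASE is an assumption: the sample query uses lower-case keywords.
TABLE_REF = re.compile(
    r"(?<=[^a-zA-Z0-9\_]FROM|[^a-zA-Z0-9\_]JOIN)(\s+|\s+[^\(])([a-zA-Z0-9\.\_\{\}\$\:]+)",
    re.IGNORECASE,
)


def table_refs(sql: str) -> List[str]:
    """Hypothetical helper: every FROM/JOIN target, in order of appearance."""
    return [m.group(2) for m in TABLE_REF.finditer(sql)]


# The query from the question.
SAMPLE = """use datalab2;
select
   *
from hello as table1
left join (
  select
     *
  from ${hiveconf:varable_db}.events
  where  test=1
) as table2 on
  table2.test = table1.test
inner join datalab3.table3 as table3 on
  table3.test = table1.test;
"""

print(table_refs(SAMPLE))
# ['hello', '${hiveconf:varable_db}.events', 'datalab3.table3']
```

The `left join (` subquery produces no entry because `(` is not in the name character class, while `$`, `{`, `}` and `:` are, which is what lets the Hive variable reference through.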

The distinct items in the second list that are not WITH names from the first list are all the tables the query uses.
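Putting the two lists together can be sketched as follows, assuming "distinct items" means the second list with the WITH names removed and duplicates dropped. As above, Python, `re.IGNORECASE`, the helper name `source_tables`, and the sample query are illustrative assumptions; names are compared exactly as written (case-sensitively).

```python
import re
from typing import List

# Both patterns from the answer, written as raw strings.
# re.IGNORECASE is an assumption (the sample SQL uses lower-case keywords).
CTE_NAME = re.compile(r"(with\s+|,\s*)([a-zA-Z0-9_]+)\s+as\s*\(", re.IGNORECASE)
TABLE_REF = re.compile(
    r"(?<=[^a-zA-Z0-9\_]FROM|[^a-zA-Z0-9\_]JOIN)(\s+|\s+[^\(])([a-zA-Z0-9\.\_\{\}\$\:]+)",
    re.IGNORECASE,
)


def source_tables(sql: str) -> List[str]:
    """Hypothetical helper: FROM/JOIN targets minus WITH names, de-duplicated."""
    ctes = {m.group(2) for m in CTE_NAME.finditer(sql)}
    tables: List[str] = []
    for m in TABLE_REF.finditer(sql):
        name = m.group(2)
        # Skip WITH names and names already collected, keeping first-seen order.
        if name not in ctes and name not in tables:
            tables.append(name)
    return tables


# Hypothetical query: "recent" and "totals" are WITH names, not real tables.
query = """with recent as (
  select * from datalab3.events where dt > '2020-01-01'
), totals as (
  select id, count(*) as n from recent group by id
)
select * from totals
join hello on totals.id = hello.id
join datalab3.events e on e.id = hello.id;
"""

print(source_tables(query))  # ['datalab3.events', 'hello']
```

Keeping a list rather than a set for the result preserves the order in which tables first appear, which makes the output easier to compare against the query by eye.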
