Getting table names from a SQL statement with Java split

Question

In the following code I need to extract table names as tokens, using SQL reserved words as separators. Each token should contain the table name, optionally followed by a suffix such as an alias.

For example, given:

table1 t1 inner join table2 t2 outer join table3

The code should return three tokens:

Token 1: table1 t1
Token 2: table2 t2 
Token 3: table3

Instead, this code splits only on the first reserved word of each pair and leaves the following one (join) at the start of the next token:

    String str = "table1 t1 inner join table2 t2 outer join table3";
    String[] tokens = str.split("\\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s");
    
    for (int i = 0; i<tokens.length; i++)
         System.out.println("Token "+(i+1)+":"+tokens[i]);

This returns:

Token 1:table1 t1
Token 2:join table2 t2 
Token 3:join table3

What is the problem, and how can I make this work?

Answer 1

This works for the general case of a series of joined tables:

String[] tableNames = str.split(" (?=inner|outer|left|join|right|cross|full).*?join ");
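A minimal, self-contained sketch that runs this pattern on the question's input and on a second, made-up input with a two-word qualifier (left outer join) and a bare join. The class and method names are illustrative only, and the pattern assumes lower-case keywords separated by single spaces.

```java
import java.util.Arrays;

public class JoinSplitDemo {

    // Splits a FROM-clause fragment using Answer 1's pattern: the separator must
    // start with a space followed by a join-related keyword (lookahead), and
    // ".*?join " then lazily consumes everything up to and including "join ".
    static String[] splitTables(String fromClause) {
        return fromClause.split(" (?=inner|outer|left|join|right|cross|full).*?join ");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "table1 t1 inner join table2 t2 outer join table3",
            "a x left outer join b y join c"   // hypothetical extra test input
        };
        for (String s : inputs) {
            System.out.println(Arrays.toString(splitTables(s)));
        }
    }
}
```

Because the lookahead does not check word boundaries, a table or alias that merely starts with one of the keywords (for example fullname) could be mistaken for a separator start.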

Answer 2

You can use (?:\w+\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\s+\w+){1,2}|(?<=outer join)(?:\s+\w+){1,2} as the regex and collect the matches with Matcher.find() instead of splitting.

Demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Test strings
        String[] arr = { "table1 t1 inner join table2 t2 outer join table3",
                "table1 t1 inner join table2 t2 outer join table3 t3 inner join table2" };
        Pattern pattern = Pattern.compile(
                "(?:\\w+\\s+){1,2}(?=inner|outer join)|(?<=inner join)(?:\\s+\\w+){1,2}|(?<=outer join)(?:\\s+\\w+){1,2}");
        for (String s : arr) {
            System.out.println("Processing: " + s);
            Matcher matcher = pattern.matcher(s);
            while (matcher.find()) {
                System.out.println(matcher.group().trim());
            }
        }
    }
}

Output:

Processing: table1 t1 inner join table2 t2 outer join table3
table1 t1
table2 t2
table3
Processing: table1 t1 inner join table2 t2 outer join table3 t3 inner join table2
table1 t1
table2 t2
table3 t3
table2

Answer 3

I tested your pattern (unescaped, it is \son\s|\sinner\s|\souter\s|\sjoin\s) against the test string table1 t1 inner join table2 t2 outer join table3 on regex101.com, and the only matches I got were for inner and outer. Since you split the string on those matches, the join keywords stay inside the tokens, which is why you get that result.
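The same check can be reproduced in Java. This sketch (class and method names are hypothetical) prints every substring the question's split pattern matches together with its start and end index, so you can see that only " inner " and " outer " act as separators.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorMatches {

    // The question's original split pattern: each keyword must be surrounded
    // by whitespace, and both whitespace characters are consumed by the match.
    static final Pattern SEPARATOR =
            Pattern.compile("\\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s");

    // Returns each separator match as "start-end:'text'" in the order found.
    static List<String> describeMatches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        // The space in front of each "join" is already consumed by the previous
        // match, so the \sjoin\s alternative never matches here.
        describeMatches(str).forEach(System.out::println);
    }
}
```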

Perhaps this helps with your specific case. I went for a matching approach (Matcher.find()) instead of splitting the data.

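import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternChecker {

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        // "table" + digits, optionally followed by one alias word between spaces
        Pattern p = Pattern.compile("(table[0-9]+( [a-zA-Z0-9]+ )?)");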
        Matcher m = p.matcher(str);

        while(m.find()) {
            System.out.println(m.group(0));
        }
    }

}

Later edit

The split pattern \\son\\s|\\sinner\\s|\\souter\\s|\\sjoin\\s did not work because of the mandatory whitespace on both sides of each keyword. You are searching for *on*, *inner*, *outer*, or *join* (whitespace is marked with an asterisk), so the whitespace is part of each separator you split on. *join* could not be matched because its left-side whitespace had already been consumed as the right-side whitespace of the *inner* and *outer* matches.

Going back to the split solution, one fix is to make the whitespace before join optional with the ? quantifier, which gives the pattern \\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s. This yields some empty tokens (between inner/outer and the following join) that can be filtered out.
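A minimal sketch of this optional-whitespace fix, filtering out the empty tokens with a stream. The class, constant, and method names are hypothetical; the pattern itself is the one given above.

```java
import java.util.Arrays;

public class OptionalSpaceSplit {

    // The whitespace before "join" is optional, so "join" still matches after
    // " inner " or " outer " has consumed the space in front of it.
    static final String PATTERN = "\\son\\s|\\sinner\\s|\\souter\\s|\\s?join\\s";

    static String[] splitTables(String input) {
        return Arrays.stream(input.split(PATTERN))
                // Adjacent separators (" inner " directly followed by "join ")
                // leave an empty string between them; drop those.
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String str = "table1 t1 inner join table2 t2 outer join table3";
        System.out.println(Arrays.toString(str.split(PATTERN)));  // raw, with empty tokens
        System.out.println(Arrays.toString(splitTables(str)));    // empty tokens removed
    }
}
```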

Another idea is to treat the combinations with join (i.e. inner join, outer join) as complete separators, which leads to \\son\\s|\\sinner join\\s|\\souter join\\s. No empty tokens are generated. If a bare join can also appear, add \\sjoin\\s as the last alternative, as in the following example:

public class PatternChecker {

    public static void main(String[] args) {
        String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";

        String[] tokens = str.split("\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s");

        for(String token : tokens) {
            System.out.println(token);
        }

        // Output
        // employee t1
        // department t2
        // job
        // table4 history
        // a=b

    }

}

Note that because the pattern also splits on on, the join condition (a=b above) becomes a token of its own; you can filter out every token that contains an equals sign.
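A sketch of that filtering step, reusing the separators from the example above. The class and method names are hypothetical, and the equals-sign test is only a rough heuristic: a join condition without = (for example one using <) would slip through.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinConditionFilter {

    // Same separators as the example above: "on", "inner join", "outer join",
    // and a bare "join", each surrounded by whitespace.
    static final String PATTERN =
            "\\son\\s|\\sinner join\\s|\\souter join\\s|\\sjoin\\s";

    // Splits the fragment and drops tokens that look like join conditions,
    // using the presence of '=' as the heuristic.
    static List<String> tableTokens(String fromClause) {
        return Arrays.stream(fromClause.split(PATTERN))
                .filter(token -> !token.contains("="))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "employee t1 inner join department t2 outer join job join table4 history on a=b";
        tableTokens(str).forEach(System.out::println);
    }
}
```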

For a generic fix, you would need to isolate the string contained between from and where and apply the idea above.对于通用修复，您需要隔离from和where之间包含的字符串并应用上述想法。
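The answer only describes this step in words. One possible sketch, assuming a single flat SELECT with no subqueries, comments, or string literals that contain SQL keywords: it captures the text after FROM up to WHERE, GROUP BY, ORDER BY, a semicolon, or the end of the statement, then splits it case-insensitively on join keywords (with an optional qualifier) and on on. The class name, the pattern constants, and the '=' filter are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FromClauseTables {

    // Assumption: simple SELECT only. Group 1 is the text between FROM and the
    // next WHERE / GROUP BY / ORDER BY / ';' or the end of the input.
    private static final Pattern FROM_CLAUSE = Pattern.compile(
            "\\bfrom\\s+(.*?)(?:\\s+where\\b|\\s+group\\s+by\\b|\\s+order\\s+by\\b|;|$)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // "[inner|left|right|full|cross] [outer] join" or "on", case-insensitive.
    private static final String SEPARATOR =
            "(?i)\\s+(?:(?:inner|left|right|full|cross)\\s+)?(?:outer\\s+)?join\\s+|\\s+on\\s+";

    static List<String> tableTokens(String sql) {
        List<String> tokens = new ArrayList<>();
        Matcher m = FROM_CLAUSE.matcher(sql);
        if (!m.find()) {
            return tokens; // no FROM clause found
        }
        for (String token : m.group(1).trim().split(SEPARATOR)) {
            // Skip empty strings and join conditions such as "t1.id = t2.id".
            if (!token.isEmpty() && !token.contains("=")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM table1 t1 INNER JOIN table2 t2 ON t1.id = t2.id "
                + "OUTER JOIN table3 WHERE t1.x > 0";
        tableTokens(sql).forEach(System.out::println);
    }
}
```

For anything beyond simple statements, a real SQL parser is more reliable than extending these patterns.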

Answer 4

I found this question, which is essentially the same as yours.

You can mostly solve this problem by going through a regex nightmare, or you can use an external library such as Zql, which actually parses the SQL statement for you.

Answer summary

Answer 1: score 2, accepted, 2020-08-01 05:16:54
Answer 2: score 1, 2020-07-31 22:01:35
Answer 3: score 0, 2020-07-31 21:37:36
Answer 4: score 0, 2020-07-31 21:51:52
