
Extract all cell references in Excel with regex in Java

I have Excel data and formulas like:

  • (S10+S14+S18+S22+S26+S30+S34)
  • E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)
  • SUM(E14:E19)
  • S16*S15

and so on.

I want to get all the cell references out of the strings, as in this example: "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)"

I want the output to be a string like "E10 E11 ... E21", or the same references separated with ",".

I have tested a lot of regexes but can't get a valid result. I am using this code:

String formulaString = "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)";
Pattern pattern = Pattern.compile("REGEX");
Matcher matcher = pattern.matcher(formulaString);

I have tried the following regexes:

http://social.msdn.microsoft.com/Forums/en-US/815e819c-f0f2-4a53-8407-98b0f7f116e2/regex-to-extract-list-of-cell-references-from-excel-formula?forum=csharpgeneral

REGEX: (\\w+|)?\\$?(?:\\bXF[AD]|X[AE][AZ]|[AW][AZ]{2}|[AZ]{2}|[AZ])\\$?(?:104857[0-6]|10485[0-6]\\d|1048[0-4]\\d{2}|104[0-7]\\d{3}|10[0-3]\\d{4}|[1-9]\\d{1,5}|[1-9])d?\\b(:\\s?\\$?(?:\\bXF[AD]|X[AE][AZ]|[AW][AZ]{2}|[AZ]{2}|[AZ])\\$?(?:104857[0-6]|10485[0-6]\\d|1048[0-4]\\d{2}|104[0-7]\\d{3}|10[0-3]\\d{4}|[1-9]\\d{1,5}|[1-9])d?\\b)?

http://social.msdn.microsoft.com/Forums/en-US/dc179984-4fc8-4346-90e8-1649a23b6afe/regex-solution-to-id-excel-cell-references-in-an-excel-formula-string?forum=regexp

REGEX: \\$?\\b([AZ]|[AH][AZ]|I[AV])\\$?([1-9]\\d{0,3}|[1-5]\\d{4}|6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-6])\\b([:\\s]\\$?\\b([AZ]|[AH][AZ]|I[AV])\\$?([1-9]\\d{0,3}|[1-5]\\d{4}|6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-6])\\b)?

They work for some of the formulas, but not for all.

I hope someone can help me or give me a tip :)

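public class CellFinder {

    public static void main(String[] args) {
        String formula = "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)";
        StringBuilder output = new StringBuilder();

        // Split on every run of characters that is not a letter or a digit.
        // The range A-z would also admit [ \ ] ^ _ and `, so A-Za-z is spelled out.
        for (String token : formula.split("[^A-Za-z0-9]+")) {
            if (isCell(token)) {
                output.append(token).append(' ');
            }
        }

        // Prints: E10 E11 E10 E14 E17 E18 E19 E20 E21
        System.out.println(output.toString().trim());
    }

    // A token is treated as a cell reference if it contains at least one
    // uppercase letter and at least one digit (so SUM, IF and 1 are skipped).
    private static boolean isCell(String current) {
        boolean hasLetter = false;
        boolean hasNumber = false;

        // Stop scanning as soon as both kinds of character have been seen.
        for (int i = 0; i < current.length() && (!hasLetter || !hasNumber); i++) {
            char ch = current.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                hasLetter = true;
            } else if (ch >= '0' && ch <= '9') {
                hasNumber = true;
            }
        }

        return hasLetter && hasNumber;
    }
}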

On the first thread you linked to, the regex you put below the link is nowhere to be found on the page. Were you really using a regex from that page? The regex that was suggested was:

(\w+|)?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b(:\s?\$?(?:\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])\$?(?:104857[0-6]|10485[0-6]\d|1048[0-4]\d{2}|104[0-7]\d{3}|10[0-3]\d{4}|[1-9]\d{1,5}|[1-9])d?\b)?

Try that.
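For what it's worth, here is a minimal sketch of how the suggested regex could be dropped into the Java code from the question. The pattern is the one quoted above, only split into parts and with backslashes doubled for a Java string literal; the class name `SuggestedRegexDemo` and the helper `findReferences` are made up for this illustration. Note that the literal `d?` is copied from the suggested regex as-is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the linked thread.
class SuggestedRegexDemo {
    // Column part: A .. XFD (hyphens inside the character classes matter).
    private static final String COL =
            "(?:\\bXF[A-D]|X[A-E][A-Z]|[A-W][A-Z]{2}|[A-Z]{2}|[A-Z])";
    // Row part: 1 .. 1048576.
    private static final String ROW =
            "(?:104857[0-6]|10485[0-6]\\d|1048[0-4]\\d{2}|104[0-7]\\d{3}"
            + "|10[0-3]\\d{4}|[1-9]\\d{1,5}|[1-9])";
    // One reference with optional $ markers; "d?" is kept from the original regex.
    private static final String REF = "\\$?" + COL + "\\$?" + ROW + "d?\\b";
    // A single reference, optionally followed by ":" and a second one (a range).
    private static final Pattern CELL =
            Pattern.compile("(\\w+|)?" + REF + "(:\\s?" + REF + ")?");

    // Collects every match in the formula, in order of appearance.
    static List<String> findReferences(String formula) {
        List<String> refs = new ArrayList<>();
        Matcher matcher = CELL.matcher(formula);
        while (matcher.find()) {
            refs.add(matcher.group());
        }
        return refs;
    }

    public static void main(String[] args) {
        String formula = "E10+E11+SUM(E10;E14:E17)*E18-IF(E19<1,E20, E21)";
        // Prints: E10 E11 E10 E14:E17 E18 E19 E20 E21
        System.out.println(String.join(" ", findReferences(formula)));
    }
}
```

With this pattern a range such as E14:E17 comes back as a single match, so it would need to be split on ":" if the two endpoints are wanted separately.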

Also, it would help to know which specific strings don't get matched correctly, since you mentioned that some of them work.
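One thing that may be worth checking: the regexes quoted in the question show `[AZ]` where the suggested one has `[A-Z]`. If the hyphens really are missing in your code, the class matches only the two letters A and Z, which would explain why some references match and others don't. A small sketch to illustrate the difference (the class `CharClassCheck` and the shortened patterns are just for this demonstration, not the full regex):

```java
import java.util.regex.Pattern;

// Hypothetical helper to compare [AZ] (two letters) with [A-Z] (a range).
class CharClassCheck {
    // Returns true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Without the hyphen only columns A and Z are accepted.
        System.out.println(finds("\\b[AZ]\\d+\\b", "E10+E11"));  // false
        System.out.println(finds("\\b[AZ]\\d+\\b", "A1+Z2"));    // true
        // With the hyphen every column from A to Z is accepted.
        System.out.println(finds("\\b[A-Z]\\d+\\b", "E10+E11")); // true
    }
}
```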
