
Regular expression for a string in Java

I am trying to write a regular expression for these kinds of strings:

05 IMA-POLICY-ID         PIC X(15).               00020068

05 (AMENT)-GROUPCD       PIC X(10).

I want to parse everything between the 05 and the first tab. The line might start with tabs or spaces followed by digits, and the initial number can be anything (05, 10, 15).

So in the first line I need to parse IMA-POLICY-ID, and in the second line (AMENT)-GROUPCD.

This is the code I have written, and it is not finding the pattern. Where am I going wrong?

Pattern p1 = Pattern.compile("^[0-9]+\\s\\S+\t$"); 
Matcher m1 = p1.matcher(line); 
System.out.println("m1 =="+m1.group());

Pattern p1 = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t");

This will match everything from 05, 10 or 15 up to and including the nearest tab (\\t).

Explanation:

\b           # Start of number/word
(?:05|1[05]) # Match 05, 10 or 15
\b           # End of number/word
[^\t]*       # Match any number of characters except tab
\t           # Match a tab
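A minimal sketch of how this pattern might be applied, assuming the fields in each line really are separated by tab characters. The class name LevelFieldDemo and the helper matchToTab are hypothetical names used only for illustration. Note that the whole match includes the leading level number and the trailing tab, since the pattern has no capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LevelFieldDemo {
    // The pattern from the answer above: 05, 10 or 15, then everything up to the first tab.
    static final Pattern P = Pattern.compile("\\b(?:05|1[05])\\b[^\\t]*\\t");

    // Returns the whole match (level number, name and trailing tab), or null if nothing matches.
    static String matchToTab(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumption: the sample line uses real tab characters between fields.
        String line = "05 IMA-POLICY-ID\tPIC X(15).\t00020068";
        System.out.println("[" + matchToTab(line) + "]");
    }
}
```

To get only the name, the part after the level number would still have to be wrapped in a capturing group or trimmed off the match.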

Your pattern expects the line to end right after IMA-POLICY-ID etc., because of the $ at the end.

If there is no whitespace in the string you want to match (I assume there isn't, because of your use of \\S+), I'd change the pattern to ^\\d+\\s+(\\S+), which should be sufficient to match any number at the start of a line, followed by whitespace and then the group of non-whitespace characters you want to match (note that a tab is whitespace as well).

If you need to match until the first tab or the end of the input and include other whitespace, replace (\\S+) with ([^\\t]+).
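The two variants can be compared side by side in a small sketch. The class name FieldNameDemo and the helper firstGroup are hypothetical, and the sample input in main assumes a tab after the field name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldNameDemo {
    // Number at the start, whitespace, then a run of non-whitespace characters.
    static final Pattern NON_WS = Pattern.compile("^\\d+\\s+(\\S+)");
    // Same prefix, but the captured name runs up to the first tab or the end of input.
    static final Pattern TO_TAB = Pattern.compile("^\\d+\\s+([^\\t]+)");

    // Returns capturing group 1 of the first match, or null if the pattern does not match.
    static String firstGroup(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumption: a name containing spaces, followed by a tab.
        String line = "05 IMA POLICY ID\tPIC X(15).";
        System.out.println(firstGroup(NON_WS, line)); // stops at the first space
        System.out.println(firstGroup(TO_TAB, line)); // stops at the first tab
    }
}
```

Both patterns anchor the digits directly at the start of the input, so a line that begins with tabs or spaces (as the question mentions) would need something like \\s* after the ^.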

^\d+\s+([^\s]+)

This will match your requirement.

Demo here: http://regex101.com/r/rQ7fT3

I can see two things that might prevent your Pattern from working.

  1. Firstly, your input Strings contain multiple tab-separated values, so the $ "end-of-input" anchor at the end of your Pattern will fail to match the String.
  2. Secondly, you want to find what's in between 05 (etc.) and the first tab. Therefore you need to wrap the desired expression in parentheses (e.g. (\\S+)) and refer to it by its group number (in this case, group 1).

Here's an example:

String input = "05 IMA-POLICY-ID\tPIC X(15).\t00020068" +
                "\r\n05 (AMENT)-GROUPCD\tPIC X(10).";
//                           | 0, 1, or 5 twice (refine here if needed)
//                           |       | 1 whitespace
//                           |       |  | your queried expression (here I use a 
//                           |       |  | reluctant dot search
//                           |       |  |    | tab
//                           |       |  |    |  | anything after, reluctant
Pattern p = Pattern.compile("[015]{2}\\s(.+?)\t.+?");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println("Found: " + m.group(1));
}

Output:

Found: IMA-POLICY-ID
Found: (AMENT)-GROUPCD

Your regex is almost correct. Just remove the \\t$ at the end of your regex and capture the \\S+ as a group.

Pattern p1 = Pattern.compile("^[0-9]+\\s(\\S+)");

Now print it as:

if (m1.find()) {
    System.out.println(m1.group(1));
}
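One detail worth spelling out: Matcher.group() throws an IllegalStateException unless a match operation such as find() or matches() has succeeded first, and the code in the question never calls one. A small sketch of both behaviours follows. The class name FindBeforeGroupDemo and its helper methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindBeforeGroupDemo {
    static final Pattern P = Pattern.compile("^[0-9]+\\s(\\S+)");

    // Calls find() first, then reads capturing group 1; returns null when nothing matches.
    static String fieldName(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Reports whether calling group() on a fresh Matcher (no find() yet) throws.
    static boolean groupWithoutFindThrows(String line) {
        Matcher m = P.matcher(line);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: the sample line uses a tab after the field name.
        String line = "05 (AMENT)-GROUPCD\tPIC X(10).";
        System.out.println(fieldName(line));
        System.out.println(groupWithoutFindThrows(line));
    }
}
```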

This is what I came up with, and it worked:

String re = "^\\s+\\d+\\s+([^\\s]+)";
Pattern p1 = Pattern.compile(re, Pattern.MULTILINE); 
Matcher m1 = p1.matcher(line);
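A sketch of how that pattern might be run over several lines at once, reading group 1 after each find(). The class name MultilineFieldDemo, the helper fieldNames, and the indented sample text are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineFieldDemo {
    // Same pattern as above; MULTILINE lets ^ match at the start of every line.
    static final Pattern P = Pattern.compile("^\\s+\\d+\\s+([^\\s]+)", Pattern.MULTILINE);

    // Collects capturing group 1 from every matching line of the text.
    static List<String> fieldNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumption: each line is indented with spaces or a tab before the level number.
        String text = "    05 IMA-POLICY-ID\tPIC X(15).\n\t05 (AMENT)-GROUPCD\tPIC X(10).";
        for (String name : fieldNames(text)) {
            System.out.println(name);
        }
    }
}
```

Because the pattern starts with ^\\s+, it only matches lines with at least one leading whitespace character. If some lines start directly at the digits, ^\\s* would be the variant to try.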
