
How do I split/parse this String properly using Regex

I am inexperienced with regex and rusty with Java, so some help here would be appreciated.

So I have a String in the form:

statement|digit|statement

statement|digit|statement

etc.

where statement can be any combination of characters, digits, and spaces. I want to parse this string so that the first and last statements of each line are saved in two separate string arrays.

For example, if I had a string:

cats|1|short hair and long hair

cats|2|black, blue

dogs|1|cats are better than dogs

I want to be able to parse the string into two arrays.

Array one = [cats], [cats], [dogs]

Array two = [short hair and long hair],[black, blue],[cats are better than dogs]

    Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);

        while(m.find()) {
          String key = m.group(1);
          String value = m.group(2);
          System.out.printf("key=%s, value=%s\n", key, value);
        }

I would have gone on to add the keys and values to separate arrays had my output been right, but no luck. Any help with this would be very much appreciated.

Here is a solution with RegEx:

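import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseString {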
    public static void main(String[] args) {
        String data = "cats|1|short hair and long hair\n"+
                      "cats|2|black, blue\n"+
                      "dogs|1|cats are better than dogs";
        List<String> result1 = new ArrayList<>();
        List<String> result2 = new ArrayList<>();
        Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");

        Matcher m = pattern.matcher(data);
        while (m.find()) {
           String key = m.group(1);
           String value = m.group(2);
           result1.add(key);
           result2.add(value);
           System.out.printf("key=%s, value=%s\n", key, value);
        }
    }
}
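The question asks for arrays, while the code above collects the matches into lists. If plain arrays are needed, a minimal sketch of the conversion might look like this. The class name ListsToArrays and the method parse are hypothetical, chosen only for illustration; the pattern is the one used above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class ListsToArrays {
    // Same pattern as above: first statement, digits, last statement.
    private static final Pattern LINE = Pattern.compile("(.+)\\|\\d+\\|(.+)");

    // Returns a two-element array: [0] holds the first statements, [1] the last statements.
    static String[][] parse(String data) {
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        Matcher m = LINE.matcher(data);
        while (m.find()) {
            keys.add(m.group(1));
            values.add(m.group(2));
        }
        // toArray(new String[0]) copies the list contents into a String[] of the right size.
        return new String[][] { keys.toArray(new String[0]), values.toArray(new String[0]) };
    }

    public static void main(String[] args) {
        String data = "cats|1|short hair and long hair\n"
                    + "cats|2|black, blue\n"
                    + "dogs|1|cats are better than dogs";
        String[][] result = parse(data);
        System.out.println("Array one = " + Arrays.toString(result[0]));
        System.out.println("Array two = " + Arrays.toString(result[1]));
    }
}
```

Because . does not match line terminators by default, each call to find() stays within a single line of the input.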

Here is a great site to help with regex expressions: http://txt2re.com/. Enter some example text in step 1. Select the parts you are interested in in step 2, and select a language in step 3. Then copy, paste, and massage the code that it spits out.

Double split should work:

class ParseString
{  
  public static void main(String[] args)
  {  
    String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
    String[] sa1 = s.split("\n");
    for (int i = 0; i < sa1.length; i++)
    {  
      String[] sa2 = sa1[i].split("\\|");
      System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
    } // end for i
  } // end main
} // end class ParseString

Output:

key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs

The main problem is that you need to escape | and not the . (an unescaped | means alternation, while \\. matches only a literal dot). Also, what is the = doing in your regex? I generalized the regex a little bit, but you can replace .* with \\d+ to get the same behavior as yours.

Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);

Here is the strict version: "^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE)
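As a rough illustration of how the strict pattern could be used to fill the two collections, here is a minimal sketch. The class name StrictParse and the method parseStrict are hypothetical, and the sample data is the one from the question. Note that [^|] can also match a newline, so this sketch assumes every line is well formed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class StrictParse {
    // Strict pattern: the escaped \\| is a literal pipe, \\d+ is the digit field.
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern LINE =
            Pattern.compile("^([^|]+)\\|\\d+\\|([^|]+)$", Pattern.MULTILINE);

    // Returns one {first statement, last statement} pair per matched line.
    static List<String[]> parseStrict(String str) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = LINE.matcher(str);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                   + "cats|2|black, blue\n"
                   + "dogs|1|cats are better than dogs";
        for (String[] pair : parseStrict(str)) {
            System.out.printf("key=%s, value=%s%n", pair[0], pair[1]);
        }
    }
}
```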

And it is indeed easier to use split (on the individual lines), as some have said, but like this:

String[] parts = str.split("\\|\\d+\\|");

If parts.length is not two, then you know it is not a legal line.
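The length check can be combined with a per-line loop. The following is only a sketch under the assumption that illegal lines should be skipped silently; the class name SplitWithCheck and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
class SplitWithCheck {
    // Splits each line on |digits| and keeps it only if exactly two parts result.
    static void parse(String text, List<String> keys, List<String> values) {
        for (String line : text.split("\n")) {
            String[] parts = line.split("\\|\\d+\\|");
            if (parts.length != 2) {
                continue; // not a legal line (assumption: skip it rather than fail)
            }
            keys.add(parts[0]);
            values.add(parts[1]);
        }
    }

    public static void main(String[] args) {
        String text = "cats|1|short hair and long hair\n"
                    + "this line has no delimiter\n"
                    + "dogs|1|cats are better than dogs";
        List<String> keys = new ArrayList<>();
        List<String> values = new ArrayList<>();
        parse(text, keys, values);
        System.out.println(keys);
        System.out.println(values);
    }
}
```

Throwing an exception instead of skipping would be an equally reasonable choice if malformed input should be treated as an error.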

If your input is always formatted like that, then you can use this single statement to get the left parts at the even indexes and the right parts at the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left, ...), so you will get an array twice the size of the line count.

String[] parts = str.split("\\|\\d+\\||\\n+");
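To get from the interleaved array to the two arrays the question asks for, the even and odd indexes can be copied out separately. This is a minimal sketch; the class name Deinterleave and the method names left and right are hypothetical, and it assumes well-formed input as described above.

```java
import java.util.Arrays;

// Hypothetical class name, used only for this sketch.
class Deinterleave {
    // Single split on either |digits| or one or more newlines.
    static String[] splitAll(String str) {
        return str.split("\\|\\d+\\||\\n+");
    }

    // Even indexes (0, 2, 4, ...) hold the left part of each line.
    static String[] left(String[] parts) {
        String[] out = new String[parts.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = parts[2 * i];
        }
        return out;
    }

    // Odd indexes (1, 3, 5, ...) hold the right part of each line.
    static String[] right(String[] parts) {
        String[] out = new String[parts.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = parts[2 * i + 1];
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                   + "cats|2|black, blue\n"
                   + "dogs|1|cats are better than dogs";
        String[] parts = splitAll(str);
        System.out.println("Array one = " + Arrays.toString(left(parts)));
        System.out.println("Array two = " + Arrays.toString(right(parts)));
    }
}
```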

I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.

import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.Pattern;

/* Name of the class has to be "Main" only if the class is public. */
class MatchExample
{
    public static void main (String[] args) {
        String[] data = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };
        Pattern p = Pattern.compile("\\|\\d+\\|");
        for(String line: data){

            String[] elements = p.split(line);
            System.out.println(elements[0] + " // " + elements[1]);

        }
    }
}

Notice that the pattern will match on one or more digits between two |'s. I see what you are doing with the groupings.

There is no need for a complex regex pattern; you could simply split the string by the delimiter token using the string's split method (String#split()) in Java.

Working Example

import java.util.ArrayList;

public class StackOverFlow31840211 {
    private static final int SENTENCE1_TOKEN_INDEX = 0;
    private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
    private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;

    public static void main(String[] args) {
        String[] text = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };

        ArrayList<String> arrayOne = new ArrayList<String>();
        ArrayList<String> arrayTwo = new ArrayList<String>();

        for (String s : text) {
            String[] tokens = s.split("\\|");

            int tokenType = 0;
            for (String token : tokens) {
                switch (tokenType) {
                    case SENTENCE1_TOKEN_INDEX:
                        arrayOne.add(token);
                        break;

                    case SENTENCE2_TOKEN_INDEX:
                        arrayTwo.add(token);
                        break;
                }

                ++tokenType;
            }
        }

        System.out.println("Sentences for first token: " + arrayOne);
        System.out.println("Sentences for third token: " + arrayTwo);

    }
}
