
Java string parsing with different regex to split

str="Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME  0;PRICE 3957.890000;MIC XDUBIND;"

I don't have any control over the format in which this string is created.

I tried this, but I can't really get the values of the first keys, "Tick for symbol", "timestamp_sec", etc.

Beyond this specific string, I am also curious about how to parse a string with multiple regex splits. Any help will be appreciated.

    String[] s = line.split(";");
    Map<String, String> m = new HashMap<String, String>();
    for (int i = 0; i < s.length; i++)
    {
        String[] split = s[i].split("\\s+");
        for (String string2 : split)
        {
            // Adding key value pair to a map for further usage.
            m.put(split[0], split[1]);
        }
    }

Edit
Desired output into a map:
(Tick for Symbol, .ISEQ-IDX)
(descriptor id, 1)
(timestamp_sec, 20130628030105)
(timestamp_usec, 384000)
(EXCH_TIME, 1372388465384)
(SENDING_TIME, 0)
(PRICE, 3957.890000)
(MIC, XDUBIND)

How about the following? You specify a list of key/value-pattern pairs. Keys are given directly as strings, values as regexes. Then you go through this list and search the text for each key followed by its value pattern; if you find it, you extract the value.

I assume the keys can be in any order, not all of them have to be present, and there might be more than one space separating them. If you know the order of the keys, you can always start each find at the place where the previous find ended. If you know all keys are obligatory, you can throw an exception if you do not find what you are looking for.

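    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TickParser {
        static String test = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME  0;PRICE 3957.890000;MIC XDUBIND;";

        // Alternating entries: literal key, then a regex for its value.
        static List<String> patterns = Arrays.asList(
            "Tick for symbol", "\\S+",
            "descriptor id", "\\d+",
            "timestamp_sec", "\\d+",
            "timestamp_usec", "\\d+",
            "EXCH_TIME", "\\d+",
            "SENDING_TIME", "\\d+",
            "PRICE", "\\d+\\.\\d+",   // escaped dot, all decimals captured
            "MIC", "[^\\s;]+"         // stop before the trailing ';'
        );

        public static void main(String[] args) {
            Map<String, String> map = new HashMap<>();

            for (int i = 0; i < patterns.size(); i += 2) {
                String key = patterns.get(i);
                String val = patterns.get(i + 1);
                // \Q...\E quotes the key so it is matched literally.
                String pattern = "\\Q" + key + "\\E\\s+(" + val + ")";
                Matcher m = Pattern.compile(pattern).matcher(test);

                if (m.find()) {
                    map.put(key, m.group(1));
                }
            }
            System.out.println(map);
        }
    }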
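The two variations mentioned above (keys in a known order, and all keys obligatory) could look roughly like the following. This is only a sketch under those assumptions: the class name `OrderedTickParser` and the `parse` method are made up for illustration, each search resumes where the previous match ended via `Matcher.find(int)`, and a missing key raises an `IllegalArgumentException`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OrderedTickParser {
    // Alternating entries: literal key, then a regex for its value.
    // Assumption: the keys appear in the input in exactly this order.
    static final List<String> PATTERNS = Arrays.asList(
        "Tick for symbol", "\\S+",
        "descriptor id", "\\d+",
        "timestamp_sec", "\\d+",
        "timestamp_usec", "\\d+",
        "EXCH_TIME", "\\d+",
        "SENDING_TIME", "\\d+",
        "PRICE", "\\d+\\.\\d+",
        "MIC", "[^\\s;]+"
    );

    static Map<String, String> parse(String text) {
        // LinkedHashMap keeps the keys in insertion order when printed.
        Map<String, String> map = new LinkedHashMap<>();
        int from = 0; // position where the previous match ended
        for (int i = 0; i < PATTERNS.size(); i += 2) {
            String key = PATTERNS.get(i);
            String regex = Pattern.quote(key) + "\\s+(" + PATTERNS.get(i + 1) + ")";
            Matcher m = Pattern.compile(regex).matcher(text);
            // All keys are treated as obligatory: fail loudly if one is missing.
            if (!m.find(from)) {
                throw new IllegalArgumentException("Key not found: " + key);
            }
            map.put(key, m.group(1));
            from = m.end();
        }
        return map;
    }

    public static void main(String[] args) {
        String test = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME  0;PRICE 3957.890000;MIC XDUBIND;";
        System.out.println(parse(test));
    }
}
```

Resuming from `m.end()` also means a key that appears out of order is reported as missing, which may or may not be what you want.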

I don't think a regex will help you here; whoever designed that output String clearly didn't have splitting in mind.

I suggest simply parsing through the String with a loop and doing the whole thing manually. Alternatively, you can just search the String for substrings (such as "Tick for symbol"), then take whatever word comes after (until the next space), since the value always seems to be a single word.
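A minimal sketch of the substring approach, using only `String.indexOf` and character scanning. The class name `SubstringLookup` and the helper `valueAfter` are hypothetical names chosen for this example; it also assumes that a value ends at the next whitespace or `;`, and that no key occurs inside another key or value.

```java
public class SubstringLookup {
    // Returns the word that follows `key` in `text`, or null if the key is absent.
    // Assumption: a value ends at the next whitespace character or ';'.
    static String valueAfter(String text, String key) {
        int start = text.indexOf(key);
        if (start < 0) {
            return null;
        }
        start += key.length();
        // Skip the whitespace between key and value (there may be several spaces).
        while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
            start++;
        }
        int end = start;
        while (end < text.length()
                && !Character.isWhitespace(text.charAt(end))
                && text.charAt(end) != ';') {
            end++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String test = "Tick for symbol .ISEQ-IDX descriptor id 1 timestamp_sec 20130628030105 timestamp_usec 384000;EXCH_TIME 1372388465384;SENDING_TIME  0;PRICE 3957.890000;MIC XDUBIND;";
        System.out.println(valueAfter(test, "Tick for symbol")); // .ISEQ-IDX
        System.out.println(valueAfter(test, "SENDING_TIME"));    // 0
        System.out.println(valueAfter(test, "MIC"));             // XDUBIND
    }
}
```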

Using the Pattern class from the java.util.regex package, described step by step in this Java regex tutorial:

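import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One capture group per value, in the order the fields appear in the input.
private static final Pattern splitPattern = Pattern.compile("^Tick for symbol (.*) descriptor id (\\d+) timestamp_sec (\\d+) timestamp_usec (\\d+);EXCH_TIME (\\d+);SENDING_TIME  ?(\\d+);PRICE (.*);MIC (\\w+);$");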

private static String printExtracted(final String str) {
  final Matcher m = splitPattern.matcher(str);
  if (m.matches()) {
    final String tickForSymbol = m.group(1);
    final long descriptorId = Long.parseLong(m.group(2), 10);
    final long timestampSec = Long.parseLong(m.group(3), 10);
    final long timestampUsec = Long.parseLong(m.group(4), 10);
    final long exchTime = Long.parseLong(m.group(5), 10);
    final long sendingTime = Long.parseLong(m.group(6), 10);
    final double price = Double.parseDouble(m.group(7));
    final String mic = m.group(8);
    return "(Tick for Symbol, " + tickForSymbol + ")\n" +
         "(descriptor id, " + descriptorId + ")\n" +
         "(timestamp_sec, " + timestampSec + ")\n" +
         "(timestamp_usec, " + timestampUsec + ")\n" +
         "(EXCH_TIME, " + exchTime + ")\n" +
         "(SENDING_TIME, " + sendingTime +")\n" +
         "(PRICE, " + price + ")\n" +
         "(MIC, " + mic + ")";
  } else {
    throw new IllegalArgumentException("Argument " + str + " doesn't match pattern.");
  }
}

Edit: Using group instead of replaceAll, as it makes more sense and is also faster.

