I have a service that returns data in the format below. I have shortened it here for readability; the real response is much larger, but the format is always the same.
process=true
version=2
DataCenter=dc2
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}
Machine:[machineA.dc2.com, machineB.dc2.com]
DataCenter=dc1
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}
mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}
Machine:[machineP.dc1.com, machineQ.dc1.com]
DataCenter=dc3
Total:2
prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
obvious:{0=6, 1=7, 2=8, 3=5, 4=6}
mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}
Machine:[machineO.dc3.com, machineR.dc3.com]
I am trying to parse the above data and store it in three different Maps.
Map<String, Map<Integer, Integer>> prime = new HashMap<String, Map<Integer, Integer>>();
Map<String, Map<Integer, Integer>> obvious = new HashMap<String, Map<Integer, Integer>>();
Map<String, Map<Integer, String>> mapping = new HashMap<String, Map<Integer, String>>();
Below is the description:
In the prime map, the key will be dc2 and the value will be {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}.
In the obvious map, the key will be dc2 and the value will be {0=6, 1=7, 2=8, 3=5, 4=6}.
In the mapping map, the key will be dc2 and the value will be {3=machineA.dc2.com, 2=machineB.dc2.com}.
Similarly for the other datacenters.
What is the best way to parse the above string response? Should I use regex here or simple string parsing?
public class DataParser {
public static void main(String[] args) {
String response = getDataFromURL();
// here response will contain above string
parseResponse(response);
}
private static void parseResponse(final String response) {
// what is the best way to parse the response?
}
}
Any example will be of great help.
You can do as ShellFish recommends: split the response on '\\n' and then process each line.
One regex approach would be like the following (It's incomplete, but is enough to get you started):
public static void main(String[] args) throws Exception {
String response = "process=true\n" +
"version=2\n" +
"DataCenter=dc2\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineA.dc2.com, 2=machineB.dc2.com}\n" +
" Machine:[machineA.dc2.com, machineB.dc2.com]\n" +
"DataCenter=dc1\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6, 5=7}\n" +
" mapping:{3=machineP.dc1.com, 2=machineQ.dc1.com}\n" +
" Machine:[machineP.dc1.com, machineQ.dc1.com]\n" +
"DataCenter=dc3\n" +
" Total:2\n" +
" prime:{0=1, 1=2, 2=3, 3=4, 4=1, 5=2}\n" +
" obvious:{0=6, 1=7, 2=8, 3=5, 4=6}\n" +
" mapping:{3=machineO.dc3.com, 2=machineR.dc3.com}\n" +
" Machine:[machineO.dc3.com, machineR.dc3.com]";
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
Map<String, Map<Integer, Integer>> obvious = new HashMap<>();
Map<String, Map<Integer, String>> mapping = new HashMap<>();
String outerMapKey = "";
int findCount = 0;
Matcher matcher = Pattern.compile("(?<=DataCenter=)(.*)|(?<=prime:)(.*)|(?<=obvious:)(.*)|(?<=mapping:)(.*)").matcher(response);
while(matcher.find()) {
switch (findCount) {
case 0:
outerMapKey = matcher.group();
break;
case 1:
prime.put(outerMapKey, new HashMap<>());
String group = matcher.group().replaceAll("[\\{\\}]", "").replaceAll(", ", ",");
String[] groupPieces = group.split(",");
for (String groupPiece : groupPieces) {
String[] keyValue = groupPiece.split("=");
prime.get(outerMapKey).put(Integer.parseInt(keyValue[0]), Integer.parseInt(keyValue[1]));
}
break;
// Add additional cases for obvious and mapping
}
findCount++;
if (findCount == 4) {
findCount = 0;
}
}
System.out.println("Primes:");
prime.keySet().stream().forEach(k -> System.out.printf("Key: %s Value: %s\n", k, prime.get(k)));
// Add additional outputs for obvious and mapping
}
Results:
Primes:
Key: dc2 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
Key: dc1 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2, 6=3}
Key: dc3 Value: {0=1, 1=2, 2=3, 3=4, 4=1, 5=2}
References to explain the regex pattern: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
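To cover all three maps without depending on a match counter, one option is to capture the label together with the rest of the line and switch on the label. This is only a sketch, not the code from the answer above: the class name RegexResponseParser and the helper entries are made up for illustration, and it assumes every prime/obvious/mapping line is preceded by a DataCenter= line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class RegexResponseParser {
    // Group 1 is the label, group 2 is the rest of that line.
    private static final Pattern LINE = Pattern.compile(
            "^[ \\t]*(DataCenter=|prime:|obvious:|mapping:)(.*)$", Pattern.MULTILINE);
    // One "key=value" entry inside the braces.
    private static final Pattern ENTRY = Pattern.compile("(\\d+)=([^,}]+)");

    final Map<String, Map<Integer, Integer>> prime = new LinkedHashMap<>();
    final Map<String, Map<Integer, Integer>> obvious = new LinkedHashMap<>();
    final Map<String, Map<Integer, String>> mapping = new LinkedHashMap<>();

    void parse(String response) {
        String dc = null; // assumption: set by a DataCenter= line before any data line
        Matcher m = LINE.matcher(response);
        while (m.find()) {
            String body = m.group(2).trim();
            switch (m.group(1)) {
                case "DataCenter=": dc = body; break;
                case "prime:":      prime.put(dc, entries(body, s -> Integer.parseInt(s))); break;
                case "obvious:":    obvious.put(dc, entries(body, s -> Integer.parseInt(s))); break;
                case "mapping:":    mapping.put(dc, entries(body, s -> s)); break;
            }
        }
    }

    // Turns "{0=1, 1=2}" into a map, converting each value with toValue.
    private static <V> Map<Integer, V> entries(String body, Function<String, V> toValue) {
        Map<Integer, V> map = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(body);
        while (m.find()) {
            map.put(Integer.valueOf(m.group(1)), toValue.apply(m.group(2).trim()));
        }
        return map;
    }
}
```

Lines such as Total: and Machine: simply never match the pattern, so they need no special handling.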
The answer depends on how much you trust the format to be fixed and exact. A very simple approach scans the lines and does a minimal string comparison to identify each key:
private static final String DATA_CENTER = "DataCenter=";
private static final int DATA_CENTER_LEN = DATA_CENTER.length();
private static final String PRIME = " prime:";
private static final int PRIME_LEN = PRIME.length();
// etc.
Map<String, Map<Integer, Integer>> prime = new HashMap<>();
// etc.
String response = "...";
Scanner scanner = new Scanner( response );
while(scanner.hasNextLine()){
String line = scanner.nextLine();
if( line.startsWith( DATA_CENTER ) ){
String dc = line.substring( DATA_CENTER_LEN );
line = scanner.nextLine(); // skip Total
prime.put( dc, str2map(scanner.nextLine().substring(PRIME_LEN)) );
obvious.put( dc, str2map(scanner.nextLine().substring(OBVIOUS_LEN)) );
mapping.put( dc, str2mapis(scanner.nextLine().substring(MAPPING_LEN)) );
}
}
More explicit nextLine() calls would avoid even the test for "DataCenter".
Here are a couple of almost identical methods that strip the braces and build a map:
private static Map<Integer,Integer> str2map( String str ){
Map<Integer,Integer> map = new HashMap<>();
str = str.substring( 1, str.length()-1 );
String[] pairs = str.split( ", " );
for( String pair: pairs ){
String[] kv = pair.split( "=" );
map.put( Integer.parseInt(kv[0]),Integer.parseInt(kv[1]) );
}
return map;
}
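private static Map<Integer,String> str2mapis( String str ){
Map<Integer,String> map = new HashMap<>();
// same as str2map, except that the value is kept as a String
str = str.substring( 1, str.length()-1 );
String[] pairs = str.split( ", " );
for( String pair: pairs ){
String[] kv = pair.split( "=" );
map.put( Integer.parseInt(kv[0]),kv[1] );
}
return map;
}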
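Since the two methods differ only in how the value is converted, they could be folded into one generic helper. The following is a sketch under that assumption; the class name BraceMaps and the method parse are invented for illustration, and the input is assumed to be a well-formed "{k=v, k=v}" string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical utility; the names are made up for this sketch.
class BraceMaps {
    // Parses "{k1=v1, k2=v2}" into a map, converting each value with valueParser.
    static <V> Map<Integer, V> parse(String str, Function<String, V> valueParser) {
        Map<Integer, V> map = new HashMap<>();
        String body = str.trim();
        body = body.substring(1, body.length() - 1).trim(); // drop '{' and '}'
        if (body.isEmpty()) {
            return map; // "{}" yields an empty map
        }
        for (String pair : body.split(",\\s*")) {
            String[] kv = pair.split("=", 2); // limit 2 keeps any '=' inside the value
            map.put(Integer.parseInt(kv[0].trim()), valueParser.apply(kv[1].trim()));
        }
        return map;
    }

    static Map<Integer, Integer> str2map(String str) {
        return parse(str, Integer::valueOf);
    }

    static Map<Integer, String> str2mapis(String str) {
        return parse(str, Function.identity());
    }
}
```

Splitting on ",\\s*" instead of ", " also tolerates entries separated by a bare comma or by extra spaces.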
If the leading white space might vary, you can stay on the safe side by trimming each line first:
private static final String PRIME = "prime:";
// ...
prime.put( dc, str2map(scanner.nextLine().trim().substring( PRIME_LEN )) );
If the sequence or completeness of lines isn't guaranteed, testing may be required:
line = scanner.nextLine().trim();
if( line.startsWith( PRIME ) ){
prime.put( dc, str2map(line.substring( PRIME_LEN )) );
}
With even less stability or trust in the format, regular expression parsing might be indicated.
I would do simple string parsing in this case, applying regex for each line. In pseudo code, something like this:
for line in response
if line matches /^DataCenter/
key = datacenter name
else if line matches / *prime/
prime.put(key, prime value)
else if line matches / *obvious/
obvious.put(key, obvious value)
else if line matches / *mapping/
mapping.put(key, mapping value)
else
getline
You could optimize here by first checking the first char of the line. If it's anything besides a space or a D, you can go to the next line. If the format is always the same you could even hardcode the lines to parse. In the example you supplied you could do:
skip 2 lines
repeat
extract datacenter name
skip 1 line
extract prime
extract obvious
extract mapping
add above stuff to the maps
skip 1 line
until EOF
This will be a lot faster but will fail if the format changes.
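The fixed-layout loop above could look roughly like this in Java. It is a sketch only: the class name FixedLayoutReader and the helper expect are hypothetical, the raw "{...}" text is stored rather than parsed to keep the example short, and each block is assumed to be exactly six lines after a two-line header. Checking the expected prefix on each line makes a format change fail loudly instead of silently storing wrong data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the fixed-layout loop; it keeps the raw "{...}" text per data center.
class FixedLayoutReader {
    final Map<String, String> prime = new LinkedHashMap<>();
    final Map<String, String> obvious = new LinkedHashMap<>();
    final Map<String, String> mapping = new LinkedHashMap<>();

    void read(String response) {
        String[] lines = response.split("\\R"); // \R matches any line break
        int i = 2; // skip the "process=" and "version=" header lines
        while (i < lines.length) {
            if (lines[i].trim().isEmpty()) { // tolerate blank lines between blocks
                i++;
                continue;
            }
            if (i + 5 >= lines.length) {
                throw new IllegalStateException("Truncated block starting at line " + (i + 1));
            }
            String dc = expect(lines[i], "DataCenter=");
            // lines[i + 1] is "Total:..." and is skipped
            prime.put(dc, expect(lines[i + 2], "prime:"));
            obvious.put(dc, expect(lines[i + 3], "obvious:"));
            mapping.put(dc, expect(lines[i + 4], "mapping:"));
            // lines[i + 5] is "Machine:[...]" and is skipped
            i += 6;
        }
    }

    // Returns the text after the prefix, or fails loudly if the layout has changed.
    private static String expect(String line, String prefix) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(prefix)) {
            throw new IllegalStateException(
                    "Expected a line starting with '" + prefix + "' but got: " + line);
        }
        return trimmed.substring(prefix.length());
    }
}
```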
You could use a parser generator such as ANTLR, or you could hand-code the parser. Depending on how much output you have to process and how often, you may find that going to such trouble isn't really worth it, and that just going over each line and manually parsing it (e.g., with a regex or indexOf) is sufficient and clear enough.
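As an illustration of the manual indexOf route, the sketch below splits one line into its label and value at the first ':' or '=', whichever comes first. The class and method names are hypothetical, and it assumes labels themselves never contain either character.

```java
// Hypothetical helper showing indexOf-based splitting of a single response line.
class LineSplitter {
    // Returns {label, value}, or null when the line contains neither ':' nor '='.
    static String[] split(String line) {
        String s = line.trim();
        int colon = s.indexOf(':');
        int equals = s.indexOf('=');
        int sep;
        if (colon < 0) {
            sep = equals;
        } else if (equals < 0) {
            sep = colon;
        } else {
            sep = Math.min(colon, equals); // the separator is whichever appears first
        }
        if (sep < 0) {
            return null;
        }
        return new String[] { s.substring(0, sep), s.substring(sep + 1) };
    }
}
```

For example, split("DataCenter=dc2") yields the label DataCenter and the value dc2, while split(" prime:{0=1, 1=2}") yields prime and {0=1, 1=2}; the value can then be handed to a brace-parsing method like the ones shown in the other answers.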