How would you parse this TLV in Java?

Question

I have a mobile app that I've written for iPhone (Objective-C) that allows users to import data using a specific format. I have the same app written for Android in Java and I've had users start asking for the ability to import. The format of the data is a portable standard that folks who write apps like this have to be able to import and export.

While I did write what I'm about to ask in Objective-C, I have a feeling that I could have made my life quite a bit easier by doing it a different way. So, I'd like to ask how you'd parse the following TLV in Java. I don't need code, just the gist.

Here's the TLV format:

<Type:Length>Value<Type:Length>Value<Type:Length>Value<end>

Each record starts with < and ends with <end>. A \n within a record is acceptable, and zero-length values are okay.

Here's an example input describing four different cars. Note the multi-line record and the zero-length value.

<make:4>ford<model:7>contour<color:3>red<end>
<make:5>mazda<model:3>mpv<color:5>black<end>
<make:3>bmw
<model:3>335
<color:6>yellow
<end>
<make:7>unknown<model:0><color:4>grey<end>

Once the data is parsed, I'll be inserting it into an SQLite DB, so ultimately looping over the data record by record will give me a bunch of strings that I can use as part of the INSERT statement.

Thanks for any ideas you can provide!

Nick

Answer 1

Very strange format. Is there a published specification?

You can try the string tokenization route. You could leverage the built-in Java regex support to help with the matching, or even just use basic String class methods (split and trim being your friends). Basically just do:

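String[] lines = input.split("<end>");   // one chunk per record
for (String line : lines)
{
    line = line.trim();
    if (line.isEmpty()) continue;         // skip whitespace between or after records
    String[] sublines = line.split("<");  // sublines[0] is empty because each record starts with '<'
    for (String subline : sublines)
    {
        subline = subline.trim();
        // subline now looks like "make:4>ford"
        // ...additional breaking, trimming, branching...
    }
}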

The type length is an interesting validation component, but is a little odd for a modern language. One BIG question I would ask would be what encoding[s] to expect. UTF-8? 7-bit ASCII? Something strange?
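On the encoding question: whether Length counts characters or bytes changes how many units you have to read for a value. Here is a small sketch of the difference, assuming UTF-8 as the byte encoding (the format description above doesn't say which one applies). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class LengthUnits {
    // Length as Java sees it: UTF-16 code units.
    static int charLength(String value) {
        return value.length();
    }

    // Length in bytes, ASSUMING the file is UTF-8 encoded.
    static int utf8Length(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "mazda";
        String accented = "citro\u00EBn"; // "citroen" with a diaeresis on the e
        System.out.println(ascii + ": " + charLength(ascii) + " chars, " + utf8Length(ascii) + " bytes");
        System.out.println(accented + ": " + charLength(accented) + " chars, " + utf8Length(accented) + " bytes");
    }
}
```

For pure ASCII data the two counts agree, so the distinction only bites once non-ASCII values show up in an import file.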

My friends would call the pseudo-code above a hack and tell me to do something like JavaCC, but I have nerdy and impractical friends. ;)

Answer 2

If the input file isn't going to be too large, you can read it all into a String and then split the string into an array using <end> as the delimiter. Then iterate over the array, using a regex to capture each Type and its corresponding Value.
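A rough sketch of that idea, under a few assumptions that the question doesn't confirm: values never contain a '<' character, whitespace around a value is not significant, and a type name appears at most once per record. The class name and the regex are mine, not part of the format. Note that this version captures the Length but never checks it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitRegexTlvParser {
    // A <type:length> header followed by everything up to the next '<'.
    // ASSUMPTION: values never contain '<'.
    private static final Pattern FIELD =
            Pattern.compile("<([^:<>]+):(\\d+)>([^<]*)");

    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String chunk : input.split("<end>")) {
            Map<String, String> record = new LinkedHashMap<>();
            Matcher m = FIELD.matcher(chunk);
            while (m.find()) {
                // trim() drops the newlines that separate fields in multi-line records.
                record.put(m.group(1), m.group(3).trim());
            }
            if (!record.isEmpty()) {
                records.add(record); // skip chunks that are only whitespace
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<make:4>ford<model:7>contour<color:3>red<end>\n"
                + "<make:3>bmw\n<model:3>335\n<color:6>yellow\n<end>\n"
                + "<make:7>unknown<model:0><color:4>grey<end>";
        for (Map<String, String> record : parse(input)) {
            System.out.println(record);
        }
    }
}
```

Each record comes back as an ordered map of type to value, which lines up fairly directly with the columns of an INSERT statement.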

Answer 3

The XML-ishness of the format is somewhat confusing. The Length is the length of the value, right? I guess I would use the following algo:

next_record:
while (! eof) {
  read token between '<' and '>'
  if (token == "end") {
     continue next_record
  }
  split token into type and length
  read length number of characters into value
  add tuple (type, length, value) to collection
}
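In Java, the loop above could look roughly like the sketch below. It assumes Length counts Java chars (which matches bytes only for ASCII data) and that a type name appears at most once per record. The class name is made up, and each record is collected as a map rather than a flat list of tuples. Because values are read by length, they may contain '<' or even the text <end> without confusing the parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LengthDrivenTlvParser {
    static List<Map<String, String>> parse(String input) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            // Skip anything (such as newlines) before the next header.
            int open = input.indexOf('<', pos);
            if (open < 0) {
                break;
            }
            int close = input.indexOf('>', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated header at offset " + open);
            }
            String header = input.substring(open + 1, close);
            pos = close + 1;
            if (header.equals("end")) {
                records.add(current); // <end> closes the current record
                current = new LinkedHashMap<>();
                continue;
            }
            int colon = header.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Header without length: <" + header + ">");
            }
            String type = header.substring(0, colon);
            // ASSUMPTION: length is counted in Java chars, not bytes.
            int length = Integer.parseInt(header.substring(colon + 1));
            if (length < 0 || pos + length > input.length()) {
                throw new IllegalArgumentException("Bad length for field " + type);
            }
            current.put(type, input.substring(pos, pos + length));
            pos += length;
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("Last record is missing <end>");
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "<make:3>bmw\n<model:3>335\n<color:6>yellow\n<end>\n"
                + "<make:7>unknown<model:0><color:4>grey<end>";
        for (Map<String, String> record : parse(input)) {
            System.out.println(record);
        }
    }
}
```

Compared with splitting on <end>, this reads the input in a single pass and uses the Length field for what it is there for, at the cost of a little more bookkeeping.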
