
Interpreting ASN.1 indefinite-length encoding with multiple encapsulated octet strings

I have a BER structure like this...

$ openssl asn1parse -inform der -in test.der -i -dump

 ????:d=4  hl=2 l=inf  cons:     cont [ 0 ]
 ????:d=5  hl=3 l= 240 prim:      OCTET STRING
      0000 - AABBCCDD
 ????:d=5  hl=2 l=   8 prim:      OCTET STRING
      0000 - EEFF
 ????:d=5  hl=2 l=   0 prim:      EOC

...or in der2ascii style...

[0] `80`
  OCTET_STRING { `AABBCCDD` }
  OCTET_STRING { `EEFF` }
`0000`

What I know: indefinite-length encoding must contain a constructed type, because primitive types may introduce ambiguities, e.g. when the content itself contains 0x0000. What I want to know: How must a decoder behave when parsing this BER structure? Are the header bytes of both OCTET STRINGs included in the encoding? If yes, how is indefinite-length byte data encoded? And how does an application interpret the value of the TLV field tagged [0] when the second element is, e.g., an INTEGER instead of an OCTET STRING?

I am asking this question because, in the CMS standard, this field is defined as a single OCTET STRING, yet in most BER encodings I see two of them. Is this only due to the indefinite-length encoding? Am I missing something?

From ITU-T X.690:

8.1.4 Contents octets

The contents octets shall consist of zero, one or more octets, and shall encode the data value as specified in subsequent clauses.

NOTE – The contents octets depend on the type of the data value; subsequent clauses follow the same sequence as the definition of types in ASN.1.

Does this mean that I can put in any constructed type, and that the application must only interpret the value part of the constructed TLV structure?

When you encode a primitive OCTET STRING in indefinite-length mode, the encoder must (see the sketch after this list):

  • split the value into chunks, each a smaller OCTET STRING
  • encode each chunk in definite-length mode, so that each chunk has its own TLV (with a length!)
  • frame the whole sequence of definite-length primitive OCTET STRING chunks with a single, indefinite-length constructed OCTET STRING "container" that has its own TL header (no definite length, but an end-of-contents sentinel)
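
A minimal sketch of that procedure in Python (not from the original answer; 0x24 is the universal constructed OCTET STRING tag, 0x04 the primitive one, and chunk_size is an arbitrary illustrative parameter):

def encode_length(n: int) -> bytes:
    """Definite-length BER length field."""
    if n < 0x80:
        return bytes([n])                       # short form
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body     # long form

def encode_octet_string_indefinite(data: bytes, chunk_size: int = 1000) -> bytes:
    """Encode data as a constructed, indefinite-length OCTET STRING
    made of definite-length primitive OCTET STRING chunks."""
    out = bytearray(b"\x24\x80")                # constructed OCTET STRING, indefinite length
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        out += b"\x04" + encode_length(len(chunk)) + chunk   # primitive OCTET STRING chunk (TLV)
    out += b"\x00\x00"                          # end-of-contents sentinel
    return bytes(out)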

At the other end, the decoder extracts the V part of each inner, definite-length OCTET STRING chunk (dropping its TL header), then joins/consumes the V parts in order of arrival, also dropping the TL part of the outer frame.
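
A matching decoder sketch under the same assumptions (universal OCTET STRING tags, a single level of chunking, no nested constructed chunks):

def decode_octet_string_indefinite(buf: bytes) -> bytes:
    """Reassemble the value of a constructed, indefinite-length OCTET STRING."""
    assert buf[0] == 0x24 and buf[1] == 0x80, "expected constructed OCTET STRING, indefinite length"
    pos, value = 2, bytearray()
    while not (buf[pos] == 0x00 and buf[pos + 1] == 0x00):    # stop at the end-of-contents octets
        assert buf[pos] == 0x04, "chunks must be primitive OCTET STRINGs like the container"
        length, header = buf[pos + 1], 2
        if length & 0x80:                                     # long-form length
            n = length & 0x7F
            length = int.from_bytes(buf[pos + 2:pos + 2 + n], "big")
            header += n
        value += buf[pos + header:pos + header + length]      # keep V, drop the chunk's T and L
        pos += header + length
    return bytes(value)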

Note that the idea behind the indefinite-length encoding technique is that both encoder and decoder can emit/consume incomplete, possibly oversized, data.

The chunk size is chosen by the encoder/application based on data availability, the memory situation and possibly an estimate of the decoder's buffering capabilities. I think this is mentioned somewhere in the X.280/X.680 papers.

The encoder is not allowed to put chunks of different ASN.1 types into a single indefinite-length encoded container. In other words, all chunks must be of the same type as the outer container.

That should hopefully explain why you may see multiple (depending on chunk size) OCTET STRINGs in the indefinite length encoded BER/CER stream where just a single OCTET STRING is expected.

DER forbids indefinite-length encoding on the grounds that the serialized representation of the same data may change on re-encoding (due to a potentially different chunk size).
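
Putting the two hypothetical sketches from above together illustrates exactly that: re-encoding the same value with a different chunk size changes the bytes on the wire, but not the decoded value.

data = bytes(range(256)) * 10                   # arbitrary example payload
ber_a = encode_octet_string_indefinite(data, chunk_size=1000)
ber_b = encode_octet_string_indefinite(data, chunk_size=512)
assert ber_a != ber_b                           # serialized form depends on the chunk size...
assert decode_octet_string_indefinite(ber_a) == decode_octet_string_indefinite(ber_b) == data
                                                # ...but the decoded value is identical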
