Parsing a Content-Type header in Java without validating the charset

Question

Given an HTTP header like:

Content-Type: text/plain; charset=something

I'd like to extract the MIME type and charset using full RFC-compliant parsing, but without "validating" the charset. By validating, I mean that I don't want to use Java's internal Charset mechanism, in case the charset is unknown to Java (but may still have meaning for other applications). The following code does not work because it does this validation:

import java.nio.charset.Charset;
import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";

ContentType contentType = ContentType.parse(header);
Charset contentTypeCharset = contentType.getCharset();

System.out.println(contentType.getMimeType());
System.out.println(contentTypeCharset == null ? null : contentTypeCharset.toString());

This throws "java.nio.charset.UnsupportedCharsetException: something".
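The validation presumably happens inside java.nio.charset.Charset itself: Charset.forName throws UnsupportedCharsetException for a legal name that the JVM does not know, and that exception still carries the name it was given. The small JDK-only illustration below shows this; CharsetValidationDemo and rejectedName are hypothetical names used only for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetValidationDemo {
    // Hypothetical helper: returns the name carried by the exception if
    // Charset.forName rejects it, or null if this JVM supports the charset.
    static String rejectedName(String name) {
        try {
            Charset.forName(name); // the "validation" step the question wants to avoid
            return null;
        } catch (UnsupportedCharsetException e) {
            return e.getCharsetName(); // the original, unvalidated name
        }
    }

    public static void main(String[] args) {
        // isSupported answers the same question without throwing for a legal name
        System.out.println(Charset.isSupported("something")); // false
        System.out.println(rejectedName("something"));        // something
        System.out.println(rejectedName("UTF-8"));            // null
    }
}
```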

Answer 1

To do the parsing without any charset lookup, one can use HttpCore's lower-level parsing classes, which leave parameter values as plain strings:

import org.apache.commons.lang3.StringUtils;
import org.apache.http.HeaderElement;
import org.apache.http.NameValuePair;
import org.apache.http.message.BasicHeaderValueParser;

String header = "text/plain; charset=something";

// Passing null selects the default BasicHeaderValueParser.INSTANCE
HeaderElement headerElement = BasicHeaderValueParser.parseHeaderElement(header, null);
String mimeType = headerElement.getName();
String charset = null;
for (NameValuePair param : headerElement.getParameters()) {
    if (param.getName().equalsIgnoreCase("charset")) {
        String s = param.getValue();
        if (!StringUtils.isBlank(s)) {
            charset = s;
        }
        break;
    }
}

System.out.println(mimeType);
System.out.println(charset);
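If pulling in HttpCore only for this is undesirable, the same split can be done with the JDK alone. The following is a minimal sketch under stated assumptions, not a full RFC 9110 parser: ContentTypeSketch and its parse method are hypothetical names, it handles token and quoted-string parameter values (including backslash escapes) and case-insensitive parameter names, and it accepts malformed input leniently instead of rejecting it. Like Answer 1, it treats only the first charset parameter as meaningful and a blank value as "no charset".

```java
// Hypothetical helper (not part of HttpCore): extracts the media type and the
// raw charset parameter of a Content-Type value without calling java.nio.charset.Charset.
public final class ContentTypeSketch {
    public final String mimeType; // "type/subtype" as written, trimmed
    public final String charset;  // unvalidated charset value, or null

    private ContentTypeSketch(String mimeType, String charset) {
        this.mimeType = mimeType;
        this.charset = charset;
    }

    public static ContentTypeSketch parse(String header) {
        int n = header.length();
        int semi = header.indexOf(';');
        int i = semi < 0 ? n : semi;
        String mimeType = header.substring(0, i).trim();
        String charset = null;
        boolean seenCharset = false;

        while (i < n) {
            i++; // step over the ';' that starts this parameter
            while (i < n && (header.charAt(i) == ' ' || header.charAt(i) == '\t')) {
                i++; // skip optional whitespace
            }
            int nameStart = i;
            while (i < n && header.charAt(i) != '=' && header.charAt(i) != ';') {
                i++;
            }
            String name = header.substring(nameStart, i).trim();
            String value = null;

            if (i < n && header.charAt(i) == '=') {
                i++; // step over '='
                if (i < n && header.charAt(i) == '"') {
                    // quoted-string: read up to the closing quote, unescaping backslash pairs
                    StringBuilder sb = new StringBuilder();
                    i++;
                    while (i < n && header.charAt(i) != '"') {
                        if (header.charAt(i) == '\\' && i + 1 < n) {
                            i++; // keep the escaped character, drop the backslash
                        }
                        sb.append(header.charAt(i));
                        i++;
                    }
                    value = sb.toString();
                    // skip the closing quote and anything else before the next ';'
                    while (i < n && header.charAt(i) != ';') {
                        i++;
                    }
                } else {
                    // token: everything up to the next ';'
                    int valueStart = i;
                    while (i < n && header.charAt(i) != ';') {
                        i++;
                    }
                    value = header.substring(valueStart, i).trim();
                }
            }

            // Only the first charset parameter counts; a blank one means "no charset"
            if (!seenCharset && name.equalsIgnoreCase("charset")) {
                seenCharset = true;
                if (value != null && !value.trim().isEmpty()) {
                    charset = value;
                }
            }
        }
        return new ContentTypeSketch(mimeType, charset);
    }

    public static void main(String[] args) {
        ContentTypeSketch ct = ContentTypeSketch.parse("text/plain; charset=something");
        System.out.println(ct.mimeType); // text/plain
        System.out.println(ct.charset);  // something
    }
}
```

The charset value keeps its original spelling and case, since nothing here tries to map it to a Java Charset.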

Answer 2

Alternatively, one can still use Apache's ContentType.parse, catch the UnsupportedCharsetException, and extract the charset name with getCharsetName():

import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";

String charsetName;
String mimeType;

try {
    ContentType contentType = ContentType.parse(header); // may throw UnsupportedCharsetException
    mimeType = contentType.getMimeType();
    Charset charset = contentType.getCharset();
    charsetName = charset != null ? charset.name() : null;
} catch (UnsupportedCharsetException e) {
    charsetName = e.getCharsetName(); // extract the unsupported charset name
    // The parsed result is lost with the exception, so the MIME type is extracted separately.
    // A charset parameter was present, so the header contains a ';'.
    mimeType = header.substring(0, header.indexOf(';')).trim();
}

The drawback is that, when UnsupportedCharsetException is thrown, the MIME type has to be extracted by other means.
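One caveat with catching only UnsupportedCharsetException: Charset.forName, which ContentType presumably relies on, throws IllegalCharsetNameException instead when the name breaks Java's charset-name syntax (for example, when it contains a space, as a quoted value like charset="bad name" could). That exception also offers getCharsetName(). The JDK-only sketch below recovers the name in both cases; RejectedCharsetName and nameOf are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class RejectedCharsetName {
    // Hypothetical helper: canonical name for a supported charset,
    // otherwise the name carried by whichever exception Charset.forName threw.
    static String nameOf(String requested) {
        try {
            return Charset.forName(requested).name(); // e.g. "utf8" becomes "UTF-8"
        } catch (UnsupportedCharsetException e) {
            return e.getCharsetName(); // legal name, but unknown to this JVM
        } catch (IllegalCharsetNameException e) {
            return e.getCharsetName(); // name violates Java's charset-name syntax
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf("utf8"));      // UTF-8
        System.out.println(nameOf("something")); // something
        System.out.println(nameOf("bad name"));  // bad name
    }
}
```

The two exceptions are siblings under IllegalArgumentException, so a single multi-catch clause could not call getCharsetName() directly; that is why the sketch uses two catch blocks.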

2 answers

solution1
1 2019-12-08 16:51:35

solution2
1 2019-12-08 17:12:11