Parsing a Content-Type header in Java without validating the charset

Question

Given an HTTP header like:

Content-Type: text/plain; charset=something

I'd like to extract the MIME type and charset using fully RFC-compliant parsing, but without "validating" the charset. By validating, I mean going through Java's internal Charset mechanism: the charset may be unknown to Java yet still be meaningful to other applications. The following code does not work because it performs this validation:

import java.nio.charset.Charset;

import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";

// parse() resolves the charset parameter to a java.nio.charset.Charset
ContentType contentType = ContentType.parse(header);
Charset contentTypeCharset = contentType.getCharset();

System.out.println(contentType.getMimeType());
System.out.println(contentTypeCharset == null ? null : contentTypeCharset.toString());

This throws java.nio.charset.UnsupportedCharsetException: something.
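
The exception type belongs to the JDK, not to the HTTP library: Charset.forName throws it for any syntactically legal name the JVM has no provider for. A minimal JDK-only sketch that reproduces this (the class CharsetLookupDemo and its method are illustrative names, not part of any library):

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetLookupDemo {

    // Returns true when the running JVM can resolve the name to a Charset.
    // Charset.forName throws UnsupportedCharsetException for legal but unknown names.
    static boolean resolves(String name) {
        try {
            Charset.forName(name);
            return true;
        } catch (UnsupportedCharsetException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("UTF-8"));     // UTF-8 is required on every JVM
        System.out.println(resolves("something")); // no provider is expected for this name
    }
}
```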

Answer 1

To do the parsing, one can use the lower-level parsing classes:

import org.apache.http.HeaderElement;
import org.apache.http.NameValuePair;
import org.apache.http.message.BasicHeaderValueParser;

String header = "text/plain; charset=something";

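// Parses "type/subtype; name=value; ..." without touching java.nio.charset.Charset
HeaderElement headerElement = BasicHeaderValueParser.parseHeaderElement(header, null);
String mimeType = headerElement.getName();
String charset = null;
for (NameValuePair param : headerElement.getParameters()) {
    // Parameter names are case-insensitive
    if (param.getName().equalsIgnoreCase("charset")) {
        String s = param.getValue();
        // Plain JDK blank check, so no extra StringUtils dependency is needed
        if (s != null && !s.trim().isEmpty()) {
            charset = s;
        }
        break;
    }
}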

System.out.println(mimeType);
System.out.println(charset);
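
For readers without HttpCore on the classpath, the same idea (split off the media type, then read the parameters as raw strings) can be sketched with the JDK alone. This is only an approximation under stated assumptions: it handles quoted-string values and backslash escapes but does not check token characters, so it is not a full implementation of the RFC grammar. ContentTypeSketch and its members are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ContentTypeSketch {

    final String mimeType;            // text before the first ';', trimmed
    final Map<String, String> params; // names lower-cased, values kept verbatim

    private ContentTypeSketch(String mimeType, Map<String, String> params) {
        this.mimeType = mimeType;
        this.params = params;
    }

    // Raw charset parameter, or null when absent; never resolved to a Charset.
    String charset() {
        return params.get("charset");
    }

    static ContentTypeSketch parse(String header) {
        int first = header.indexOf(';');
        String mime = (first < 0 ? header : header.substring(0, first)).trim();
        Map<String, String> params = new LinkedHashMap<>();
        int n = header.length();
        int pos = first < 0 ? n : first + 1;
        while (pos < n) {
            int eq = header.indexOf('=', pos);
            int semi = header.indexOf(';', pos);
            if (eq < 0 || (semi >= 0 && semi < eq)) {
                // Segment without '=': skip it
                if (semi < 0) break;
                pos = semi + 1;
                continue;
            }
            String name = header.substring(pos, eq).trim().toLowerCase(Locale.ROOT);
            pos = eq + 1;
            StringBuilder value = new StringBuilder();
            if (pos < n && header.charAt(pos) == '"') {
                pos++; // skip opening quote
                while (pos < n && header.charAt(pos) != '"') {
                    if (header.charAt(pos) == '\\' && pos + 1 < n) pos++; // quoted-pair
                    value.append(header.charAt(pos++));
                }
                int next = header.indexOf(';', pos); // resume after the closing quote
                pos = next < 0 ? n : next + 1;
            } else {
                int next = header.indexOf(';', pos);
                value.append(header.substring(pos, next < 0 ? n : next).trim());
                pos = next < 0 ? n : next + 1;
            }
            params.putIfAbsent(name, value.toString()); // first occurrence wins
        }
        return new ContentTypeSketch(mime, params);
    }
}
```

ContentTypeSketch.parse("text/plain; charset=something").charset() returns the raw string, whether or not the JVM has a provider for it.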

Answer 2

Alternatively, one can still use Apache's ContentType.parse and catch the UnsupportedCharsetException, extracting the charset name from it with getCharsetName():

import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";

String charsetName;
String mimeType;

try {
    ContentType contentType = ContentType.parse(header); // may throw UnsupportedCharsetException
    mimeType = contentType.getMimeType();
    Charset charset = contentType.getCharset();
    charsetName = charset != null ? charset.name() : null;
} catch (UnsupportedCharsetException e) {
    charsetName = e.getCharsetName(); // name of the unsupported charset
    // No ContentType object exists here, so the MIME type is cut out of the raw header.
    // A ';' is present because a charset parameter triggered the exception.
    mimeType = header.substring(0, header.indexOf(';')).trim();
}

The drawback is that mimeType also has to be extracted differently when UnsupportedCharsetException is thrown.
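
The two pieces the catch branch relies on can be tried in isolation with the JDK alone. In the sketch below, Charset.forName stands in for the strict parse, and UnvalidatedCharsetName, nameOf, and mimeTypeOf are hypothetical helper names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

final class UnvalidatedCharsetName {

    // Canonical name when the JVM knows the charset; otherwise the
    // requested name, recovered from the exception.
    static String nameOf(String requested) {
        try {
            return Charset.forName(requested).name();
        } catch (UnsupportedCharsetException e) {
            return e.getCharsetName();
        }
    }

    // Fallback media-type extraction from the raw header value:
    // everything before the first ';' (or the whole value), trimmed.
    static String mimeTypeOf(String header) {
        int semicolon = header.indexOf(';');
        return (semicolon < 0 ? header : header.substring(0, semicolon)).trim();
    }
}
```

One consequence worth noticing: known charsets come back under their canonical name (for example "utf-8" becomes "UTF-8"), while unknown ones come back exactly as written in the header.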

2 answers

Answer 1: 2019-12-08 16:51:35
Answer 2: 2019-12-08 17:12:11
