Parsing a Content-Type header in Java without validating the charset
Given an HTTP header like:
Content-Type: text/plain; charset=something
I'd like to extract the MIME type and charset using fully RFC-compliant parsing, but without "validating" the charset. By validating, I mean that I don't want to go through Java's internal Charset mechanism, because the charset may be unknown to Java while still being meaningful to other applications.
The following code does not work because it performs this validation:
import java.nio.charset.Charset;
import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";
ContentType contentType = ContentType.parse(header); // the exception is thrown here
Charset contentTypeCharset = contentType.getCharset();
System.out.println(contentType.getMimeType());
System.out.println(contentTypeCharset == null ? null : contentTypeCharset.toString());
This throws java.nio.charset.UnsupportedCharsetException: something.
To do the parsing, one can use the lower-level parsing classes:
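import org.apache.commons.lang3.StringUtils; // assumption: StringUtils comes from Apache Commons Lang 3
import org.apache.http.HeaderElement;
import org.apache.http.NameValuePair;
import org.apache.http.message.BasicHeaderValueParser;

String header = "text/plain; charset=something";
// Passing null as the parser makes the static helper use its default instance.
HeaderElement headerElement = BasicHeaderValueParser.parseHeaderElement(header, null);
String mimeType = headerElement.getName();
String charset = null;
for (NameValuePair param : headerElement.getParameters()) {
    // Parameter names are case-insensitive; only the first charset parameter is considered.
    if (param.getName().equalsIgnoreCase("charset")) {
        String s = param.getValue();
        if (!StringUtils.isBlank(s)) {
            charset = s; // kept as a plain String, never resolved to a Charset
        }
        break;
    }
}
System.out.println(mimeType);
System.out.println(charset);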
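If depending on HttpClient is not an option, the same idea can be sketched with the JDK alone. The class below is a hypothetical helper (the name RawContentType and its methods are made up for illustration), and it is a deliberately lenient scanner rather than a validating implementation of the RFC grammar: it splits off the media type, reads name=value parameters, and handles quoted values with backslash escapes. Like the loop above, it keeps the charset as a plain String and only honors the first occurrence of a parameter.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of any library: keeps the charset as raw text.
final class RawContentType {
    final String mimeType;
    final Map<String, String> params; // keys lower-cased, values as sent

    private RawContentType(String mimeType, Map<String, String> params) {
        this.mimeType = mimeType;
        this.params = params;
    }

    /** The charset parameter as sent, or null if absent or blank. */
    String charsetName() {
        String v = params.get("charset");
        return (v == null || v.trim().isEmpty()) ? null : v;
    }

    static RawContentType parse(String header) {
        int n = header.length();
        int semi = header.indexOf(';');
        String mimeType = (semi < 0 ? header : header.substring(0, semi)).trim();
        Map<String, String> params = new LinkedHashMap<>();
        int i = semi < 0 ? n : semi + 1;
        while (i < n) {
            // The name runs up to '=' (or to ';' / end when there is no value).
            int start = i;
            while (i < n && header.charAt(i) != '=' && header.charAt(i) != ';') i++;
            String name = header.substring(start, i).trim().toLowerCase(Locale.ROOT);
            String value = "";
            if (i < n && header.charAt(i) == '=') {
                i++; // skip '='
                while (i < n && header.charAt(i) == ' ') i++; // tolerate stray spaces
                StringBuilder sb = new StringBuilder();
                if (i < n && header.charAt(i) == '"') {
                    i++; // opening quote
                    while (i < n && header.charAt(i) != '"') {
                        if (header.charAt(i) == '\\' && i + 1 < n) i++; // quoted-pair
                        sb.append(header.charAt(i++));
                    }
                    i++; // closing quote
                    while (i < n && header.charAt(i) != ';') i++; // ignore trailing junk
                    value = sb.toString();
                } else {
                    while (i < n && header.charAt(i) != ';') sb.append(header.charAt(i++));
                    value = sb.toString().trim();
                }
            }
            if (!name.isEmpty() && !params.containsKey(name)) params.put(name, value);
            i++; // skip ';'
        }
        return new RawContentType(mimeType, params);
    }

    public static void main(String[] args) {
        RawContentType ct = parse("text/plain; charset=something");
        System.out.println(ct.mimeType);
        System.out.println(ct.charsetName());
    }
}
```

Because nothing here ever calls into java.nio.charset, an unknown charset name cannot trigger an exception. The trade-off is that malformed headers are accepted silently, so this is only a starting point if strict RFC conformance matters.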
Alternatively, one can still use Apache's ContentType.parse, catch the UnsupportedCharsetException, and extract the name with getCharsetName():
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;
import org.apache.http.entity.ContentType;

String header = "text/plain; charset=something";
String charsetName;
String mimeType;
try {
    ContentType contentType = ContentType.parse(header); // the exception may be thrown here
    mimeType = contentType.getMimeType();
    Charset charset = contentType.getCharset();
    charsetName = charset != null ? charset.name() : null;
} catch (UnsupportedCharsetException e) {
    charsetName = e.getCharsetName(); // the unsupported name, as carried by the exception
    // No ContentType object exists in this branch, so the MIME type is cut out by hand.
    // A ';' is present here because the exception implies a charset parameter was parsed.
    mimeType = header.substring(0, header.indexOf(';')).trim();
}
The drawback is that mimeType also has to be extracted differently when UnsupportedCharsetException is thrown.
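Whichever approach produces the raw charset name, it can stay a String and be resolved to a java.nio.charset.Charset only at the point where Java itself has to decode bytes. A minimal sketch of such a non-throwing lookup, assuming a hypothetical helper class named CharsetResolver; Charset.isSupported and Charset.forName are standard JDK methods:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Optional;

// Hypothetical helper: resolves a raw charset name without ever throwing.
final class CharsetResolver {

    /** Returns the Charset if this JVM knows the name, otherwise an empty Optional. */
    static Optional<Charset> tryResolve(String rawName) {
        if (rawName == null || rawName.trim().isEmpty()) {
            return Optional.empty();
        }
        String name = rawName.trim();
        try {
            return Charset.isSupported(name)
                    ? Optional.of(Charset.forName(name))
                    : Optional.empty();
        } catch (IllegalCharsetNameException e) {
            // The name contains characters the JDK rejects outright (for example a space).
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryResolve("UTF-8").isPresent());
        System.out.println(tryResolve("something").isPresent());
    }
}
```

The raw name remains available to pass on to other applications, while Java-side decoding can fall back to a default when the Optional is empty.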