
Shortest String encoding for a byte array

I have this code that generates a UBJSON byte array:

UBObject obj = UBValueFactory.createObject();
obj.put("appId", UBValueFactory.createString("70cce8adb93c4c968a7b1483f2edf5c1"));
obj.put("apiKey", UBValueFactory.createString("a65d8f147fa741b0a6d7fc43e18363c9"));
obj.put("entityType", UBValueFactory.createString("Todo"));
obj.put("entityId", UBValueFactory.createString("2-0"));
obj.put("blobName", UBValueFactory.createString("blobName"));

ByteArrayOutputStream out = new ByteArrayOutputStream();
UBWriter writer = new UBWriter(out);
try {
    writer.write(obj);
    writer.close();
} catch (IOException e) {
    e.printStackTrace();
}

// Byte array of UBJSON
byte[] ubjsonBytes = out.toByteArray();

The question is: what is the shortest String encoding for this byte array that can be used and transmitted in an HTTP URL? Base64 works perfectly as a URL path or query parameter, but yields quite a long String.

Depending on the input length and other properties, you might want to try compressing the input with gzip before encoding the byte[] with Base64. Often a URL-friendly variant of Base64 is used:

For this reason, modified Base64 variants for URLs exist (such as base64url in RFC 4648), where the + and / characters of standard Base64 are replaced by - and _ respectively, so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general.

Some variants allow or require omitting the padding = signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries will encode = to ., potentially exposing applications to relative path attacks when a folder name is encoded from user data.
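As a minimal sketch of the two suggestions above, the following uses only the JDK (java.util.Base64 and java.util.zip) to produce base64url text without padding, with gzip as an optional step beforehand. The class name UrlSafeBytes, its helper methods, and the sample payload are illustrative assumptions, not part of the original code or of any UBJSON library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class UrlSafeBytes {

    /** Encodes bytes as base64url (RFC 4648) and drops the trailing '=' padding. */
    static String encode(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    /** Decodes base64url text; the JDK decoder accepts input without padding. */
    static byte[] decode(String text) {
        return Base64.getUrlDecoder().decode(text);
    }

    /** Compresses bytes with gzip; may not pay off for very short inputs. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reverses gzip(). */
    static byte[] gunzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gz.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in payload; in the question this would be ubjsonBytes.
        byte[] payload = "{\"entityType\":\"Todo\",\"entityId\":\"2-0\"}"
                .getBytes(StandardCharsets.UTF_8);

        String plain = encode(payload);
        String compressed = encode(gzip(payload));

        // Compare both lengths and keep whichever is shorter for your data.
        System.out.println("base64url only:   " + plain.length() + " chars");
        System.out.println("gzip + base64url: " + compressed.length() + " chars");
    }
}
```

The gzip format adds roughly 18 bytes of header and trailer, so for a payload as small as the one in the question the compressed form may well come out longer. Measuring both variants on representative data is the safest way to decide.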

You could attempt to use Base85; however, it encodes with characters that can change the meaning of a URL, e.g. &. This might or might not work with your setup, and might depend on things like reverse proxy configuration. Because of that, it's often better to use a safe encoding like Base64.
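To put rough numbers on that trade-off, here is a small sketch that computes encoded lengths. It assumes unpadded Base64 (4 characters per 3 bytes) and a plain Ascii85-style Base85 (5 characters per 4 bytes, without framing or the 'z' shortcut). The class EncodedLength and its methods are hypothetical names used only for this illustration.

```java
// Hypothetical helper class, for illustration only.
class EncodedLength {

    /** Characters needed by unpadded Base64: 4 per 3 bytes, plus a short final group. */
    static int base64(int byteCount) {
        int rest = byteCount % 3;
        return 4 * (byteCount / 3) + (rest == 0 ? 0 : rest + 1);
    }

    /** Characters needed by Ascii85-style Base85: 5 per 4 bytes, plus a short final group. */
    static int base85(int byteCount) {
        int rest = byteCount % 4;
        return 5 * (byteCount / 4) + (rest == 0 ? 0 : rest + 1);
    }

    /** Length after percent-encoding: each unsafe character grows from 1 to 3 characters. */
    static int percentEncoded(int charCount, int unsafeCount) {
        return charCount + 2 * unsafeCount;
    }

    public static void main(String[] args) {
        int n = 100; // hypothetical payload size in bytes
        System.out.println("Base64: " + base64(n) + " chars");
        System.out.println("Base85: " + base85(n) + " chars");
        System.out.println("Base85 with 5 escaped chars: "
                + percentEncoded(base85(n), 5) + " chars");
    }
}
```

For a 100-byte payload the gap is only 9 characters (134 versus 125), and it disappears as soon as five of the Base85 characters have to be percent-encoded.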

All in all, long data should go into the request body and not the URL.
