
Convert UTF-8 string to alphanumeric string without information loss

I want to use Jake Wharton's DiskLruCache for Android to cache CouchDB documents on disk. CouchDB ids can be any JSON string, so an id could look like Sömething/Like/Thís. However, the library's docs state:

Each cache entry has a string key and a fixed number of values. Each key must match the regex [a-z0-9_-]{1,64}.

So I need a way to transform arbitrary strings so that they match the regex [a-z0-9_-]{1,64} while remaining unique. How can I do this elegantly?

How about computing a 64-character hash of the original JSON string (for example, a SHA-256 digest written as lowercase hex) and using that hash as the cache key?

This is not guaranteed to be unique. But then again, no mapping from arbitrary JSON strings to *[a-z0-9_-]{1,64}* can be, since there are far more possible inputs than possible keys.
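
A minimal sketch of the hashing approach, assuming SHA-256 as the hash function (its 32-byte digest is exactly 64 lowercase hex characters). The class `Sha256Keys` and method `cacheKey` are hypothetical names, not part of DiskLruCache:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: turns any document id into a valid DiskLruCache key.
final class Sha256Keys {
    private Sha256Keys() {}

    // Hashes the UTF-8 bytes of the id with SHA-256 and returns 64 lowercase
    // hex digits, which always match [a-z0-9_-]{1,64}. The mapping is one-way.
    static String cacheKey(String docId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(docId.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 and always emit two digits so leading zeros are kept.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The key is deterministic, so the same document id always maps to the same cache entry, but the id cannot be recovered from the key. If you need the original id later, one option is to store it inside the cached value together with the document.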

From this question: you can convert the original string into the hexadecimal representation of its bytes.

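```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public String toHex(String arg) {
    // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException thrown by getBytes("UTF-8")
    return String.format("%040x", new BigInteger(1, arg.getBytes(StandardCharsets.UTF_8)));
}
```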

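However, hex encoding doubles the length: every UTF-8 byte becomes two characters, so any id longer than 32 bytes exceeds the 64-character limit. (The "%040x" format also pads short results to at least 40 characters.)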
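
If the mapping must stay reversible, a hex encoder with an explicit length check could look like the sketch below. `HexKeys`, `encode` and `decode` are hypothetical names, and it assumes failing loudly is preferable to silently truncating. Unlike the BigInteger version, it always writes exactly two digits per byte, so leading zero bytes are preserved and decoding is straightforward:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reversible hex keys for ids of at most 32 UTF-8 bytes.
final class HexKeys {
    private HexKeys() {}

    private static final int MAX_KEY_LENGTH = 64; // from the [a-z0-9_-]{1,64} rule

    // Encodes the UTF-8 bytes of s as lowercase hex, two digits per byte.
    // Throws if the key would be empty or longer than 64 characters.
    static String encode(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length == 0 || bytes.length * 2 > MAX_KEY_LENGTH) {
            throw new IllegalArgumentException(
                    "UTF-8 length of " + bytes.length + " bytes does not fit in a 1..64 char key");
        }
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // Character.forDigit yields lowercase 'a'..'f' for values 10..15.
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }

    // Reverses encode(): parses each pair of hex digits back into one byte.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For example, Sömething/Like/Thís is 21 UTF-8 bytes and encodes to a 42-character key, while any id longer than 32 bytes is rejected.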

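A hash cannot be reversed, so it won't work if you need to recover the original id. Base64 doesn't meet your requirements either (its alphabet includes uppercase letters, and the standard variant also uses + and /), but you can do something similar: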

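Encode each character using only [a-z0-9_-]. More precisely: if a character doesn't match [a-z0-9_], replace it with its Unicode value preceded by -, written with a fixed number of digits (for example, four lowercase hex digits) so that the result can be decoded unambiguously.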
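
A sketch of this escaping scheme, assuming fixed-width escapes of `-` followed by four lowercase hex digits per UTF-16 `char` (characters outside the Basic Multilingual Plane are escaped as their two surrogate halves). `EscapeKeys`, `encode` and `decode` are hypothetical names:

```java
// Hypothetical helper: reversible escaping into [a-z0-9_-].
final class EscapeKeys {
    private EscapeKeys() {}

    // Characters copied through unchanged; '-' is reserved as the escape marker.
    private static boolean isPlain(char c) {
        return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
    }

    // Replaces every other UTF-16 char (including '-') with '-' plus four lowercase hex digits.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isPlain(c)) {
                sb.append(c);
            } else {
                sb.append('-').append(String.format("%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses encode(): every '-' is followed by exactly four hex digits.
    static String decode(String key) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c == '-') {
                sb.append((char) Integer.parseInt(key.substring(i + 1, i + 5), 16));
                i += 4; // skip the four hex digits just consumed
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, Sömething/Like/Thís encodes to -0053-00f6mething-002f-004cike-002f-0054h-00eds (47 characters). Because every escaped character costs five characters, ids with many uppercase or non-ASCII characters can still exceed the 64-character limit, so you may still need to reject or otherwise handle longer keys.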
