
How can I encode Azure storage table row keys and partition keys?

I'm using Azure storage tables, and some of the data going into the RowKey contains slashes. According to this MSDN page, the following characters are disallowed in both the PartitionKey and RowKey:

  • The forward slash (/) character

  • The backslash (\) character

  • The number sign (#) character

  • The question mark (?) character

  • Control characters from U+0000 to U+001F, including:

      • The horizontal tab (\t) character

      • The linefeed (\n) character

      • The carriage return (\r) character

  • Control characters from U+007F to U+009F
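For reference, the list above can be written as a small check. This is only a sketch of the rules as quoted here; the class and method names (`TableKeyValidation`, `IsDisallowed`, `IsValidKey`) are made up for illustration and are not part of any Azure SDK.

```csharp
using System;
using System.Linq;

public static class TableKeyValidation
{
    // True if the character appears on the disallowed list quoted above.
    // Tab, linefeed and carriage return fall inside the U+0000..U+001F range.
    public static bool IsDisallowed(char c) =>
        c == '/' || c == '\\' || c == '#' || c == '?' ||
        c <= '\u001F' || (c >= '\u007F' && c <= '\u009F');

    // True if the key is non-null and contains none of the disallowed characters.
    // Other service limits (such as key length) are not checked here.
    public static bool IsValidKey(string key) =>
        key != null && !key.Any(IsDisallowed);
}
```

With this sketch, `TableKeyValidation.IsValidKey("a/b")` returns false, while a key such as "customer-42" passes.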

I've seen some people use URL encoding to get around this. Unfortunately, a few glitches can arise from this, such as being able to insert but unable to delete certain entities. I've also seen some people use Base64 encoding; however, its output can also contain disallowed characters.

How can I encode my RowKey efficiently without running into disallowed characters or rolling my own encoding?

Updated 18-Aug-2020 for (new?) issue with '+' character in Azure Search. See comments from @mladenb below for background. Of note, the documentation page referenced does not exclude the '+' character.

When a URL is Base64 encoded, the only character that is invalid in an Azure Table Storage key column is the forward slash ('/'). To address this, simply replace the forward slash with another character that is both (1) valid in an Azure Table Storage key column and (2) not a Base64 character. The most common choice I have found (also cited in other answers) is the underscore ('_'). Because of the '+' issue noted in the update above, the code below also replaces the plus sign ('+') with the hyphen ('-').

private static String EncodeUrlInKey(String url)
{
    var keyBytes = System.Text.Encoding.UTF8.GetBytes(url);
    var base64 = System.Convert.ToBase64String(keyBytes);
    return base64.Replace('/','_').Replace('+','-');
}

When decoding, simply undo the character replacements (first!) and then Base64 decode the resulting string. That's all there is to it.

private static String DecodeUrlInKey(String encodedKey)
{
    var base64 = encodedKey.Replace('-','+').Replace('_', '/');
    byte[] bytes = System.Convert.FromBase64String(base64);
    return System.Text.Encoding.UTF8.GetString(bytes);
}

Some people have suggested that other Base64 characters also need encoding. According to the Azure Table Storage docs this is not the case.
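As a quick illustration of the round trip, the sketch below repeats the same substitutions in a self-contained form. The class name `KeyEncodingSketch` and the sample inputs are made up for this example; the inputs were chosen because their plain Base64 form contains '/' or '+'.

```csharp
using System;
using System.Text;

public static class KeyEncodingSketch
{
    // Base64 encode, then swap the two characters that cause trouble in keys.
    public static string Encode(string value)
    {
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
        return base64.Replace('/', '_').Replace('+', '-');
    }

    // Undo the swaps first, then Base64 decode.
    public static string Decode(string key)
    {
        string base64 = key.Replace('-', '+').Replace('_', '/');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

For example, "???" is "Pz8/" in plain Base64, so `KeyEncodingSketch.Encode("???")` gives "Pz8_"; likewise ">>>" is "Pj4+" and becomes "Pj4-". Decoding either result returns the original string.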

I ran into the same need.

I wasn't satisfied with Base64 encoding because it turns a human-readable string into an unrecognizable string, and will inflate the size of strings regardless of whether they follow the rules (a loss when the great majority of characters are not illegal characters that need to be escaped).

Here's a coder/decoder using '!' as an escape character in much the same way one would traditionally use the backslash character.

using System.Globalization;
using System.Text;

public static class TableKeyEncoding
{
    // https://msdn.microsoft.com/library/azure/dd179338.aspx
    // 
    // The following characters are not allowed in values for the PartitionKey and RowKey properties:
    // The forward slash(/) character
    // The backslash(\) character
    // The number sign(#) character
    // The question mark (?) character
    // Control characters from U+0000 to U+001F, including:
    // The horizontal tab(\t) character
    // The linefeed(\n) character
    // The carriage return (\r) character
    // Control characters from U+007F to U+009F
    public static string Encode(string unsafeForUseAsAKey)
    {
        StringBuilder safe = new StringBuilder();
        foreach (char c in unsafeForUseAsAKey)
        {
            switch (c)
            {
                case '/':
                    safe.Append("!f");
                    break;
                case '\\':
                    safe.Append("!b");
                    break;
                case '#':
                    safe.Append("!p");
                    break;
                case '?':
                    safe.Append("!q");
                    break;
                case '\t':
                    safe.Append("!t");
                    break;
                case '\n':
                    safe.Append("!n");
                    break;
                case '\r':
                    safe.Append("!r");
                    break;
                case '!':
                    safe.Append("!!");
                    break;
                default:
                    if (c <= 0x1f || (c >= 0x7f && c <= 0x9f))
                    {
                        int charCode = c;
                        safe.Append("!x" + charCode.ToString("x2"));
                    }
                    else
                    {
                        safe.Append(c);
                    }
                    break;
            }
        }
        return safe.ToString();
    }

    public static string Decode(string key)
    {
        StringBuilder decoded = new StringBuilder();
        int i = 0;
        while (i < key.Length)
        {
            char c = key[i++];
            if (c != '!' || i == key.Length)
            {
                // There's no escape character ('!'), or the escape should be ignored because it's the end of the array
                decoded.Append(c);
            }
            else
            {
                char escapeCode = key[i++];
                switch (escapeCode)
                {
                    case 'f':
                        decoded.Append('/');
                        break;
                    case 'b':
                        decoded.Append('\\');
                        break;
                    case 'p':
                        decoded.Append('#');
                        break;
                    case 'q':
                        decoded.Append('?');
                        break;
                    case 't':
                        decoded.Append('\t');
                        break;
                    case 'n':
                        decoded.Append("\n");
                        break;
                    case 'r':
                        decoded.Append("\r");
                        break;
                    case '!':
                        decoded.Append('!');
                        break;
                    case 'x':
                        if (i + 2 <= key.Length)
                        {
                            string charCodeString = key.Substring(i, 2);
                            int charCode;
                            if (int.TryParse(charCodeString, NumberStyles.HexNumber, NumberFormatInfo.InvariantInfo, out charCode))
                            {
                                decoded.Append((char)charCode);
                            }
                            i += 2;
                        }
                        break;
                    default:
                        // Unknown escape code: keep both characters instead of dropping the second one
                        decoded.Append('!').Append(escapeCode);
                        break;
                }
            }
        }
        return decoded.ToString();
    }
}

Since one should use extreme caution when writing your own encoder, I have written some unit tests for it as well.

using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

namespace xUnit_Tests
{
    public class TableKeyEncodingTests
    {
        const char Unicode0X1A = (char) 0x1a;


        public void RoundTripTest(string unencoded, string encoded)
        {
            Assert.Equal(encoded, TableKeyEncoding.Encode(unencoded));
            Assert.Equal(unencoded, TableKeyEncoding.Decode(encoded));
        }

        [Fact]
        public void RoundTrips()
        {
            RoundTripTest("!\n", "!!!n");
            RoundTripTest("left" + Unicode0X1A + "right", "left!x1aright");
        }


        // The following characters are not allowed in values for the PartitionKey and RowKey properties:
        // The forward slash(/) character
        // The backslash(\) character
        // The number sign(#) character
        // The question mark (?) character
        // Control characters from U+0000 to U+001F, including:
        // The horizontal tab(\t) character
        // The linefeed(\n) character
        // The carriage return (\r) character
        // Control characters from U+007F to U+009F
        [Fact]
        public void EncodesAllForbiddenCharacters()
        {
            List<char> forbiddenCharacters = "\\/#?\t\n\r".ToCharArray().ToList();
            forbiddenCharacters.AddRange(Enumerable.Range(0x00, 1+(0x1f-0x00)).Select(i => (char)i));
            forbiddenCharacters.AddRange(Enumerable.Range(0x7f, 1+(0x9f-0x7f)).Select(i => (char)i));
            string allForbiddenCharacters = String.Join("", forbiddenCharacters);
            string allForbiddenCharactersEncoded = TableKeyEncoding.Encode(allForbiddenCharacters);

            // Make sure decoding is same as encoding
            Assert.Equal(allForbiddenCharacters, TableKeyEncoding.Decode(allForbiddenCharactersEncoded));

            // Ensure encoding does not contain any forbidden characters
            Assert.Equal(0, allForbiddenCharacters.Count( c => allForbiddenCharactersEncoded.Contains(c) ));
        }

    }
}

How about the URL encode/decode functions? They take care of the '/', '?' and '#' characters.

string url = "http://www.google.com/search?q=Example";
string key = HttpUtility.UrlEncode(url);
string urlBack = HttpUtility.UrlDecode(key);
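`HttpUtility` lives in System.Web, which is not referenced by every project type. As a sketch of the same idea, assuming `System.Net.WebUtility` is an acceptable substitute (it is not what the snippet above uses), the following wraps the encode and decode calls and adds a check that the encoded key no longer contains '/', '\', '#' or '?'. The class name `UrlKeySketch` is hypothetical, and the caveat raised in the question about URL-encoded keys still applies.

```csharp
using System;
using System.Net;

public static class UrlKeySketch
{
    // Percent-encode the value so reserved characters are replaced by escapes.
    public static string Encode(string value) => WebUtility.UrlEncode(value);

    // Reverse the percent-encoding.
    public static string Decode(string key) => WebUtility.UrlDecode(key);

    // True if any of the four printable disallowed characters is still present.
    public static bool HasReservedKeyChars(string key) =>
        key.IndexOfAny(new[] { '/', '\\', '#', '?' }) >= 0;
}
```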

See these links: http://tools.ietf.org/html/rfc4648#page-7 and "Code for decoding/encoding a modified base64 URL" (see also the second answer: https://stackoverflow.com/a/1789179/1094268).

I had this problem myself. These are the functions I now use for it. I use the trick from the second answer I mentioned, and I also swap out the '+' and '/' characters, which can still appear in the Base64 output and are incompatible with Azure keys.

private static String EncodeSafeBase64(String toEncode)
{
    if (toEncode == null)
        throw new ArgumentNullException("toEncode");
    String base64String = Convert.ToBase64String(Encoding.UTF8.GetBytes(toEncode));
    StringBuilder safe = new StringBuilder();
    foreach (Char c in base64String)
    {
        switch (c)
        {
            case '+':
                safe.Append('-');
                break;
            case '/':
                safe.Append('_');
                break;
            default:
                safe.Append(c);
                break;
        }
    }
    return safe.ToString();
}

private static String DecodeSafeBase64(String toDecode)
{
    if (toDecode == null)
        throw new ArgumentNullException("toDecode");
    StringBuilder deSafe = new StringBuilder();
    foreach (Char c in toDecode)
    {
        switch (c)
        {
            case '-':
                deSafe.Append('+');
                break;
            case '_':
                deSafe.Append('/');
                break;
            default:
                deSafe.Append(c);
                break;
        }
    }
    return Encoding.UTF8.GetString(Convert.FromBase64String(deSafe.ToString()));
}

If it is just the slashes, you can simply replace them on writing to the table with another character, say, '|' and re-replace them on reading.
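A minimal sketch of that swap is below. It assumes '|' never occurs in the original data; if it can, the swap is not reversible, so this version rejects such input rather than guessing. The names `SlashSwapSketch`, `ToKey` and `FromKey` are hypothetical.

```csharp
using System;

public static class SlashSwapSketch
{
    // Replace '/' with '|' before writing the key to the table.
    public static string ToKey(string value)
    {
        if (value.IndexOf('|') >= 0)
            throw new ArgumentException("Value already contains '|', so the swap would not be reversible.", nameof(value));
        return value.Replace('/', '|');
    }

    // Restore the slashes after reading the key back.
    public static string FromKey(string key) => key.Replace('|', '/');
}
```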

What I have seen is that although a lot of non-alphanumeric characters are technically allowed, they don't really work very well in partition and row keys.

I looked at the answers already given here and in other places and wrote this: https://github.com/JohanNorberg/AlphaNumeric

Two alpha-numeric encoders.

If you need to escape a string that is mostly alphanumeric you can use this:

AlphaNumeric.English.Encode(str);

If you need to escape a string that is mostly not alphanumeric you can use this:

AlphaNumeric.Data.EncodeString(str);

Encoding data:

var base64 = Convert.ToBase64String(bytes);
var alphaNumericEncodedString = base64
            .Replace("0", "01")
            .Replace("+", "02")
            .Replace("/", "03")
            .Replace("=", "04");
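Only the encoding direction is shown here. Because every '0' in the output starts a two-character escape ("01" through "04"), decoding can be done in a single left-to-right pass. The following is a sketch under that reading; the class and method names are hypothetical, and this is not taken from the AlphaNumeric library itself.

```csharp
using System;
using System.Text;

public static class AlphaNumericDataSketch
{
    // Same substitutions as the snippet above: '0' is escaped first so that
    // every '0' in the result is an escape prefix.
    public static string Encode(byte[] bytes) =>
        Convert.ToBase64String(bytes)
            .Replace("0", "01")
            .Replace("+", "02")
            .Replace("/", "03")
            .Replace("=", "04");

    public static byte[] Decode(string encoded)
    {
        var base64 = new StringBuilder(encoded.Length);
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c != '0')
            {
                base64.Append(c);   // ordinary Base64 character
                continue;
            }
            if (i + 1 >= encoded.Length)
                throw new FormatException("Dangling escape prefix '0'.");
            switch (encoded[++i])   // the character after '0' selects the original
            {
                case '1': base64.Append('0'); break;
                case '2': base64.Append('+'); break;
                case '3': base64.Append('/'); break;
                case '4': base64.Append('='); break;
                default: throw new FormatException("Unknown escape code.");
            }
        }
        return Convert.FromBase64String(base64.ToString());
    }
}
```

For instance, the bytes of ">>>" are "Pj4+" in Base64, which this encoding turns into "Pj402".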

But if you want to use, for example, an email address as a RowKey, you would only want to escape the '@' and '.'. This code will do that:

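// Method signature assumed for illustration; the original snippet shows only the body.
// Requires: using System; using System.Text;
public static string Encode(string rawString)
{
    // '0', '1' and '2' are left out of the digit alphabet because they serve as
    // length prefixes (1, 2 or 3 base-59 digits follow) for an escaped character.
    char[] validChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ3456789".ToCharArray();
    char[] allChars = rawString.ToCharArray();
    StringBuilder builder = new StringBuilder(rawString.Length * 2);
    for (int i = 0; i < allChars.Length; i++)
    {
        int c = allChars[i];
        // Pass '3'-'9', 'A'-'Z' and 'a'-'z' through unchanged
        if ((c >= 51 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122))
        {
            builder.Append(allChars[i]);
        }
        else
        {
            // Write the character code in base 59, least significant digit first
            int index = builder.Length;
            int count = 0;
            do
            {
                builder.Append(validChars[c % 59]);
                c /= 59;
                count++;
            } while (c > 0);

            // Prefix the digits with their count: '0' = 1 digit, '1' = 2, '2' = 3
            if (count == 1) builder.Insert(index, '0');
            else if (count == 2) builder.Insert(index, '1');
            else if (count == 3) builder.Insert(index, '2');
            else throw new Exception("Base59 has invalid count, method must be wrong Count is: " + count);
        }
    }

    return builder.ToString();
}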
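Reading the encoder above, each escaped character is written as a length prefix ('0', '1' or '2' for one, two or three digits) followed by base-59 digits, least significant first. A matching decoder is not shown in this answer; the following is a minimal sketch of one derived from that reading, not the library's own implementation. The class name `Base59EscapeSketch` is hypothetical.

```csharp
using System;
using System.Text;

public static class Base59EscapeSketch
{
    // Same digit alphabet as the encoder: no '0', '1' or '2'.
    private const string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ3456789";

    public static string Decode(string encoded)
    {
        var builder = new StringBuilder(encoded.Length);
        int i = 0;
        while (i < encoded.Length)
        {
            char c = encoded[i++];
            if (c < '0' || c > '2')
            {
                builder.Append(c);          // literal character, copied as-is
                continue;
            }
            int digitCount = c - '0' + 1;   // prefix '0' means 1 digit, '1' means 2, '2' means 3
            if (i + digitCount > encoded.Length)
                throw new FormatException("Truncated escape sequence.");
            int value = 0;
            int placeValue = 1;
            for (int d = 0; d < digitCount; d++)
            {
                int digit = ValidChars.IndexOf(encoded[i++]);
                if (digit < 0)
                    throw new FormatException("Invalid base-59 digit.");
                value += digit * placeValue;   // least significant digit comes first
                placeValue *= 59;
            }
            builder.Append((char)value);
        }
        return builder.ToString();
    }
}
```

Working through the encoder by hand, '@' (code 64) becomes "1fb" and '.' (code 46) becomes "0U", so "a@b.c" is stored as "a1fbb0Uc", and `Base59EscapeSketch.Decode("a1fbb0Uc")` returns "a@b.c".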
