How can I encode Azure storage table row keys and partition keys?

I'm using Azure storage tables and I have data going into the RowKey that has slashes in it. According to this MSDN page, the following characters are disallowed in both the PartitionKey and RowKey:

  • The forward slash (/) character

  • The backslash (\) character

  • The number sign (#) character

  • The question mark (?) character

  • Control characters from U+0000 to U+001F, including:

      • The horizontal tab (\t) character

      • The linefeed (\n) character

      • The carriage return (\r) character

  • Control characters from U+007F to U+009F

I've seen some people use URL encoding to get around this. Unfortunately, a few glitches can arise from this, such as being able to insert but unable to delete certain entities. I've also seen some people use Base64 encoding; however, its output can also contain disallowed characters.
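To make the Base64 concern concrete, here is a minimal sketch showing that standard Base64 output can include the forward slash. The class and method names (`Base64KeyDemo`, `ToBase64`) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class Base64KeyDemo
{
    // Hypothetical helper: returns the standard Base64 form of the
    // UTF-8 bytes of the input, with no character substitution.
    public static string ToBase64(string value)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
    }
}
```

For example, `Base64KeyDemo.ToBase64("???")` returns `Pz8/`, which ends in a forward slash and so would be rejected as a key.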

How can I encode my RowKey efficiently without running into disallowed characters, or rolling my own encoding?

Updated 18-Aug-2020 for a (new?) issue with the '+' character in Azure Search. See comments from @mladenb below for background. Of note, the documentation page referenced does not exclude the '+' character.

When a URL is Base64 encoded, the only character that is invalid in an Azure Table Storage key column is the forward slash ('/'). To address this, simply replace the forward slash character with another character that is both (1) valid in an Azure Table Storage key column and (2) not a Base64 character. The most common example I have found (which is cited in other answers) is to replace the forward slash ('/') with the underscore ('_'). The code here also replaces '+' with '-' because of the Azure Search issue mentioned in the update above.

private static String EncodeUrlInKey(String url)
{
    var keyBytes = System.Text.Encoding.UTF8.GetBytes(url);
    var base64 = System.Convert.ToBase64String(keyBytes);
    return base64.Replace('/','_').Replace('+','-');
}

When decoding, simply undo the character replacement (first!) and then Base64 decode the resulting string. That's all there is to it.

private static String DecodeUrlInKey(String encodedKey)
{
    var base64 = encodedKey.Replace('-','+').Replace('_', '/');
    byte[] bytes = System.Convert.FromBase64String(base64);
    return System.Text.Encoding.UTF8.GetString(bytes);
}

Some people have suggested that other Base64 characters also need encoding. According to the Azure Table Storage docs, this is not the case.
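One way to check that claim for yourself is to test encoded keys against the disallowed-character list quoted in the question. The sketch below does only that; `TableKeyValidator` and `IsAllowedKey` are hypothetical names, and the check does not cover key length limits or any other constraint the service may apply.

```csharp
using System;

public static class TableKeyValidator
{
    // Hypothetical helper: true when the key contains none of the characters
    // listed as disallowed in the question (/, \, #, ?, U+0000 to U+001F,
    // U+007F to U+009F). It does not check length or other service rules.
    public static bool IsAllowedKey(string key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        foreach (char c in key)
        {
            if (c == '/' || c == '\\' || c == '#' || c == '?') return false;
            if (c <= '\u001F') return false;
            if (c >= '\u007F' && c <= '\u009F') return false;
        }
        return true;
    }
}
```

With this check, `IsAllowedKey("Pz8/")` is false and `IsAllowedKey("Pz8_")` is true. A '+' passes as well, which matches the documentation list; the '-' substitution only matters if the Azure Search issue from the update applies to you.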

I ran into the same need.

I wasn't satisfied with Base64 encoding because it turns a human-readable string into an unrecognizable one, and it inflates the size of every string regardless of whether it already follows the rules (a loss when the great majority of characters are not illegal characters that need to be escaped).

Here's a coder/decoder using '!' as an escape character in much the same way one would traditionally use the backslash character.

using System.Globalization;
using System.Text;

public static class TableKeyEncoding
{
    // https://msdn.microsoft.com/library/azure/dd179338.aspx
    // 
    // The following characters are not allowed in values for the PartitionKey and RowKey properties:
    // The forward slash(/) character
    // The backslash(\) character
    // The number sign(#) character
    // The question mark (?) character
    // Control characters from U+0000 to U+001F, including:
    // The horizontal tab(\t) character
    // The linefeed(\n) character
    // The carriage return (\r) character
    // Control characters from U+007F to U+009F
    public static string Encode(string unsafeForUseAsAKey)
    {
        StringBuilder safe = new StringBuilder();
        foreach (char c in unsafeForUseAsAKey)
        {
            switch (c)
            {
                case '/':
                    safe.Append("!f");
                    break;
                case '\\':
                    safe.Append("!b");
                    break;
                case '#':
                    safe.Append("!p");
                    break;
                case '?':
                    safe.Append("!q");
                    break;
                case '\t':
                    safe.Append("!t");
                    break;
                case '\n':
                    safe.Append("!n");
                    break;
                case '\r':
                    safe.Append("!r");
                    break;
                case '!':
                    safe.Append("!!");
                    break;
                default:
                    if (c <= 0x1f || (c >= 0x7f && c <= 0x9f))
                    {
                        int charCode = c;
                        safe.Append("!x" + charCode.ToString("x2"));
                    }
                    else
                    {
                        safe.Append(c);
                    }
                    break;
            }
        }
        return safe.ToString();
    }

    public static string Decode(string key)
    {
        StringBuilder decoded = new StringBuilder();
        int i = 0;
        while (i < key.Length)
        {
            char c = key[i++];
            if (c != '!' || i == key.Length)
            {
                // There's no escape character ('!'), or the escape should be ignored because it's the end of the array
                decoded.Append(c);
            }
            else
            {
                char escapeCode = key[i++];
                switch (escapeCode)
                {
                    case 'f':
                        decoded.Append('/');
                        break;
                    case 'b':
                        decoded.Append('\\');
                        break;
                    case 'p':
                        decoded.Append('#');
                        break;
                    case 'q':
                        decoded.Append('?');
                        break;
                    case 't':
                        decoded.Append('\t');
                        break;
                    case 'n':
                        decoded.Append("\n");
                        break;
                    case 'r':
                        decoded.Append("\r");
                        break;
                    case '!':
                        decoded.Append('!');
                        break;
                    case 'x':
                        if (i + 2 <= key.Length)
                        {
                            string charCodeString = key.Substring(i, 2);
                            int charCode;
                            if (int.TryParse(charCodeString, NumberStyles.HexNumber, NumberFormatInfo.InvariantInfo, out charCode))
                            {
                                decoded.Append((char)charCode);
                            }
                            i += 2;
                        }
                        break;
                    default:
                        decoded.Append('!');
                        break;
                }
            }
        }
        return decoded.ToString();
    }
}

Since one should use extreme caution when writing one's own encoder, I have written some unit tests for it as well.

using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

namespace xUnit_Tests
{
    public class TableKeyEncodingTests
    {
        const char Unicode0X1A = (char) 0x1a;


        // Helper, not a test: checks both directions for one pair of strings.
        private static void RoundTripTest(string unencoded, string encoded)
        {
            Assert.Equal(encoded, TableKeyEncoding.Encode(unencoded));
            Assert.Equal(unencoded, TableKeyEncoding.Decode(encoded));
        }

        [Fact]
        public void RoundTrips()
        {
            RoundTripTest("!\n", "!!!n");
            RoundTripTest("left" + Unicode0X1A + "right", "left!x1aright");
        }


        // The following characters are not allowed in values for the PartitionKey and RowKey properties:
        // The forward slash(/) character
        // The backslash(\) character
        // The number sign(#) character
        // The question mark (?) character
        // Control characters from U+0000 to U+001F, including:
        // The horizontal tab(\t) character
        // The linefeed(\n) character
        // The carriage return (\r) character
        // Control characters from U+007F to U+009F
        [Fact]
        public void EncodesAllForbiddenCharacters()
        {
            List<char> forbiddenCharacters = "\\/#?\t\n\r".ToCharArray().ToList();
            forbiddenCharacters.AddRange(Enumerable.Range(0x00, 1+(0x1f-0x00)).Select(i => (char)i));
            forbiddenCharacters.AddRange(Enumerable.Range(0x7f, 1+(0x9f-0x7f)).Select(i => (char)i));
            string allForbiddenCharacters = String.Join("", forbiddenCharacters);
            string allForbiddenCharactersEncoded = TableKeyEncoding.Encode(allForbiddenCharacters);

            // Make sure decoding is same as encoding
            Assert.Equal(allForbiddenCharacters, TableKeyEncoding.Decode(allForbiddenCharactersEncoded));

            // Ensure encoding does not contain any forbidden characters
            Assert.Equal(0, allForbiddenCharacters.Count( c => allForbiddenCharactersEncoded.Contains(c) ));
        }

    }
}

How about the URL encode/decode functions? They take care of the '/', '?' and '#' characters.

string url = "http://www.google.com/search?q=Example";
string key = HttpUtility.UrlEncode(url);
string urlBack = HttpUtility.UrlDecode(key);

See these links: http://tools.ietf.org/html/rfc4648#page-7 and "Code for decoding/encoding a modified base64 URL" (see also the second answer: https://stackoverflow.com/a/1789179/1094268).

I had the problem myself. These are the functions I now use for this. I use the trick from the second answer I mentioned, and I also swap out the + and / characters, which are incompatible with Azure keys and may still appear in the Base64 output.

private static String EncodeSafeBase64(String toEncode)
{
    if (toEncode == null)
        throw new ArgumentNullException("toEncode");
    String base64String = Convert.ToBase64String(Encoding.UTF8.GetBytes(toEncode));
    StringBuilder safe = new StringBuilder();
    foreach (Char c in base64String)
    {
        switch (c)
        {
            case '+':
                safe.Append('-');
                break;
            case '/':
                safe.Append('_');
                break;
            default:
                safe.Append(c);
                break;
        }
    }
    return safe.ToString();
}

private static String DecodeSafeBase64(String toDecode)
{
    if (toDecode == null)
        throw new ArgumentNullException("toDecode");
    StringBuilder deSafe = new StringBuilder();
    foreach (Char c in toDecode)
    {
        switch (c)
        {
            case '-':
                deSafe.Append('+');
                break;
            case '_':
                deSafe.Append('/');
                break;
            default:
                deSafe.Append(c);
                break;
        }
    }
    return Encoding.UTF8.GetString(Convert.FromBase64String(deSafe.ToString()));
}

If it is just the slashes, you can simply replace them with another character, say '|', when writing to the table, and replace them back when reading.
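A minimal sketch of that swap follows. The names (`SlashSwap`, `ToKey`, `FromKey`) are hypothetical, and the guard is my own addition: the round trip is only lossless if the original value never contains the placeholder character itself.

```csharp
using System;

public static class SlashSwap
{
    // Assumed placeholder; any allowed character that never occurs
    // in the data would work equally well.
    private const char Placeholder = '|';

    // Replaces every forward slash with the placeholder before writing.
    public static string ToKey(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (value.IndexOf(Placeholder) >= 0)
            throw new ArgumentException("Value already contains the placeholder character.", nameof(value));
        return value.Replace('/', Placeholder);
    }

    // Restores the forward slashes after reading.
    public static string FromKey(string key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        return key.Replace(Placeholder, '/');
    }
}
```

`SlashSwap.ToKey("a/b/c")` gives `a|b|c`, and `SlashSwap.FromKey` reverses it. This only handles the forward slash; the other disallowed characters would still need separate treatment.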

What I have seen is that although a lot of non-alphanumeric characters are technically allowed, they don't really work very well in partition and row keys.

I looked at the answers already given here and in other places and wrote this: https://github.com/JohanNorberg/AlphaNumeric

It contains two alphanumeric encoders.

If you need to escape a string that is mostly alphanumeric you can use this:

AlphaNumeric.English.Encode(str);

If you need to escape a string that is mostly not alphanumeric you can use this:

AlphaNumeric.Data.EncodeString(str);

Encoding data:

var base64 = Convert.ToBase64String(bytes);
var alphaNumericEncodedString = base64
            .Replace("0", "01")
            .Replace("+", "02")
            .Replace("/", "03")
            .Replace("=", "04");
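Only the encoding direction is shown above. A matching decoder could look like the sketch below; it is my own reconstruction from the four replacements, not code taken from the library, and the names `AlphaNumericBase64`, `Encode` and `Decode` are hypothetical. It relies on the fact that, after encoding, every '0' is the start of a two-character escape.

```csharp
using System;
using System.Text;

public static class AlphaNumericBase64
{
    // Same replacement chain as in the answer.
    public static string Encode(byte[] bytes)
    {
        return Convert.ToBase64String(bytes)
            .Replace("0", "01")
            .Replace("+", "02")
            .Replace("/", "03")
            .Replace("=", "04");
    }

    // Reverses the chain: '0' followed by 1..4 maps back to 0, +, /, =.
    public static byte[] Decode(string encoded)
    {
        var base64 = new StringBuilder(encoded.Length);
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c != '0') { base64.Append(c); continue; }
            if (i + 1 >= encoded.Length)
                throw new FormatException("Dangling escape at end of input.");
            switch (encoded[++i])
            {
                case '1': base64.Append('0'); break;
                case '2': base64.Append('+'); break;
                case '3': base64.Append('/'); break;
                case '4': base64.Append('='); break;
                default: throw new FormatException("Unknown escape sequence.");
            }
        }
        return Convert.FromBase64String(base64.ToString());
    }
}
```

For instance, the bytes `0xFF, 0xFF, 0xFF` are `////` in Base64 and therefore `03030303` after encoding, which `Decode` turns back into the same three bytes.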

But if you want to use, for example, an email address as a row key, you would only want to escape the '@' and '.'. This code will do that:

        char[] validChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ3456789".ToCharArray();
        char[] allChars = rawString.ToCharArray();
        StringBuilder builder = new StringBuilder(rawString.Length * 2);
        for(int i = 0; i < allChars.Length; i++)
        {
            int c = allChars[i];
            if((c >= 51 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122))
            {
                builder.Append(allChars[i]);
            } 
            else
            {
                int index = builder.Length;
                int count = 0;
                do
                {
                    builder.Append(validChars[c % 59]);
                    c /= 59;
                    count++;
                } while (c > 0);

                if (count == 1) builder.Insert(index, '0');
                else if (count == 2) builder.Insert(index, '1');
                else if (count == 3) builder.Insert(index, '2');
                else throw new Exception("Base59 has invalid count, method must be wrong Count is: " + count);
            }
        }

        return builder.ToString(); 
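The fragment above only encodes. As a sketch of the reverse direction, reconstructed by me from the encoder rather than taken from the library, a decoder can rely on the prefix digit: '0', '1' or '2' never passes through unescaped, so it always announces one, two or three base-59 digits, least significant first. `Base59KeyDecoder` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class Base59KeyDecoder
{
    // Same 59-character alphabet as the encoder.
    private const string ValidChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ3456789";

    public static string Decode(string encoded)
    {
        var result = new StringBuilder(encoded.Length);
        int i = 0;
        while (i < encoded.Length)
        {
            char c = encoded[i++];
            if (c < '0' || c > '2')
            {
                // Anything other than a prefix digit was copied through unchanged.
                result.Append(c);
                continue;
            }

            // Prefix '0', '1', '2' means 1, 2, 3 base-59 digits follow.
            int digitCount = (c - '0') + 1;
            if (i + digitCount > encoded.Length)
                throw new FormatException("Truncated escape sequence.");

            int value = 0;
            int weight = 1;
            for (int d = 0; d < digitCount; d++)
            {
                int digit = ValidChars.IndexOf(encoded[i++]);
                if (digit < 0)
                    throw new FormatException("Invalid base-59 digit.");
                value += digit * weight; // least significant digit comes first
                weight *= 59;
            }
            result.Append((char)value);
        }
        return result.ToString();
    }
}
```

Working the encoder by hand, `a@b.c` becomes `a1fbb0Uc` ('@' is 64, i.e. digits 5 and 1, and '.' is 46, a single digit), and `Base59KeyDecoder.Decode("a1fbb0Uc")` returns `a@b.c`.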
