
Converting VB6 encoding application into C#

I'm importing files encoded in codepage 1252 into a SQL Server 2008 database.

Some data contains a comma that isn't the traditional comma (character code 44) but character code 8218 instead.

The column that contains this value is encrypted via an algorithm in VB6. When I implement the same algorithm in C#, I get the value 130, which does not match 8218.

What am I missing?

EDIT: Thought I would share the solution.... Thank god for Reflector. It was that simple...

130 is the windows-1252 encoding for the character U+201A (decimal 8218), "Single Low-9 Quotation Mark". If you decode it correctly, the resulting char will have the numeric value 8218 because .NET uses UTF-16 ("Unicode") internally.
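To make the mapping concrete, here is a minimal sketch that decodes single bytes as windows-1252 and prints the resulting char values. The class and method names are made up for illustration. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core / .NET 5+, where code page 1252 is not registered by default, and can be dropped on the classic .NET Framework.

```csharp
using System;
using System.Text;

static class Cp1252Demo
{
    // Decodes one byte as windows-1252 and returns the value of the resulting UTF-16 char.
    public static int DecodeCp1252(byte b)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);
        return cp1252.GetString(new[] { b })[0];
    }

    static void Main()
    {
        Console.WriteLine(DecodeCp1252(0x82)); // 8218 (U+201A, Single Low-9 Quotation Mark)
        Console.WriteLine(DecodeCp1252(0x2C)); // 44 (ordinary comma)
    }
}
```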

It sounds like you decoded the windows-1252 byte sequence as ISO-8859-1, which maps 0x82 (decimal 130) to a control character with numeric value 130. If that's the case, the real solution to your problem is to go back and fix the part that decodes it incorrectly in the first place.
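The sketch below reproduces that symptom and shows one possible after-the-fact repair, on the assumption that the text really was decoded as ISO-8859-1 somewhere upstream. The names are hypothetical, and the repair is only a fallback: it does not replace fixing the original decoding step, and it turns any char above U+00FF into '?'. As before, the RegisterProvider call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

static class MisdecodeDemo
{
    // Decodes raw bytes with the wrong encoding (ISO-8859-1), as suspected in the question.
    public static string DecodeAsLatin1(byte[] raw)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }

    // Recovers the original bytes from a Latin-1-decoded string and
    // decodes them again as windows-1252.
    public static string RepairLatin1AsCp1252(string misdecoded)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        byte[] originalBytes = Encoding.GetEncoding("iso-8859-1").GetBytes(misdecoded);
        return Encoding.GetEncoding(1252).GetString(originalBytes);
    }

    static void Main()
    {
        string wrong = DecodeAsLatin1(new byte[] { 0x82 });
        Console.WriteLine((int)wrong[0]);                       // 130
        Console.WriteLine((int)RepairLatin1AsCp1252(wrong)[0]); // 8218
    }
}
```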

As ever, the key thing is to separate out each bit of the process, and check the strings at each stage.

So first write a program which just reads the file and dumps out the details of the strings, in terms of the Unicode values. I have some code on my strings page which will help with this. When you read the file, specify the encoding explicitly.
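A minimal sketch of such a dump program might look like the following. It is not the code from the strings page mentioned above; the class name, the method name, and the convention of taking the file path from the first command-line argument are all assumptions for illustration. The encoding is passed explicitly to File.ReadLines, and the RegisterProvider call is only needed on .NET Core / .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

static class DumpStrings
{
    // Formats each UTF-16 code unit of the text as U+XXXX, separated by spaces.
    public static string DumpCodeUnits(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main(string[] args)
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // args[0] is the path of the file to inspect (hypothetical input).
        foreach (string line in File.ReadLines(args[0], Encoding.GetEncoding(1252)))
            Console.WriteLine(DumpCodeUnits(line));
    }
}
```

If the file is read correctly, the unusual comma should show up as U+201A in the output; seeing U+0082 instead would point to the wrong encoding being used for the read.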

Then write a separate program with hardcoded literals (using \uxxxx escapes where necessary) to upload into the database. Then examine the strings in the database as accurately as you can. I would expect the actual uploading bit to just work, so long as the database has the appropriate settings.

There's a bit more on this general process on my "debugging unicode problems" page.

After fiddling a bit I came up with this:

// Requires: using System.Collections.Generic; using System.Text;

/// <summary>
/// Some char codes produced by Unicode character handling
/// do not map correctly to codepage 1252. This function
/// re-decodes every char as codepage 1252, unless the char
/// needs more than one byte. Such chars are kept as they are.
/// </summary>
/// <param name="chars">The characters to fix.</param>
/// <returns>The string with all single-byte chars re-decoded as codepage 1252.</returns>
private string GetStringAfterFixingEncoding(IEnumerable<char> chars)
{
    var result = new StringBuilder();

    foreach (var c in chars)
    {
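        // UTF-16LE bytes of the char: [low byte, high byte].
        var unicodeBytesForChar = Encoding.Unicode.GetBytes(new[] { c });

        // A non-zero high byte means the char is above U+00FF; keep it unchanged.
        if (unicodeBytesForChar.Length > 1 && unicodeBytesForChar[1] != 0)
            result.Append(Encoding.Unicode.GetChars(unicodeBytesForChar)[0]);
        else
            // _encoding is a field not shown in this snippet; presumably the
            // codepage 1252 encoding. Only the first decoded char (from the low byte) is used.
            result.Append(_encoding.GetChars(unicodeBytesForChar)[0]);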
    }

    return result.ToString();
}
