Having trouble unpacking Comp-3 in .NET: there are letter characters besides the sign character inside the Comp-3 value

I am trying to import a mainframe EDI file into SQL Server using .NET, and I am having problems unpacking some Comp-3 fields.

The file came from one of our clients, and I have the copybook layout for the following fields:

05  EH-GROSS-INVOICE-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-AMT            PIC S9(07)V9999  COMP-3.         
05  EH-CASH-DISCOUNT-PCT            PIC S9(03)V9999  COMP-3.

I will focus on just these three fields, since all the other fields are PIC(X) and are already Unicode values. I loaded everything with the help of the Ebcdic2Ascii tool created by Max Vagner. I modified its "Unpack" function slightly, changing it to

private string Unpack(byte[] packedBytes, int decimalPlaces, out bool isParsedSuccessfully)
{
    isParsedSuccessfully = true;
    return BitConverter.ToString(packedBytes);
}

so that I could get the following sample data:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
00-1A-1A-03-26-0C        00-00-00-00-00-0C        00-00-00-0C
00-0A-1A-1A-00-0C        00-00-1A-1A-2D-0C        00-1A-00-0C
00-09-10-20-00-0C        00-00-10-1A-1A-0C        00-1A-00-0C

Here is sample code that I wrote to unpack these values, based on my understanding of Comp-3:

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            var result1 = UnpackMod("00-1A-1A-03-26-0C", 4);
            var result2 = UnpackMod("00-00-00-00-00-0C", 4);
            var result3 = UnpackMod("00-00-00-0C", 4);

            Console.WriteLine($"{result1}\n{result2}\n{result3}\n");

            var result4 = UnpackMod("00-0A-1A-1A-00-0C", 4);
            var result5 = UnpackMod("00-00-1A-1A-2D-0C", 4);
            var result6 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result4}\n{result5}\n{result6}\n");

            var result7 = UnpackMod("00-09-10-20-00-0C", 4);
            var result8 = UnpackMod("00-00-10-1A-1A-0C", 4);
            var result9 = UnpackMod("00-1A-00-0C", 4);

            Console.WriteLine($"{result7}\n{result8}\n{result9}");

            Console.ReadLine();
        }

        /// <summary>
        /// Method for unpacking Comp-3 fields.
        /// </summary>
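        /// <param name="inputString">Hex dump of the packed bytes, e.g. "00-09-10-20-00-0C".</param>
        /// <param name="decimalPlaces">Number of implied decimal places (the digits after V in the PIC clause).</param>
        /// <returns>Returns the numeric string if the parse succeeded; otherwise the literal string "NULL".</returns>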
        private static string UnpackMod(string inputString, int decimalPlaces)
        {
            var outputString = inputString;

            // Remove "-".
            outputString = outputString.Replace("-", "");

            // Check last character for sign.
            string lastChar = outputString.Substring(outputString.Length - 1, 1);
            bool isNegative = (lastChar == "D" || lastChar == "B");

            // Remove sign character.
            if (lastChar == "C" || lastChar == "A" || lastChar == "E" || lastChar == "F" || lastChar == "D" || lastChar == "B")
            {
                outputString = outputString.Substring(0, outputString.Length - 1);
            }

            // Place decimal point.
            outputString = outputString.Insert(outputString.Length - decimalPlaces, ".");

            // Check if parsed value is numeric. This will also eliminate all leading 0.
            var isParsedSuccessfully = decimal.TryParse(outputString, out decimal decimalValue);

            // If isParsedSuccessfully is true, return the numeric string; otherwise return "NULL".
            string result = "NULL";
            if (isParsedSuccessfully)
            {
                // Convert value to negative.
                if (isNegative)
                {
                    decimalValue = decimalValue * -1;
                }

                result = decimalValue.ToString();
            }

            return result;
        }
    }
}

After running the sample code, I got the following results:

EH-GROSS-INVOICE-AMT     EH-CASH-DISCOUNT-AMT     EH-CASH-DISCOUNT-PCT
----------------------------------------------------------------------
NULL                     0.0000                   0.0000
NULL                     NULL                     NULL
9102.0000                NULL                     NULL        

As you can see, I was able to get only the following three values correctly:

00-09-10-20-00-0C -> 9102.0000
00-00-00-00-00-0C -> 0.0000
00-00-00-0C       -> 0.0000

Based on this source, http://www.3480-3590-data-conversion.com/article-packed-fields.html, I have the following understanding of Comp-3:

COBOL Comp-3 is a binary field type that puts ("packs") two digits into each byte, using a notation called Binary Coded Decimal, or BCD.

The Binary Coded Decimal (BCD) data type is just what its name suggests: a value stored in decimal (base ten) notation, with each digit binary coded. Since a digit has only ten possible values (0-9), each digit fits in four bits (one nibble).

The low nibble of the least significant byte is used to store the sign for the number. This nibble stores only the sign, not a digit. "C" hex is positive, "D" hex is negative, and "F" hex is unsigned.
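The layout described above can be sketched directly against the raw bytes instead of a hex string. This is only a minimal illustration: `PackedDecimal` and `TryUnpack` are hypothetical names (they are not part of Ebcdic2Ascii), and the sketch assumes the field has at most 28 digits so that it fits in a `decimal`. It rejects any byte whose digit nibbles are not 0-9, which is exactly what happens with bytes such as 1A or 2D.

```csharp
using System;

// Hypothetical helper for illustration; not part of Ebcdic2Ascii.
public static class PackedDecimal
{
    // Unpacks a COMP-3 field. "scale" is the number of implied decimal
    // places (4 for PIC S9(07)V9999). Returns false if any nibble is
    // not valid for its position.
    public static bool TryUnpack(byte[] packed, int scale, out decimal value)
    {
        value = 0m;
        if (packed == null || packed.Length == 0)
        {
            return false;
        }

        decimal digits = 0m;
        bool negative = false;

        for (int i = 0; i < packed.Length; i++)
        {
            int high = packed[i] >> 4;
            int low = packed[i] & 0x0F;

            // The high nibble is always a digit.
            if (high > 9)
            {
                return false;
            }
            digits = digits * 10 + high;

            if (i < packed.Length - 1)
            {
                // Every byte except the last holds a second digit.
                if (low > 9)
                {
                    return false;
                }
                digits = digits * 10 + low;
            }
            else
            {
                // The low nibble of the last byte is the sign: A-F only.
                if (low < 0x0A)
                {
                    return false;
                }
                negative = (low == 0x0B || low == 0x0D);
            }
        }

        // Apply the implied decimal point.
        for (int s = 0; s < scale; s++)
        {
            digits /= 10m;
        }

        value = negative ? -digits : digits;
        return true;
    }
}
```

For example, `TryUnpack(new byte[] { 0x00, 0x09, 0x10, 0x20, 0x00, 0x0C }, 4, out var v)` succeeds with a value equal to 9102, while a field containing 0x1A fails because A is not a digit.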

I know that BCD digits should only be 0-9 and that there should be only one letter, at the end, which can be "C", "D" or "F", so I don't know how to unpack the following values:

00-1A-1A-03-26-0C
00-0A-1A-1A-00-0C        
00-00-1A-1A-2D-0C
00-1A-00-0C
00-00-10-1A-1A-0C
00-1A-00-0C

These values have other letter characters besides the sign character. I have a feeling that the data has already been converted, because otherwise the text fields would not be readable unless an encoding were applied. I am still not sure about this and would love any insights. Thanks.

First, PIC X is not Unicode in COBOL.

Quoting myself from here ...

It is common for mainframe data to include both text and binary data in a single record, for example a name, a currency amount, and a quantity:

Hopper Grace ar%.

...which would be...

x'C8969797859940404040C799818385404040404081996C004B'

...in hex. This is code page 37, commonly referred to as EBCDIC.

[...] Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...

x'486F707065722020202047726163652020202020617225002E'

...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte), the currency amount itself has been changed, and so has the quantity (from decimal 75 to decimal 11,776, due to both the code page conversion and the mangling of a big-endian number as a little-endian number).
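The damage to the last five bytes of that record can be reproduced in .NET by treating them as text. This is a rough sketch, not a description of how any particular transfer tool works: `CodePageDamageDemo` and `ConvertAsText` are hypothetical names, and it assumes .NET Core 3.0 or later (or .NET 5+), where `CodePagesEncodingProvider` is available to supply code pages 37 and 1250.

```csharp
using System;
using System.Text;

// Hypothetical demo class; assumes .NET Core 3.0+ / .NET 5+.
public static class CodePageDamageDemo
{
    // Converts bytes from code page 37 (EBCDIC) to code page 1250 as if
    // every byte were a character, which is what a text-mode transfer does.
    public static byte[] ConvertAsText(byte[] ebcdicBytes)
    {
        // Makes the legacy code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(37);
        Encoding target = Encoding.GetEncoding(1250);
        return Encoding.Convert(source, target, ebcdicBytes);
    }

    // Formats the packed amount and binary quantity before and after.
    public static string Describe()
    {
        // x'81996C' is the packed amount, x'004B' the binary quantity (75).
        byte[] original = { 0x81, 0x99, 0x6C, 0x00, 0x4B };
        byte[] converted = ConvertAsText(original);

        return BitConverter.ToString(original) + " -> " +
               BitConverter.ToString(converted);
    }
}
```

`Describe()` returns `81-99-6C-00-4B -> 61-72-25-00-2E`: the sign nibble C has become 5, and the two quantity bytes are now 00 2E, which reads as 11,776 when interpreted as little endian.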

Your data was likely code page converted on transfer from the mainframe. If you know the original code page and the code page it was converted to, then you might be able to unscramble the packed data.

I say "might" because it only works if you're lucky and each hex value you have maps one-to-one to a hex value in the original code page. Note that it is common for both EBCDIC x'15' and x'0D' to be mapped to ASCII x'0D'.
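One way to check whether the mapping is one-to-one is to build the reverse table and refuse to restore any byte that has more than one possible origin. The sketch below is an assumption-laden illustration: `ReverseCodePage`, `BuildReverseMap` and `TryRestore` are hypothetical names, it uses .NET's own tables for code pages 37 and 1250 (the transfer software that converted your file may have used different ones), and it assumes .NET Core 3.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the real transfer may have used other tables.
public static class ReverseCodePage
{
    // For each converted byte, lists every source byte that produces it.
    public static Dictionary<byte, List<byte>> BuildReverseMap(
        int sourceCodePage, int targetCodePage)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding source = Encoding.GetEncoding(sourceCodePage);
        // Replacement fallbacks turn off best-fit substitutions, so any
        // unmappable character becomes '?' instead of a look-alike.
        Encoding target = Encoding.GetEncoding(
            targetCodePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        var reverse = new Dictionary<byte, List<byte>>();
        for (int b = 0; b <= 0xFF; b++)
        {
            byte[] converted = Encoding.Convert(source, target, new[] { (byte)b });
            if (converted.Length != 1)
            {
                continue;
            }

            if (!reverse.TryGetValue(converted[0], out List<byte> origins))
            {
                origins = new List<byte>();
                reverse[converted[0]] = origins;
            }
            origins.Add((byte)b);
        }
        return reverse;
    }

    // Restores the original bytes only if every byte has exactly one origin.
    public static bool TryRestore(
        byte[] converted, Dictionary<byte, List<byte>> reverse, out byte[] original)
    {
        original = new byte[converted.Length];
        for (int i = 0; i < converted.Length; i++)
        {
            if (!reverse.TryGetValue(converted[i], out List<byte> origins)
                || origins.Count != 1)
            {
                return false; // ambiguous or unknown byte: cannot unscramble
            }
            original[i] = origins[0];
        }
        return true;
    }
}
```

With the map built for code pages 37 and 1250, the converted bytes 61 72 25 00 2E from the quoted example restore to 81 99 6C 00 4B, while a field containing x'3F' ('?') is refused because several source bytes collapse onto it. A refusal means the packed value cannot be recovered from the converted file, and the usual remedy is to have the file re-sent in binary mode.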
