How to define the record structure of an EBCDIC file?

Question

I have an EBCDIC file in HDFS. I want to load the data into a Spark DataFrame, process it, and write the results as ORC files. I found an open-source solution, Cobrix, that reads data from EBCDIC files, but the developer must provide a copybook file, which is the schema definition.

A few lines of my EBCDIC file are shown in the attached image. I want to get the copybook format for this file. Essentially I want to read three fields: vin (length 17), vin_data (length 3), and vin_val (length 100).

Answer 1

How to define a copybook file of EBCDIC data?

You don't.

A copybook may be used as a record definition (that is, how the data is stored); it has nothing to do with the encoding of the data stored in that record.

This leaves the question "How do I define the record structure?"

You'd need the number of fields, their lengths, and their types (it is likely not only USAGE DISPLAY), and then just define it with some fancy names. Ideally you get the original record definition from the COBOL program that writes the file, put it into a copybook if it isn't in one yet, and use that.

Your link has samples that show what a copybook actually looks like. If you struggle with the definition, please edit your question with the copybook you've defined and we may be able to help.
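To illustrate why the type matters as much as the length, here is a minimal Python sketch that computes how many bytes a field occupies. It covers only two cases: character data with USAGE DISPLAY (one byte per position) and packed decimal COMP-3 (two digits per byte plus a sign nibble). The function name `storage_bytes`, the tuple layout, and the guess that vin_data is a 5-digit packed field are assumptions made for this example; only the three byte lengths come from the question.

```python
def storage_bytes(positions: int, usage: str) -> int:
    """Return the storage size in bytes for a field (only two usages handled)."""
    if usage == "DISPLAY":
        # One byte per character position.
        return positions
    if usage == "COMP-3":
        # Packed decimal: two digits per byte, plus one nibble for the sign.
        return positions // 2 + 1
    raise ValueError(f"usage not covered by this sketch: {usage}")


# Assumed layout: (field name, declared positions, usage).
fields = [
    ("vin", 17, "DISPLAY"),
    ("vin_data", 5, "COMP-3"),   # assumption: 5 packed digits fit in 3 bytes
    ("vin_val", 100, "DISPLAY"),
]

sizes = {name: storage_bytes(n, usage) for name, n, usage in fields}
record_length = sum(sizes.values())
print(sizes, record_length)
```

Under these assumptions the three fields add up to the 17 + 3 + 100 bytes described in the question.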

Answer 2

Based on your comment on the question, and looking at the input file, you could start with this.

01  VIN-RECORD.
    05  VIN                 PIC X(17).
    05  VIN-COUNT           PIC S9(5) COMP-3.
    05  VIN-VALUE           PIC X(100).

I'm guessing that the second field is COMP-3 because all six examples end with a C in the sign position (the low-order nibble of the last byte). This indicates a positive COMP-3 value. A D would indicate a negative COMP-3 value, and an F would indicate an unsigned COMP-3 value.

The third field holds variable-length content, right-padded with spaces.
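The sign convention can be checked by hand with a small Python sketch that unpacks a COMP-3 field. The function name `unpack_comp3` and the sample bytes are made up for illustration and are not taken from the file in the question. The sketch accepts only the three sign values mentioned here (C, D, F) and rejects anything else.

```python
def unpack_comp3(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field into a Python int."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    sign = nibbles.pop()               # the last nibble carries the sign
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid digit nibble in packed field")
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"sign nibble not handled by this sketch: {sign:X}")
    value = int("".join(str(digit) for digit in nibbles))
    return -value if sign == 0xD else value


# Made-up 3-byte samples, the size of a PIC S9(5) COMP-3 field.
print(unpack_comp3(bytes.fromhex("00123C")))  # positive
print(unpack_comp3(bytes.fromhex("00123D")))  # negative
print(unpack_comp3(bytes.fromhex("00123F")))  # unsigned
```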
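As a quick sanity check of the layout outside Spark, a single 120-byte record can be sliced by hand. This is only a sketch: it assumes fixed-length records with no record header, and it assumes code page `cp037`, which is one of several EBCDIC code pages and may not be the one your file uses. The function name `split_record` and the sample data are invented for the example; the packed field is left as raw bytes.

```python
RECORD_LENGTH = 120  # 17 + 3 + 100 bytes, per the copybook above


def split_record(record: bytes, codepage: str = "cp037") -> dict:
    """Slice one fixed-length record into its three fields."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    return {
        "vin": record[0:17].decode(codepage),
        "vin_count_raw": record[17:20],  # packed decimal, left undecoded here
        # Strip the space padding on the right of the text field.
        "vin_value": record[20:120].decode(codepage).rstrip(" "),
    }


# Build a made-up sample record (not data from the question).
sample = (
    "ABCDEFGH123456789".encode("cp037")
    + bytes.fromhex("00042C")
    + "SOME VALUE".ljust(100).encode("cp037")
)
parsed = split_record(sample)
print(parsed)
```

If the text fields come out as gibberish while the slicing offsets look right, the code page assumption is the first thing to revisit.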

Answer 1
2 2020-09-21 10:46:49

Answer 2
1 Accepted 2020-09-21 11:16:03
