
How to programmatically read schema from header file in jaql?

I am trying to achieve the following in JAQL and am stuck.

I have two files: data.tsv, which contains tab-separated data, and header.tsv, which contains exactly one line of tab-separated values, corresponding to the "header" of data.tsv.

What I want to achieve is to read data.tsv using:

read(lines(location='data.tsv')) -> transform catch(delToJson($, {"schema": schema_json, "delimiter": "\t"}), {"errThresh":99999999999},$);

For this I need schema_json, a schema definition. I'd like to create this schema_json from the file header.tsv, assigning every field the type string.

Reading header.tsv is straightforward, and so is putting it into a record of type header_record = {"header1": string, "header2": string, ...}. However, how do I transform the jaql record header_record into an object of type schema, i.e. schema_json = schema {"header1": string, "header2": string, ...}?

OK, here is a very dirty workaround that nevertheless does the trick. I am still waiting for IBM support to get back to me with "the canonical way" (although I doubt this exists):

First, define the path of the header file:

HeaderFilePath = '/data/column_headers.tsv';

Then read the header file. The output is an array.

HeaderFile = localRead(del(location=HeaderFilePath, delimiter = "\t"));
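
To make the intermediate steps concrete, suppose (purely as an example) that header.tsv contains the three tab-separated column names header1, header2 and header3. Reading it this way should then give a nested array with a single row, which expand flattens into the plain list of column names, roughly:

HeaderFile;
>> [["header1", "header2", "header3"]]

HeaderFile -> expand;
>> ["header1", "header2", "header3"]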

Now I construct two arrays of the same length as the HeaderFile array, in order to use them with arrayToRecord in the next step. Why I construct two and not just one will become apparent later.

val_array = HeaderFile -> expand -> transform 'some string';
val_array2 = HeaderFile -> expand -> transform 'some other string';
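
Sticking with the hypothetical three-column header, each of the two arrays then holds one placeholder value per column, with different values in the two arrays:

val_array;
>> ["some string", "some string", "some string"]

val_array2;
>> ["some other string", "some other string", "some other string"]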

The idea is to build an artificial record schema_record with the same schema as the data, and then to get the schema via schemaof, which can then be used as the schema input for reading the data file. For this one can use:

schema_record = arrayToRecord(HeaderFile -> expand,val_array)
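
With the example header, this should produce a record whose field names come from header.tsv and whose values are all the same placeholder string, along the lines of:

schema_record;
>> {"header1": "some string", "header2": "some string", "header3": "some string"}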

Problems:

a) schemaof(schema_record) returns schema { * }?. This is because schemas can (seemingly) only be inferred from materialized data, i.e. one has to use schema_record := arrayToRecord(HeaderFile -> expand, val_array).

b) Now, using schemaof(schema_record) returns a schema, which is good. However, the schema record looks something like "header1": @{const: "some string", fixed: 11} string instead of the expected "header1": string (I don't understand why a schema function would do something like this). Hence this "schema" is pretty much useless. What is worse, there seems to be no way to manipulate that schema object so that one might be able to remove the @{} specifications.

Workaround: use the function elementsOf, which returns the schema of the elements of an array of schemas. Meaning:

elementsOf([schemaof({a:1,b:3}),{a:1,b:3}]); 
>> schema {"a":@{const: 1, fixed: 1} long, "b":@{const: 3, fixed: 1} long}.

However, using schemas with different "const" and "fixed" records will force elementsOf to fall back to a "raw" schema (without @{}):

elementsOf([schemaof({a:1,b:3}),{a:45,b:32}])
>> schema {"a": long, "b": long}.

This is the "dirty workaround" that I use to achieve what I want. (And all this is due to a very strange understanding of what a schema is...)

schema_array := [arrayToRecord(HeaderFile -> expand, val_array),arrayToRecord(HeaderFile -> expand, val_array2)];

DataSchema := elementsOf(schemaof(schema_array));

Data = read(lines(location='/data/data.tsv')) -> transform catch(delToJson($,
{"schema": DataSchema, "delimiter": "\t"}), {"errThresh": 99999999999},$);
