Flink Schema vs Table Schema

I am using the Flink SQL API and I am a bit lost among all the "schema" types: TableSchema, Schema (from org.apache.flink.table.descriptors.Schema) and TypeInformation.

A TableSchema can be created from a TypeInformation, a TypeInformation can be created from a TableSchema, and a Schema can be created from a TableSchema.

But it looks like a Schema cannot be converted back to a TypeInformation or a TableSchema. Is that right?
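A minimal sketch of the conversions listed above, assuming a Flink release from the 1.10 to 1.12 line with flink-table-api-java on the classpath (several of these methods are already deprecated there). The class name SchemaConversions and the example fields "a" and "b" are hypothetical. As far as I know, the Schema descriptor has no method that returns a TableSchema; it only exposes its content as string properties through toProperties().

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;

import java.util.Map;

// Hypothetical helper class illustrating the conversion directions.
class SchemaConversions {

    // A named row type: field names plus one TypeInformation per field.
    static TypeInformation<Row> rowType() {
        return Types.ROW_NAMED(new String[] {"a", "b"}, Types.INT, Types.LONG);
    }

    // TypeInformation -> TableSchema.
    static TableSchema toTableSchema(TypeInformation<?> typeInfo) {
        return TableSchema.fromTypeInfo(typeInfo);
    }

    // TableSchema -> TypeInformation<Row>.
    static TypeInformation<Row> backToRowType(TableSchema tableSchema) {
        return tableSchema.toRowType();
    }

    // TableSchema -> Schema descriptor. The descriptor itself is only
    // inspectable as a flat map of string properties.
    static Map<String, String> toDescriptorProperties(TableSchema tableSchema) {
        return new Schema().schema(tableSchema).toProperties();
    }
}
```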

Why are there three different types of objects to store the same kind of information?

For example, let's say that I have a schema string read from an Avro schema file, and that I want to add a new field to it. To do so, the only solution I have found is:

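import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;

String mySchemaRaw = ...; // Avro schema JSON read from the schema file
TypeInformation<Row> typeInfo = AvroSchemaConverter.convertToTypeInfo(mySchemaRaw);
Schema newSchema = new Schema().schema(TableSchema.fromTypeInfo(typeInfo));
newSchema = newSchema.field("newField", ...);

// I need newSchema as a TableSchema here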
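If the end goal is a TableSchema, one option is to stay in TableSchema the whole time and skip the Schema descriptor. A minimal sketch, assuming a Flink release from the 1.10 to 1.12 line; ExtendTableSchema and its method names are hypothetical. It copies every existing column as a plain physical column, so watermarks, computed columns or a primary key on the original schema would not be carried over.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.types.DataType;

// Hypothetical helper: append one column to an existing TableSchema.
class ExtendTableSchema {

    static TableSchema withExtraField(TableSchema base, String name, DataType type) {
        TableSchema.Builder builder = TableSchema.builder();
        String[] names = base.getFieldNames();
        DataType[] types = base.getFieldDataTypes();
        // Re-add every existing column in its original order.
        for (int i = 0; i < names.length; i++) {
            builder.field(names[i], types[i]);
        }
        // Then append the new column at the end.
        builder.field(name, type);
        return builder.build();
    }

    // Same idea, starting from the TypeInformation that AvroSchemaConverter returns.
    static TableSchema fromTypeInfoWithExtraField(TypeInformation<?> typeInfo, String name, DataType type) {
        return withExtraField(TableSchema.fromTypeInfo(typeInfo), name, type);
    }
}
```

With the question's variables this would read roughly ExtendTableSchema.fromTypeInfoWithExtraField(typeInfo, "newField", DataTypes.STRING()), where STRING is only an example type.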

Is this the normal way to use these objects? (It looks weird to me.)

TypeInformation and TableSchema solve different problems. TypeInformation is physical information about how to ship a record class (e.g. a Row or a POJO) from one operator to another.
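To make "physical" concrete, here is a minimal sketch, assuming flink-core on the classpath: a TypeInformation produces a TypeSerializer, and that serializer is what turns a record into bytes and back when it moves between operators. The class name PhysicalRowType and the fields "a" and "b" are hypothetical; DataOutputSerializer and DataInputDeserializer are used here only as in-memory stand-ins for the network buffers.

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;
import org.apache.flink.types.Row;

import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example class.
class PhysicalRowType {

    // The physical record type: field names and one TypeInformation per field.
    static final TypeInformation<Row> ROW_TYPE =
            Types.ROW_NAMED(new String[] {"a", "b"}, Types.INT, Types.LONG);

    // Serialize a Row and read it back with the serializer derived from ROW_TYPE.
    static Row roundTrip(Row row) {
        TypeSerializer<Row> serializer = ROW_TYPE.createSerializer(new ExecutionConfig());
        try {
            DataOutputSerializer out = new DataOutputSerializer(64);
            serializer.serialize(row, out);
            DataInputDeserializer in = new DataInputDeserializer(out.getCopyOfBuffer());
            return serializer.deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```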

TableSchema describes the schema of a table independently of the underlying per-record type. It is similar to the schema part of a CREATE TABLE name (a INT, b BIGINT) DDL statement. In SQL, one also doesn't define a table like CREATE TABLE name ROW(a INT, b BIGINT). But it is true that a schema and a row type are related, which is why converter methods are provided. The differences become bigger once concepts like PRIMARY KEY are introduced.
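A minimal sketch of that difference, assuming Flink 1.11 or later (where TableSchema.Builder.primaryKey and TableSchema.getPrimaryKey were added); the class name LogicalTableSchema is hypothetical. The TableSchema carries the primary key, while the row type derived from it only lists the columns and their types.

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.types.DataType;

// Hypothetical example class.
class LogicalTableSchema {

    // Roughly the schema part of:
    //   CREATE TABLE name (a INT NOT NULL, b BIGINT, PRIMARY KEY (a) NOT ENFORCED)
    static TableSchema build() {
        return TableSchema.builder()
                .field("a", DataTypes.INT().notNull()) // primary key columns must be NOT NULL
                .field("b", DataTypes.BIGINT())
                .primaryKey("a")
                .build();
    }

    // The row type derived from the schema: columns and types only, no primary key.
    static DataType rowType() {
        return build().toRowDataType();
    }
}
```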

Schema is the current way of specifying non-SQL concepts such as time attributes and field mappings.
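A minimal sketch of those non-SQL concepts with the descriptor API, assuming a Flink release from the 1.10 to 1.12 line (later releases deprecate these descriptors). The class name DescriptorSchema and the field names "user", "user_name" and "event_time" are hypothetical. It also shows why there is no direct path back to a TableSchema: the descriptor is, in the end, a flat map of string properties.

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.descriptors.Rowtime;
import org.apache.flink.table.descriptors.Schema;

import java.util.Map;

// Hypothetical example class.
class DescriptorSchema {

    static Schema build() {
        return new Schema()
                // Field mapping: table column "user_name" is read from source field "user".
                .field("user_name", DataTypes.STRING())
                .from("user")
                // Event-time attribute taken from source field "event_time",
                // with watermarks lagging 5000 ms behind the max timestamp seen.
                .field("rowtime", DataTypes.TIMESTAMP(3))
                .rowtime(new Rowtime()
                        .timestampsFromField("event_time")
                        .watermarksPeriodicBounded(5000))
                // Processing-time attribute.
                .field("proctime", DataTypes.TIMESTAMP(3))
                .proctime();
    }

    // The descriptor's content as string key/value properties.
    static Map<String, String> properties() {
        return build().toProperties();
    }
}
```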
