Flink Schema vs Table Schema
I am using the Flink SQL API and I am a bit lost among all the 'schema' types: TableSchema, Schema (from org.apache.flink.table.descriptors.Schema) and TypeInformation.
A TableSchema can be created from a TypeInformation, a TypeInformation can be created from a TableSchema, and a Schema can be created from a TableSchema.
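The three conversions listed above can be sketched as follows. This is a minimal sketch, assuming Flink 1.11 to 1.13 with the Table API modules on the classpath; the class SchemaConversions and its method names are made up for illustration, and TableSchema.fromTypeInfo and toRowType are deprecated in later releases.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;

// Hypothetical helper class illustrating the conversions between the three types.
public class SchemaConversions {

    // A row type with two named fields, built directly as TypeInformation.
    static TypeInformation<Row> rowType() {
        return Types.ROW_NAMED(new String[] {"a", "b"}, Types.INT, Types.LONG);
    }

    // TypeInformation -> TableSchema.
    static TableSchema toTableSchema(TypeInformation<Row> typeInfo) {
        return TableSchema.fromTypeInfo(typeInfo);
    }

    // TableSchema -> TypeInformation<Row>.
    static TypeInformation<Row> backToTypeInfo(TableSchema schema) {
        return schema.toRowType();
    }

    // TableSchema -> descriptor Schema.
    static Schema toDescriptor(TableSchema schema) {
        return new Schema().schema(schema);
    }

    public static void main(String[] args) {
        TableSchema tableSchema = toTableSchema(rowType());
        System.out.println(tableSchema);
        System.out.println(backToTypeInfo(tableSchema));
        // The descriptor is flattened into string properties such as "schema.0.name".
        System.out.println(toDescriptor(tableSchema).toProperties());
    }
}
```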
But it looks like a Schema cannot be converted back to TypeInformation or TableSchema (?)
Why are there 3 different types of objects to store the same kind of information?
For example, let's say that I have a string schema coming from an Avro schema file, and that I want to add a new field to it. To do so, the only solution I have found is:
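```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.avro.typeutils.AvroSchemaConverter;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;
String mySchemaRaw = ...; // Avro schema JSON read from the schema file
TypeInformation<Row> typeInfo = AvroSchemaConverter.convertToTypeInfo(mySchemaRaw);
Schema newSchema = new Schema().schema(TableSchema.fromTypeInfo(typeInfo));
newSchema = newSchema.field("newField", ...);
// Need the newSchema as a TableSchema
```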
Is this the normal way to use these objects? (It looks weird to me.)
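For comparison, here is a sketch that extends the schema directly on a TableSchema, without going through the descriptor Schema first. It assumes the same Flink 1.11 to 1.13 setup; ExtendSchema, withExtraField, the STRING type chosen for "newField", and the hard-coded row type standing in for the AvroSchemaConverter result are all hypothetical.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.types.Row;

// Hypothetical helper: append one column to a schema derived from a row type.
public class ExtendSchema {

    // Copies every field of the input schema and appends one extra column.
    // "newField" and its STRING type are placeholders for whatever is really needed.
    static TableSchema withExtraField(TypeInformation<Row> typeInfo) {
        TableSchema base = TableSchema.fromTypeInfo(typeInfo);
        return TableSchema.builder()
                .fields(base.getFieldNames(), base.getFieldDataTypes())
                .field("newField", DataTypes.STRING())
                .build();
    }

    public static void main(String[] args) {
        // Stand-in for the TypeInformation that AvroSchemaConverter would return.
        TypeInformation<Row> typeInfo =
                Types.ROW_NAMED(new String[] {"id", "name"}, Types.LONG, Types.STRING);
        System.out.println(withExtraField(typeInfo));
    }
}
```

If a descriptor is still needed afterwards, the extended TableSchema can be passed to new Schema().schema(...).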
TypeInformation and TableSchema solve different things. TypeInformation is physical information about how to ship a record class (e.g. a Row or a POJO) from one operator to another.
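To make "physical" concrete, the sketch below derives a TypeSerializer from a row TypeInformation and round-trips one record through bytes, which is roughly what happens when a record is shipped between operators. It assumes a Flink 1.x release where TypeInformation.createSerializer(ExecutionConfig) is available; ShipRow and roundTrip are illustrative names.

```java
import java.io.IOException;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;
import org.apache.flink.types.Row;

// Hypothetical example class: serialize and deserialize one Row.
public class ShipRow {

    // Writes the row to bytes and reads it back with a serializer built from the TypeInformation.
    static Row roundTrip(Row row) throws IOException {
        TypeInformation<Row> typeInfo =
                Types.ROW_NAMED(new String[] {"a", "b"}, Types.INT, Types.LONG);
        // The serializer is derived from the TypeInformation and knows the byte layout.
        TypeSerializer<Row> serializer = typeInfo.createSerializer(new ExecutionConfig());
        DataOutputSerializer out = new DataOutputSerializer(64);
        serializer.serialize(row, out);
        byte[] bytes = out.getCopyOfBuffer();
        return serializer.deserialize(new DataInputDeserializer(bytes));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(Row.of(1, 2L)));
    }
}
```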
TableSchema describes the schema of a table independently of the underlying per-record type. It is similar to the schema part of a CREATE TABLE name (a INT, b BIGINT) DDL statement. In SQL, one also doesn't define a table like CREATE TABLE name ROW(a INT, b BIGINT). But it is true that schema and row type are related, which is why converter methods are provided. The differences become bigger once concepts like PRIMARY KEY are introduced.
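A sketch of how PRIMARY KEY widens the gap, assuming Flink 1.11 to 1.13 (TableSchema.Builder.primaryKey was added in 1.11): the key is part of the TableSchema, but converting to a row TypeInformation and back loses it, because a row type only describes the fields of each record. PrimaryKeySchema and withKey are hypothetical names.

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.types.Row;

// Hypothetical example class: a table schema with a primary key.
public class PrimaryKeySchema {

    // Roughly CREATE TABLE name (a INT NOT NULL, b BIGINT, PRIMARY KEY (a) NOT ENFORCED).
    static TableSchema withKey() {
        return TableSchema.builder()
                .field("a", DataTypes.INT().notNull()) // key columns must be NOT NULL
                .field("b", DataTypes.BIGINT())
                .primaryKey("a")
                .build();
    }

    public static void main(String[] args) {
        TableSchema schema = withKey();
        // The row type only describes the per-record layout, so the key is dropped.
        TypeInformation<Row> rowType = schema.toRowType();
        TableSchema roundTripped = TableSchema.fromTypeInfo(rowType);
        System.out.println(schema.getPrimaryKey().isPresent());       // true
        System.out.println(roundTripped.getPrimaryKey().isPresent()); // false
    }
}
```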
Schema is the current way of specifying non-SQL concepts such as time attributes and field mappings.
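A sketch of the descriptor Schema expressing such non-SQL concepts, assuming Flink 1.11 to 1.13, where org.apache.flink.table.descriptors.Schema and Rowtime are available: a field mapping via from(), a processing-time attribute via proctime() and an event-time attribute via rowtime(). The class name, the field names and the 60-second watermark delay are invented for the example.

```java
import java.util.Map;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.descriptors.Rowtime;
import org.apache.flink.table.descriptors.Schema;

// Hypothetical example class: a descriptor schema with a field mapping and time attributes.
public class DescriptorSchema {

    static Schema build() {
        return new Schema()
                .field("name", DataTypes.STRING())
                .from("user_name") // physical field "user_name" is exposed as "name"
                .field("proc_time", DataTypes.TIMESTAMP(3))
                .proctime() // processing-time attribute
                .field("event_time", DataTypes.TIMESTAMP(3))
                .rowtime(new Rowtime()
                        .timestampsFromField("ts") // event time taken from field "ts"
                        .watermarksPeriodicBounded(60000)); // bounded out-of-orderness of 60000 ms
    }

    public static void main(String[] args) {
        // The descriptor is flattened into string properties that are handed to the connector.
        Map<String, String> props = build().toProperties();
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```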