
WSO2 DAS spark script

I'm trying to deploy a new data publisher CAR file. I looked at the APIM_LAST_ACCESS_TIME_SCRIPT.xml Spark script (used by API Manager) and didn't understand the difference between the two temporary tables it creates: API_LAST_ACCESS_TIME_SUMMARY_FINAL and APILastAccessSummaryData.

The two Spark temporary tables represent different JDBC tables (possibly in different datasources), where one of them acts as the source for Spark and the other acts as the destination.

To illustrate this better, have a look at the simplified script in question:

create temporary table APILastAccessSummaryData using CarbonJDBC options (dataSource "WSO2AM_STATS_DB", tableName "API_LAST_ACCESS_TIME_SUMMARY", ... );

CREATE TEMPORARY TABLE API_LAST_ACCESS_TIME_SUMMARY_FINAL USING CarbonAnalytics OPTIONS (tableName "API_LAST_ACCESS_TIME_SUMMARY", ... );

INSERT INTO TABLE APILastAccessSummaryData select ... from API_LAST_ACCESS_TIME_SUMMARY_FINAL;

As you can see, we're first creating a temporary table in Spark with the name APILastAccessSummaryData, which represents an actual relational DB table with the name API_LAST_ACCESS_TIME_SUMMARY in the WSO2AM_STATS_DB datasource. Note the using CarbonJDBC keyword, which can be used to directly map JDBC tables within Spark. Such tables (and their rows) are not encoded, and can be read by the user.
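For instance, a fuller CarbonJDBC table definition typically spells out the datasource, the relational table name, and a schema. This is only a sketch: the column names below are illustrative assumptions, not copied from the shipped APIM script:

CREATE TEMPORARY TABLE APILastAccessSummaryData USING CarbonJDBC OPTIONS (
    dataSource "WSO2AM_STATS_DB",
    tableName "API_LAST_ACCESS_TIME_SUMMARY",
    schema "tenantDomain STRING, apiPublisher STRING, api STRING, userId STRING, max_request_time LONG"
);

Once declared this way, the table can be queried or written to from Spark like any other table, and the rows go straight to the relational API_LAST_ACCESS_TIME_SUMMARY table.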

Second, we're creating another Spark temporary table with the name API_LAST_ACCESS_TIME_SUMMARY_FINAL. Here, however, we're using the CarbonAnalytics analytics provider, which means that this table will not be a vanilla JDBC table, but an encoded table similar to the one from your previous question.
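For comparison, a sketch of the CarbonAnalytics-backed definition could look like the following; tableName here refers to a DAS analytics table (stored in encoded form by the data layer), and the columns are again only illustrative assumptions:

CREATE TEMPORARY TABLE API_LAST_ACCESS_TIME_SUMMARY_FINAL USING CarbonAnalytics OPTIONS (
    tableName "API_LAST_ACCESS_TIME_SUMMARY",
    schema "tenantDomain STRING, apiPublisher STRING, api STRING, userId STRING, requestTime LONG"
);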

Now, from the third statement, you can see that we're reading (SELECT) a number of fields from the second table API_LAST_ACCESS_TIME_SUMMARY_FINAL and inserting them (INSERT INTO) into the first, which is APILastAccessSummaryData. This represents the Spark summarisation process.
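Putting the pieces together, a sketch of such a summarisation statement is shown below; the selected columns and the max() aggregation are assumptions for illustration, not the exact query shipped with API Manager:

INSERT INTO TABLE APILastAccessSummaryData
SELECT tenantDomain, apiPublisher, api, userId, max(requestTime) AS max_request_time
FROM API_LAST_ACCESS_TIME_SUMMARY_FINAL
GROUP BY tenantDomain, apiPublisher, api, userId;

Note that the SELECT column list has to line up with the schema declared for the CarbonJDBC table, since each selected row is written directly into the underlying relational table.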

For more details on the differences between the CarbonAnalytics and CarbonJDBC analytics providers, or on how Spark handles such tables in general, have a look at the documentation page for the Spark Query Language.
