AWS Redshift Migration

I learned SQL about two months ago, so I'm still pretty new and picking up different commands and functions each day. I have been tasked with migrating some queries from Teradata to Redshift, and there are obviously some syntax differences. I have been able to replace most of them, but I am stuck on a command, "SYS_CALENDAR". Can someone explain how SYS_CALENDAR works so I could potentially hard-code it, or does anyone know a suitable replacement that runs in AWS Redshift?

Thanks

As someone who has ported a large Teradata solution to Redshift, let me say good luck. These are very different systems, and porting the SQL to achieve functional equivalence is only the first challenge. I'm happy to have an exchange on what these challenges will likely be if you like, but first off, your question.

SYS_CALENDAR in Teradata is a system view that can be used like a normal view and holds information about every date. It can be queried or joined as needed to get, for example, the day-of-week or week-of-year information for a date. It really performs a date calculation function based on OS information but is used like a view.

No equivalent view exists in Redshift, and this creates some porting difficulties. Many people create a "DATES" table in Redshift to hold the information they need for dates across some range, and there are web pages on making such a table (e.g. https://elliotchance.medium.com/building-a-date-dimension-table-in-redshift-6474a7130658 ). Just pre-calculate all the date information you need for the range of dates in your database and you can swap this table into queries when porting. This is the simplest route to take for porting and is the one that many choose (sometimes wrongly).
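A DATES table of this kind is just pre-calculated rows, one per day. As a rough sketch only (Python is used here purely for illustration, the column names are hypothetical, and the ISO numbering of weeks and weekdays is an assumption that may not match what your Teradata queries expected), generating the rows for a fixed range could look something like this:

```python
from datetime import date, timedelta


def build_dates_rows(start, end):
    """Yield one dict of pre-calculated attributes per day in [start, end]."""
    day = start
    while day <= end:
        # isocalendar() gives (ISO year, ISO week number, ISO weekday).
        _iso_year, iso_week, iso_weekday = day.isocalendar()
        yield {
            "calendar_date": day.isoformat(),
            "day_of_week": iso_weekday,              # assumption: 1 = Monday ... 7 = Sunday
            "day_of_month": day.day,
            "day_of_year": day.timetuple().tm_yday,
            "week_of_year": iso_week,                # assumption: ISO 8601 week number
            "month_of_year": day.month,
            "quarter_of_year": (day.month - 1) // 3 + 1,
            "year": day.year,
        }
        day += timedelta(days=1)


# The covered range is fixed at generation time.
rows = list(build_dates_rows(date(2020, 1, 1), date(2020, 12, 31)))
```

The rows would then be loaded into the table by whatever bulk-load process you already use. Whichever numbering conventions you pick, compare a handful of dates against the output of the original Teradata queries before relying on them.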

The issue with this route is that a user-supported DATES table is often a time bomb waiting to go off, and it is technical debt for the solution. The table only has the dates you specify at creation, and the range of dates in the data often expands over time. When the table is used with a date that isn't in it, wrong answers are produced and data is corrupted, usually silently. Not good. Some create processes to expand the date range, but again this is based on some "expectation" of how the table will be used. It is also a real table with ever-expanding data that is used frequently, which can cause query performance issues, and it isn't really needed - a performance tax for all time.
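To see how quietly this fails, here is a small illustration. It uses an in-memory SQLite database from Python purely as a stand-in for Redshift, and the table and column names (orders, order_date, week_of_year) are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dates (calendar_date TEXT PRIMARY KEY, week_of_year INTEGER);
    CREATE TABLE orders (order_id INTEGER, order_date TEXT);
    -- DATES was built to cover only these two days.
    INSERT INTO dates VALUES ('2020-12-30', 53), ('2020-12-31', 53);
    -- The third order falls after the covered range.
    INSERT INTO orders VALUES (1, '2020-12-30'), (2, '2020-12-31'), (3, '2021-01-04');
""")

total_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# The inner join raises no error; the uncovered order simply disappears.
joined_orders = conn.execute("""
    SELECT COUNT(*)
    FROM orders o
    JOIN dates d ON d.calendar_date = o.order_date
""").fetchone()[0]

# A coverage check: dates present in the data but missing from DATES.
uncovered = conn.execute("""
    SELECT DISTINCT o.order_date
    FROM orders o
    LEFT JOIN dates d ON d.calendar_date = o.order_date
    WHERE d.calendar_date IS NULL
""").fetchall()

print(total_orders, joined_orders, uncovered)
```

Three orders go in and two come out of the join, with nothing to flag the difference. A check along the lines of the last query, run on a schedule, is one way to notice the gap before the numbers are wrong.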

The better long-term answer is to use the native Redshift (Postgres) date functions to operate on the dates as you need. Doing this uses the OS's understanding of dates (without bound) and does what Teradata does with the system view (calculates the needed information). For example, you can get the work week of a date by using the DATE_PART() function instead of joining with the SYS_CALENDAR view. This approach doesn't have the downsides of the DATES table but does come with a porting cost. The structure of the queries needs to change (remove joins and add functions), which takes more work and requires understanding the original query. Unfortunately time, work, and understanding are often in short supply when porting databases, which is why the DATES table approach is seen so often and lives forever as technical debt.
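The shape of that change (drop the join, compute the attribute in the query) can be sketched as follows. This is again SQLite driven from Python as a runnable stand-in, not Redshift itself: SQLite's strftime() plays the role that DATE_PART() would play in Redshift, and the orders table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT);
    INSERT INTO orders VALUES (1, '2020-12-31'), (2, '2021-01-04'), (3, '2035-07-15');
""")

# No calendar join: each attribute is derived from the date itself, so any
# valid date works, including ones nobody planned for.
rows = conn.execute("""
    SELECT order_id,
           CAST(strftime('%w', order_date) AS INTEGER) AS day_of_week,  -- 0 = Sunday
           CAST(strftime('%j', order_date) AS INTEGER) AS day_of_year
    FROM orders
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

In Redshift the equivalent expression would be along the lines of `DATE_PART(dow, order_date)`. Numbering conventions (which day a week starts on, how the first week of a year is counted) differ between systems, so compare a few known dates against the Teradata output rather than assuming the values line up.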

I assume that this port is large in nature, and if so my recommendation is this: lay out these trade-offs for the stakeholders. If they cannot absorb the time to convert the queries (likely), propose the DATES table approach, but have the technical debt clearly documented along with the "end date" at which functionality will break. I'd pick a somewhat close date, like 2025, so that some action will need to be on the long-term plans. Have triggers documented as to when action is needed.

This will not be the first of these "technical debt" issues that come up in a port such as this. There are too many places where "get it done" will trump "do it right". You haven't even scratched the surface on performance issues - these are very different databases, and data solutions tuned over time for Teradata will not perform optimally on Redshift after a simple port. This isn't an "all is lost" level issue. Just get the choices documented along with the long-term implications of those choices. Have triggers (dates or performance measures) defined for when aspects of the "port" will need to be followed up with an "optimization" effort. Management likes to forget about the need for follow-up on these efforts, so get these documented.
