

Insert strategy for tables with one-to-one relationships in Teradata

In our data model, which is derived from the Teradata industry models, we observe a common pattern in which the superclass/subclass relationships of the logical data model are transformed into one-to-one relationships between a parent and a child table.

I know you can roll the attributes up or down to end up with a single table, but we are not using that option overall. What we end up with is a model like this:

[Image: data model diagram of the Geographical Area (parent) and City (child) tables]

Where City Id references a Geographical Area Id.

I am struggling to find a good strategy to load the records into these tables.

Option 1: I could select MAX(Geographical Area Id), calculate the next IDs for a batch insert, and reuse them for the City table.
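A minimal sketch of what Option 1 could look like. The table and column names (Geographical_Area, Staging_Areas, Area_Name) are illustrative, not taken from the actual model, and the write lock is needed so that two concurrent loads cannot read the same maximum:

```sql
-- Option 1 sketch (illustrative names, not the actual model).
-- Lock the parent table so no concurrent load reads the same MAX value.
LOCKING TABLE Geographical_Area FOR WRITE
INSERT INTO Geographical_Area (Geographical_Area_Id, Area_Name)
SELECT (SELECT COALESCE(MAX(Geographical_Area_Id), 0) FROM Geographical_Area)
       + ROW_NUMBER() OVER (ORDER BY s.Area_Name)  -- next IDs for this batch
     , s.Area_Name
FROM Staging_Areas s;
```

The generated IDs can then be reused for the City table by joining the staging data back to Geographical_Area on the natural key. The drawback is that every load must serialize on the parent-table lock, and the MAX() scan is repeated on each batch.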

Option 2: I could use an identity column in the Geographical Area table and retrieve the generated value after every insert in order to use it for the City table.

Any other options?

I need to assess the solution in terms of performance, reliability, and maintainability.

Any comment will be appreciated.

Kind regards,

Paul

When you say "load the records into these tables", are you talking about a one-time data migration or a function that creates records for new Geographical Area/City?

If you are looking for a surrogate key and are OK with gaps in your ID values, then use an IDENTITY column and specify the NO CYCLE clause, so it doesn't repeat any numbers. Then just pass NULL for the value and let TD handle it.
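As a sketch of that approach (table and column names are illustrative), the identity column might be declared like this. GENERATED BY DEFAULT is the variant that accepts NULL and generates a value in its place:

```sql
-- Sketch only: table/column names are illustrative.
CREATE TABLE Geographical_Area (
    Geographical_Area_Id INTEGER GENERATED BY DEFAULT AS IDENTITY
        (START WITH 1 INCREMENT BY 1 NO CYCLE),  -- never repeats a number
    Area_Name VARCHAR(100)
) UNIQUE PRIMARY INDEX (Geographical_Area_Id);

-- Passing NULL lets Teradata generate the surrogate key:
INSERT INTO Geographical_Area VALUES (NULL, 'Some Area');
```

Note that Teradata generates identity values in per-AMP batches, so the values are unique but not guaranteed to be sequential; that is where the gaps come from.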

If you do need sequential IDs, then you can just maintain a separate "NextId" table and use that to generate ID values. This is the most flexible way and would make it easier for you to manage your BATCH operations. It requires more code/maintenance on your part, but is more efficient than doing a MAX() + 1 on your data table to get your next ID value. Here's the basic idea:

BEGIN TRANSACTION

  • Get the "next" ID from a lookup table
  • Use that value to generate new ID values for your next record(s)
  • Create your new records
  • Update the "next" ID value in the lookup table, incrementing it by the number of rows newly inserted (you can capture this by reading ACTIVITY_COUNT directly after executing your INSERT/MERGE statement)
  • Make sure to LOCK the lookup table at the beginning of your transaction so it can't be modified until your transaction completes

END TRANSACTION

Here is an example from Postgres that you can adapt to Teradata:

CREATE TABLE NextId (
    IDType VARCHAR(50) NOT NULL,
    NextValue INTEGER NOT NULL,
    PRIMARY KEY (IDType)
);

INSERT INTO Users(UserId, UserType)
SELECT 
    COALESCE(
        src.UserId, -- Use UserId if provided (i.e. update existing user)
        ROW_NUMBER() OVER(ORDER BY CASE WHEN src.UserId IS NULL THEN 0 ELSE 1 END ASC) + 
        (id.NextValue - 1) -- Use newly generated UserId (i.e. create new user)
    )
    AS UserIdFinal,
    src.UserType
FROM (
    -- Bulk Upsert (get source rows from JSON parameter)
    SELECT src.FirstName, src.UserId, src.UserType
    FROM JSONB_TO_RECORDSET(pUserDataJSON->'users') AS src(FirstName VARCHAR(100), UserId INTEGER, UserType CHAR(1))
) src
CROSS JOIN ( 
    -- Get next ID value to use
    SELECT NextValue
    FROM NextId 
    WHERE IdType = 'User'
    FOR UPDATE -- Use "Update" row-lock so it is not read by any other queries also using "Update" row-lock
) id
ON CONFLICT(UserId) DO UPDATE SET
UserType = EXCLUDED.UserType;

-- Increment the stored "next" ID by the number of newly created users.
-- NewUserCount stands for a variable holding that row count
-- (in Teradata you would capture it from ACTIVITY_COUNT).
UPDATE NextId
SET NextValue = NextValue + COALESCE(NewUserCount, 0)
WHERE IdType = 'User';

Just change the locking statement to Teradata syntax (LOCK TABLE NextId FOR WRITE) and read ACTIVITY_COUNT after your INSERT/MERGE to capture the number of rows affected. This assumes you are doing all of this inside a stored procedure.
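A rough sketch of that adaptation as a Teradata stored procedure. All table, column, and variable names here are illustrative, and the exact syntax (transaction statements, locking modifier placement) may need adjusting for your session mode:

```sql
REPLACE PROCEDURE Load_Geographical_Areas()
BEGIN
    DECLARE vNextId   INTEGER;
    DECLARE vRowCount INTEGER;

    BEGIN TRANSACTION;

    -- Write-lock the lookup table so no other session can read or
    -- modify it until this transaction completes.
    LOCKING TABLE NextId FOR WRITE
    SELECT NextValue INTO vNextId
    FROM NextId
    WHERE IDType = 'GeographicalArea';

    -- Generate new IDs starting from the reserved value.
    INSERT INTO Geographical_Area (Geographical_Area_Id, Area_Name)
    SELECT vNextId - 1 + ROW_NUMBER() OVER (ORDER BY s.Area_Name)
         , s.Area_Name
    FROM Staging_Areas s;

    -- Capture how many rows the INSERT created...
    SET vRowCount = ACTIVITY_COUNT;

    -- ...and advance the lookup value by that amount.
    UPDATE NextId
    SET NextValue = NextValue + vRowCount
    WHERE IDType = 'GeographicalArea';

    END TRANSACTION;
END;
```

Inside that window, the reserved range vNextId through vNextId + vRowCount - 1 is yours alone, so the same values can be reused to populate the child City table before the transaction ends.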

Let me know how it goes...
