
Store this data model as non-relational?

I am considering redesigning one of my classic relational database tables into a non-relational design, and I am not sure whether I should do it. The reason is that performance problems have simply gotten out of control.

This table has 11 columns and is designed like this:

Id bigint (PK)  -- Clustered Index
FK1 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
FK2 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
FK3 bigint (FK) -- Non-Clustered Unique Composite Index ( FK1_FK2_FK3 in this order )
Value1 nvarchar(100)
Value2 nvarchar(100)
Value3 nvarchar(100)
Value4 nvarchar(100)
Value5 nvarchar(100)
Value6 nvarchar(100)
Value7 nvarchar(100)
Value8 nvarchar(100)

Here are some facts:

  • The table has ~800,000,000 records and is growing faster than expected.
  • The number of requests is very low to low, but the workload per request is high.

5% of the requests look like this:

SELECT * 
FROM myTable 
WHERE Id = 12346 -- (works perfectly)

SELECT * 
FROM myTable 
WHERE Id IN (123456, 654321) -- (works OK because IN list contains only a small number of IDs)

UPDATE myTable 
SET .....
WHERE Id = 123456 -- (works perfectly)

Unfortunately, 95% of the requests look like this:

SELECT *
FROM myTable 
WHERE Fk1 = 123456 AND FK2 = 654321 
-- works badly because it returns 100.000 - 300.000 records, but I need all of them. (Yes, the unique index is used because the column order of the index is correct.)

UPDATE myTable 
SET Value1 = '1', Value2 = '2', Value3 = '3', Value4 = '4', 
    Value5 = '5', Value6 = '6', Value7 = '7', Value8 = '8' 
WHERE Fk1 = 123456 AND FK2 = 654321   -- horrible because it also touches up to 300.000 records. (Yes, the unique index is used because the column order of the index is correct.)

Instead, I would like to design it like this:

Id1 bigint (PK) -- Clustered Composite Index (former FK1 column)
Id2 bigint (PK) -- Clustered Composite Index (former FK2 column)
ContentColumn JSON -- Contains all former Value columns and the FK3 column as an array of objects. Each object holds FK3, Value1, Value2, ...
ArrayLength INT -- length of json array
  • The former FK1 and FK2 now form a clustered unique composite PK, because 95% of the requests select and update on them.
  • This of course means that in the 5% case I have to load more than I actually need, but in my opinion it is still worth it.
  • I of course have to know which JSON document contains a given FK3 before the request, but I already know that because of some other logic.
  • I also do not necessarily need the composite unique constraint that includes FK3.
  • The foreign key constraint on FK3 is also not necessary.
  • No SQL statements read or search on a specific former Value column, e.g. WHERE Value5 = 'ABC'.

So what do you guys think? Should I give it a try?

Or do you have some completely different ideas?

Thanks for any help!

"Or do you have some completely different ideas?"

You can do better with a better set of index designs. You want to optimize for your most expensive queries:

SELECT *
FROM myTable 
WHERE Fk1 = 123456 AND FK2 = 654321 
-- works badly because it returns 100.000 - 300.000 records, but I need all of them. (Yes, the unique index is used because the column order of the index is correct.)

UPDATE myTable 
SET Value1 = '1', Value2 = '2', Value3 = '3', Value4 = '4', 
    Value5 = '5', Value6 = '6', Value7 = '7', Value8 = '8' 
WHERE Fk1 = 123456 AND FK2 = 654321 

Making FK1_FK2_FK3 the clustered index and making Id a non-clustered PK would be better. For queries that retrieve a handful of rows by Id, a nested loop join from the non-clustered PK to the composite clustered index should be fine. With the current design, however, querying by (FK1, FK2) means doing up to 300,000 lookups from the non-clustered index into the clustered index, which is going to be expensive. It is so expensive that these queries might be doing table scans instead.
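The re-clustering could be sketched roughly as follows. The constraint and index names (PK_myTable, UQ_myTable_FK1_FK2_FK3, CIX_myTable_FK1_FK2_FK3) and the dbo schema are assumptions for illustration, since the question does not give the real names. On a table of this size each step rewrites a lot of data, so it would need to be tried on a copy first.

```sql
-- Sketch only: all object names below are hypothetical placeholders.

-- 1. Drop the existing non-clustered unique index on (FK1, FK2, FK3).
--    If it was created as a UNIQUE constraint, use
--    ALTER TABLE ... DROP CONSTRAINT instead.
DROP INDEX UQ_myTable_FK1_FK2_FK3 ON dbo.myTable;

-- 2. Drop the clustered primary key on Id. The table becomes a heap.
--    Any foreign keys that reference Id would have to be dropped first.
ALTER TABLE dbo.myTable DROP CONSTRAINT PK_myTable;

-- 3. Cluster the table on the columns used by the expensive queries.
CREATE UNIQUE CLUSTERED INDEX CIX_myTable_FK1_FK2_FK3
    ON dbo.myTable (FK1, FK2, FK3);

-- 4. Re-create the primary key on Id as a non-clustered index.
ALTER TABLE dbo.myTable
    ADD CONSTRAINT PK_myTable PRIMARY KEY NONCLUSTERED (Id);
```

With this layout, all rows for one (FK1, FK2) pair sit next to each other in the clustered index, so the 95% queries become a single range seek without per-row lookups.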

After clustering the table by (FK1, FK2, FK3), consider partitioning it by FK2 into 10-100 separate partitions. Then a predicate like WHERE Fk1 = 123456 AND FK2 = 654321 only has to touch the partition containing FK2 = 654321, and can seek within that partition directly to the first page with FK1 = 123456.
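A minimal partitioning sketch, under stated assumptions: the function, scheme, and index names are hypothetical, the boundary values are placeholders that would have to be chosen from the real distribution of FK2, and everything is placed on the PRIMARY filegroup for simplicity.

```sql
-- Sketch only: names and boundary values are hypothetical placeholders.

-- Partition function on the bigint FK2 column. Three boundaries give
-- four partitions; a real setup would use 10-100 as suggested above.
CREATE PARTITION FUNCTION pf_myTable_FK2 (bigint)
    AS RANGE RIGHT FOR VALUES (100000, 200000, 300000);

-- Map every partition to the same filegroup.
CREATE PARTITION SCHEME ps_myTable_FK2
    AS PARTITION pf_myTable_FK2 ALL TO ([PRIMARY]);

-- Build the clustered index on the partition scheme. If a clustered
-- index with this name already exists, add WITH (DROP_EXISTING = ON).
CREATE UNIQUE CLUSTERED INDEX CIX_myTable_FK1_FK2_FK3
    ON dbo.myTable (FK1, FK2, FK3)
    ON ps_myTable_FK2 (FK2);

-- Show which partition a given FK2 value maps to.
SELECT $PARTITION.pf_myTable_FK2(654321) AS partition_number;

-- Row counts per partition of the clustered index (index_id = 1).
SELECT partition_number, rows
FROM sys.partitions
WHERE object_id = OBJECT_ID(N'dbo.myTable') AND index_id = 1;
```

One thing to watch: a unique index can only be aligned with the partition scheme if the partitioning column is part of its key. A non-clustered PK on Id alone would therefore have to be created explicitly on a filegroup (for example ON [PRIMARY]) as a non-aligned index, or have FK2 added to its key.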

In addition, consider ROW or PAGE compression if PAGEIOLATCH waits are a significant part of your query runtime.
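One way to check this, as a rough sketch: look at the accumulated PAGEIOLATCH waits, estimate the savings, and only then rebuild. The dbo schema is an assumption, and the wait statistics are server-wide since the last restart, so they only hint at what a single query spends its time on.

```sql
-- Server-wide I/O latch waits accumulated since the last restart.
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;

-- Estimate how much PAGE compression would save on the clustered
-- index (index_id = 1) across all partitions.
EXEC sys.sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'myTable',
    @index_id         = 1,
    @partition_number = NULL,
    @data_compression = N'PAGE';

-- If the estimate looks worthwhile, rebuild all indexes compressed.
-- This rewrites the whole table, so schedule it accordingly.
ALTER INDEX ALL ON dbo.myTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Compression trades CPU for fewer pages read from disk, which is why it pays off mainly when the queries are I/O bound.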


 