Do I need an index on an ID in EF Core if I'm searching for that ID in 2 different columns?

If I run a query like the one below, where I'm searching for the same ID in two different columns, should I have an index like this? Or should I create 2 separate indexes, one for each column?

modelBuilder.Entity<Transfer>()
  .HasIndex(p => new { p.SenderId, p.ReceiverId });

Query:

var transfersCount = await _dbContext.Transfers
    .Where(p => p.ReceiverId == user.Id || p.SenderId == user.Id)
    .CountAsync();

What if I have a query like the one below? Would I need a multicolumn index on all 4 columns?

var transfersCount = await _dbContext.Transfers
    .Where(p => (p.SenderId == user.Id || p.ReceiverId == user.Id) &&
        (!transferParams.Status.HasValue || p.TransferStatus == (TransferStatus)transferParams.Status) &&
        (!transferParams.Type.HasValue || p.TransferType == (TransferType)transferParams.Type))
    .CountAsync();

I recommend two single-column indices.

The two single-column indices will perform better for this query because each column is then the leading key of its own fully ordered index. By contrast, in a multi-column index only the first column is fully ordered; later columns are ordered only within each value of the columns before them.

If you were using an AND condition for the sender and receiver, then you would benefit from a multi-column index. A multi-column index is ideal when several columns have conditions that must all hold to build the result set (e.g., WHERE receiver = 1 AND sender = 2). With an OR condition, a multi-column index would be used as though it were a single-column index on the first column only; the second column would effectively be unindexed.
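The recommendation above could be expressed in the EF Core model roughly as follows. This is a minimal sketch, not the asker's actual model: the Transfer class shape, the int key types, and the AppDbContext name are assumptions made for illustration; only SenderId, ReceiverId and HasIndex come from the question.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity shape; the question only shows SenderId and ReceiverId.
public class Transfer
{
    public int Id { get; set; }
    public int SenderId { get; set; }   // key type assumed to be int
    public int ReceiverId { get; set; } // key type assumed to be int
}

// Hypothetical context name.
public class AppDbContext : DbContext
{
    public DbSet<Transfer> Transfers => Set<Transfer>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // One single-column index per side of the OR condition,
        // instead of one composite index on (SenderId, ReceiverId).
        modelBuilder.Entity<Transfer>().HasIndex(p => p.SenderId);
        modelBuilder.Entity<Transfer>().HasIndex(p => p.ReceiverId);
    }
}
```

Whether the database actually uses both indexes for the OR query depends on the engine and its statistics, so it is worth checking the query plan after adding them.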

The full intricacies of index design would take far more than an SO answer to explain; there are probably books about it, and it makes up a reasonable proportion of a database administrator's job.

Indexes have a cost to maintain, so you generally strive to have the fewest possible that offer you the most flexibility for what you want to do. Generally an index will have some columns that define its key and a reference to the rows in the table that have those keys. When using an index, the database engine can quickly look up the key and discover which rows it needs to read. It then looks up those rows as a secondary operation. Indexes can also store table data that isn't part of the lookup key, so you might find yourself creating indexes that also carry other columns from the row; by the time the database has found the key it's looking for in the index, it also has the row data the query wants and doesn't need to launch a second lookup operation to find the row. If a query wants too many rows from a table, the database might decide not to use the index at all; there's some threshold beyond which it's faster to read all the rows directly from the table and search them rather than suffer the indirection of using the index to find which rows need to be read.
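As a rough illustration of an index that stores extra, non-key columns, the sketch below uses IncludeProperties, which is specific to the SQL Server provider (Microsoft.EntityFrameworkCore.SqlServer). The Amount column, the int key types, and the context name are hypothetical and only serve the example.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity; Amount is an invented column for illustration.
public class Transfer
{
    public int Id { get; set; }
    public int SenderId { get; set; }
    public int ReceiverId { get; set; }
    public decimal Amount { get; set; }
}

// Hypothetical context name.
public class AppDbContext : DbContext
{
    public DbSet<Transfer> Transfers => Set<Transfer>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // SenderId is the lookup key; Amount is stored alongside it in the
        // index so a query reading only Amount by SenderId can avoid the
        // second lookup into the table.
        modelBuilder.Entity<Transfer>()
            .HasIndex(p => p.SenderId)
            .IncludeProperties(p => new { p.Amount });
    }
}
```

Every included column makes the index larger and more expensive to maintain, so this is only worthwhile for queries that run often enough to justify it.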

The columns in an index can serve more than one query, and their order is important. If you always query a person by name and also sometimes by age, but you never query by age alone, it would be better to index (name, age) than (age, name). An index on (name, age) can serve a query for just WHERE name = ..., and also WHERE name = ... AND age = .... If you use an OR keyword in a where clause, you can consider that an entirely separate query that would need its own index. Indeed, the database might decide to run "name or age" as two parallel queries and combine the results to remove duplicates. If your app's needs later change so that instead of just querying a mix of (name), (name and age) it is now frequently querying (name), (name and age), (name or age), (age), (age and height), then it might make sense to have two indexes: (name, age) plus (age, height). The database can use part or all of both of these to serve the common queries. Remember that using part of an index only works from left to right. An index on (name, age) wouldn't typically serve a query for age alone.
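The (name, age) plus (age, height) arrangement might look like this in EF Core. It is only a sketch: the Person entity, its property types, and the PeopleDbContext name are hypothetical stand-ins for the example in the paragraph above.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical entity mirroring the name/age/height example.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int Age { get; set; }
    public int Height { get; set; }
}

// Hypothetical context name.
public class PeopleDbContext : DbContext
{
    public DbSet<Person> People => Set<Person>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Can serve filters on Name alone, or on Name AND Age
        // (left-to-right prefix of the key).
        modelBuilder.Entity<Person>().HasIndex(p => new { p.Name, p.Age });

        // Can serve filters on Age alone, or on Age AND Height.
        // The Age side of a "Name OR Age" filter can also start here.
        modelBuilder.Entity<Person>().HasIndex(p => new { p.Age, p.Height });
    }
}
```

The order of properties in the anonymous type is the order of the key columns in the index, which is why (Name, Age) and (Age, Name) are not interchangeable.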

If you're using SQL Server and SSMS, you might find that showing the query plan also reveals a missing-index recommendation, and it's worth considering carefully whether that index needs to be added. For apps deployed to Microsoft Azure, common queries whose performance suffers for lack of an index are also picked up automatically, which can be the impetus to look at the query being run and see how existing or new indexes might be extended or rearranged to cover it. As first noted, it's not really something a single SO answer of a few lines can prep you for with an "always do this and it will be fine". Companies operating at large scale hire people whose sole mission is to make sure the database runs well. They usually grumble a lot about the devs, and more so about things like Entity Framework, because an EF LINQ query is a layer removed from the actual SQL being run and may not be the optimal approach to getting the data. All these things you have to contend with.

In this particular case it seems like one index on SenderId+TransferStatus+TransferType and another on ReceiverId+TransferStatus+TransferType could help the two queries shown, but I wouldn't go as far as to say "definitely do that" without taking a holistic view of everything this table contains, how many different values there are in those columns, and what it's used for in the context of the app. If Sender/Receiver are unique, there may be no point in adding more columns to the index as keys. If TransferStatus and TransferType vary such that some combination of them helps uniquely identify a particular row out of hundreds, then it may make sense, but then if this query only runs once a day compared to another that is used 10 times a second... There are too many variables and unknowns to provide a concrete answer to the question as presented. Don't optimize prematurely: indexing columns just because they're used in some where clause somewhere would be premature.
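For reference, the two composite indexes floated above could be declared as follows. This is a sketch under assumptions rather than a recommendation: the enum members, the int key types, and the AppDbContext name are hypothetical, and whether these indexes are worth their maintenance cost depends on the data and workload as discussed.

```csharp
using Microsoft.EntityFrameworkCore;

// Hypothetical enum members; the question does not list them.
public enum TransferStatus { Pending, Completed }
public enum TransferType { Internal, External }

// Hypothetical entity shape built from the property names in the question.
public class Transfer
{
    public int Id { get; set; }
    public int SenderId { get; set; }
    public int ReceiverId { get; set; }
    public TransferStatus TransferStatus { get; set; }
    public TransferType TransferType { get; set; }
}

// Hypothetical context name.
public class AppDbContext : DbContext
{
    public DbSet<Transfer> Transfers => Set<Transfer>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Leading column matches one side of the OR; the trailing columns
        // match the optional status and type filters.
        modelBuilder.Entity<Transfer>()
            .HasIndex(p => new { p.SenderId, p.TransferStatus, p.TransferType });
        modelBuilder.Entity<Transfer>()
            .HasIndex(p => new { p.ReceiverId, p.TransferStatus, p.TransferType });
    }
}
```

Because each index leads with SenderId or ReceiverId, it can still serve the simpler first query that filters on the IDs alone.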
