Create SQL indexes for complex filtering

There is a table `human` in a SQL database. I have a UI for this table and a filter form like this: [image: filter form]

I can set only some of the values (for instance, only age and state). If a filter item is not specified, it won't be added to the SQL WHERE condition. The WHERE condition is combined in the order shown in the picture. So if I want to create indexes for all cases to get a performance boost, I need to create these indexes:

  • first name
  • last name
  • age
  • state
  • birthday
  • gender
  • first name + last name
  • first name + last name + age
  • first name + last name + age + state
  • ...
  • state + birthday
  • state + birthday + gender
  • ...
  • state + gender

That looks bad to me. Should I index only the most frequently used combinations? What do you think?

If you have the index `first name + last name + age + state`, you don't also need `first name + last name + age`, `first name + last name`, and `first name`. If a user searches only on "first name" and "last name", the database will be able to use that four-column index. As long as the columns the user filters on form a leading, left-to-right prefix of the index columns, the database can use the index even if not every column is specified.

For instance, if you have the index `first name + last name + age + state` and the user specifies "first name" and "last name", then the database will be able to use that index to jump to the matching rows. However, if the user specifies "first name" and "age", or "first name" and "state", then the database will only partially use the index: it jumps to the rows with matching first names, but then has to scan those rows for the ones that match "age" or "state". If you want to know the technical details behind why this is true, read about database indexes and B+ trees.
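The prefix behaviour described above can be checked by asking the query planner what it intends to do. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for whatever engine you actually use; the table layout and the index name `ix_human_name` are made up for illustration, and other databases word their plans differently.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the real database engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE human (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " age INTEGER, state TEXT, birthdate TEXT, gender TEXT)"
)
conn.execute(
    "CREATE INDEX ix_human_name ON human (first_name, last_name, age, state)"
)


def plan(where_clause):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM human WHERE " + where_clause
    ).fetchall()
    # The fourth field of each plan row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


# Leading prefix (first_name, last_name): both columns are used for the seek.
prefix_plan = plan("first_name = 'Homer' AND last_name = 'Simpson'")

# last_name is skipped: only first_name narrows the seek, age is checked per row.
gap_plan = plan("first_name = 'Homer' AND age = 39")

# The leading column is missing: the index cannot be used for a seek.
no_prefix_plan = plan("state = 'OR'")

for label, text in [
    ("prefix", prefix_plan),
    ("gap", gap_plan),
    ("no prefix", no_prefix_plan),
]:
    print(f"{label}: {text}")
```

The text inside the parentheses of each plan line lists the columns actually used to seek into the index, which makes the difference between full and partial use visible.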

Databases can also use multiple indexes when running a single query. If you have the indexes

`last name`
`state`
`age`

and the user searches for "last name", "state", and "age", many databases can use all three indexes to quickly find the matching rows for each field. The results are then combined, and only rows that match all three conditions are returned. If you look at an execution plan, you'll be able to see it doing this. Granted, this will be a bit slower than having a single index that contains every necessary field, but it saves you from maintaining a ton of indexes.

Also note that even if an index exists, the database will not necessarily use it, because scanning rows may be faster. For instance, take the above example with three different indexes, and suppose the user searches on "last name", "first name", and "state". Because the combination of "last name" and "first name" has such a high selectivity (meaning most of the values in that index are unique), it might be faster to use the index to get all the rows that match the first name and last name and then do a simple iterative scan on those rows to find the ones with the matching state, than to use the state index as well and then join the rows returned by both indexes.

When you're designing your indexes, keep in mind that an index won't give you much of a performance boost (and may actually be worse than a full table scan) if its selectivity is really low. Gender, for instance, is not a good field to index because it typically has only two possible values. If the user is searching only on gender, you will never get good performance with or without indexes, because you will return about half your rows.

Row-for-row, a full table scan is actually faster than using an index. The reason is that when the database does a table scan, it is able to jump straight to the data page on disk. When it uses an index, it has to go through a few intermediate index pages before it actually gets to where the data is stored on disk. For a field like "gender", where you're going to be selecting half of your rows, the added overhead of following index links for half the rows in the table may outweigh the cost of just scanning the entire table without using indexes.
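One rough way to judge whether a column is worth indexing is to measure its selectivity as distinct values divided by total rows. The sketch below again assumes SQLite via Python's `sqlite3`; the sample rows are invented purely to contrast a unique column with a two-valued one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE human (last_name TEXT, gender TEXT)")

# Made-up sample rows: every last name is unique, gender has two values.
rows = [(f"name{i}", "F" if i % 2 else "M") for i in range(1000)]
conn.executemany("INSERT INTO human VALUES (?, ?)", rows)


def selectivity(column):
    """Distinct values divided by total rows; closer to 1.0 is more selective."""
    # The column name is interpolated, so only call this with trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM human"
    ).fetchone()
    return distinct / total


last_name_selectivity = selectivity("last_name")
gender_selectivity = selectivity("gender")
print(last_name_selectivity, gender_selectivity)
```

A value near 1.0 suggests an index can narrow a search to a handful of rows, while a value near 0 (as for gender here) suggests the index would still leave a large fraction of the table to read.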

I would recommend these indexes:

`first name, last name`
`birthdate`
`state`

If you have a specific combination of fields that is searched on frequently, then you can make an index for that too to speed things up. However, don't make an index for every combination of fields.

If you store "birthdate" (a full date) instead of "birthday", then you don't need "age", because you can translate an age into a range of birthdates and then do a BETWEEN query on "birthdate". If you're forced to have separate columns for "birthday" and "age", then you could index "age" as well. However, as another user commented below, you'd have to constantly update the ages. I strongly recommend against that design.
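To make the age-to-birthdate translation concrete, here is a minimal sketch in Python with SQLite. The helper `birthdate_range`, the index name, and the sample rows are hypothetical; it assumes birthdates are stored as ISO `YYYY-MM-DD` text, and it treats people born on 29 February as having their birthday on 28 February in non-leap years.

```python
import sqlite3
from datetime import date, timedelta


def birthdate_range(age, today):
    """Return (earliest, latest) birthdates of people who are `age` years old on `today`."""

    def years_before(d, years):
        try:
            return d.replace(year=d.year - years)
        except ValueError:
            # 29 February in a non-leap target year: fall back to 28 February.
            return d.replace(year=d.year - years, day=28)

    latest = years_before(today, age)  # born this day: turns `age` today
    # Born one year earlier: already age + 1, so the range starts the day after.
    earliest = years_before(today, age + 1) + timedelta(days=1)
    return earliest, latest


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE human (first_name TEXT, birthdate TEXT)")
conn.execute("CREATE INDEX ix_human_birthdate ON human (birthdate)")
conn.executemany(
    "INSERT INTO human VALUES (?, ?)",
    [
        ("Bart", "1993-06-15"),    # already 31 on 2024-06-15
        ("Lisa", "1993-06-16"),    # still 30
        ("Maggie", "1994-06-15"),  # turns 30 that day
        ("Ned", "1994-06-16"),     # still 29
    ],
)

earliest, latest = birthdate_range(30, date(2024, 6, 15))
matches = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM human WHERE birthdate BETWEEN ? AND ?"
        " ORDER BY birthdate",
        (earliest.isoformat(), latest.isoformat()),
    )
]
print(matches)
```

Because the filter is a plain range on the stored column, an index on `birthdate` can serve it, and nothing in the table needs updating as people get older.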

One final thing to consider is whether to try to make a covering index. A covering index is one that contains every field the query needs, both the fields being searched on and the fields being returned. For example, suppose your table has 100 fields in it, but users are usually only interested in looking up someone's state and age based on their name. So a large percentage of your queries look something like this:

SELECT STATE, AGE FROM PEOPLE WHERE FIRSTNAME = 'Homer' AND LASTNAME = 'Simpson'

If your index is `LASTNAME, FIRSTNAME`, then the database will look up "Homer" and "Simpson" in your index (which will involve reading a few index pages from disk), use the index pointer to go to the disk page where the data record is stored, read that entire data page, parse it into fields, and then return the state and age.

Now, suppose you run the same query but your index is `LASTNAME, FIRSTNAME, STATE, AGE`. The database engine will still use your index to look up "Homer" and "Simpson" exactly as before, but once it finds the appropriate index record, that record already contains STATE and AGE. Therefore, the database can get the results of your query straight from the index without also having to read the data page from disk.
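The difference between the two indexes shows up in the query plan. This is a minimal sketch assuming SQLite via Python's `sqlite3`; the `people` table here is a cut-down, hypothetical version of the one in the example, and `lookup_plan` is an illustrative helper. SQLite happens to label the second case "COVERING INDEX"; other engines report it in their own terms.

```python
import sqlite3


def lookup_plan(index_columns):
    """Build a fresh table with one index and return the plan for the name lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,"
        " state TEXT, age INTEGER, notes TEXT)"
    )
    conn.execute(f"CREATE INDEX ix_people ON people ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT state, age FROM people"
        " WHERE firstname = 'Homer' AND lastname = 'Simpson'"
    ).fetchall()
    conn.close()
    # The fourth field of each plan row is the human-readable detail text.
    return " | ".join(row[3] for row in rows)


# The index finds the row, but state and age must be fetched from the table.
plain_plan = lookup_plan("lastname, firstname")

# The index already holds state and age, so the table is never touched.
covering_plan = lookup_plan("lastname, firstname, state, age")

print(plain_plan)
print(covering_plan)
```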

A situation where a covering index can drastically improve performance is the table scan. Assume you have 100 fields in your table (so the size of a single row is a few hundred bytes or more). Now a user runs the query

SELECT FIRSTNAME, LASTNAME, AGE FROM PEOPLE

The database would have to read the entire table (including all the fields that aren't needed for this query) to get your results. If you had an index `LASTNAME, FIRSTNAME, AGE`, then the database could get the results by scanning the entire index instead of scanning the entire table. Since in this case a single index entry is far smaller byte-wise than a single data row, the query will be much faster.

In your particular case, with so few fields in your table, a covering index probably wouldn't be very useful, since the fields in the index would be the same as the fields in your table, thus defeating the whole purpose. However, for a table with dozens of fields, of which only a handful are commonly queried, a covering index can be a great way to speed up your queries.

Lots of indexes is a 'bad' idea.
Indexes on individual columns won't help much.
An index that is a 'prefix' of another index is redundant.
An index on a flag or a column of low 'cardinality' (e.g. gender) is unlikely to be used.

Suggestion: Start with one index per column. Then add a second column to each index. Pick this second column based on what is likely to be tested together. Avoid having both (a,b) and (b,a).

Then watch what types of queries are generated by 'real' users. Tweak the list of indexes accordingly. This info may lead to a few 3-column indexes.

I would go with this approach.

Having a key column in an index is great for filtering out rows and doing an exact seek. But with your form, you would need many columns as key columns, and having many key columns is not good; there is also a limit on how many an index can have.

So I suggest you identify a few columns that are unique (or, if you don't have unique columns, a composite of fields that won't be null) and create a clustered index on them.

I would create a clustered index on birthday, age (just an idea; you may use other columns as well) and then create a stored procedure with default parameters like below:

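create proc usp_getformdata
(
@firstname varchar(200) = null,
@lastname varchar(200) = null,
@age int = null,
@state varchar(20) = null,
@birthday datetime = null,
@gender varchar(10) = null
)
As
Begin
-- A parameter left at null means "no filter on this column".
-- Column names are assumed to match the parameter names.
select *
from yourtable
where (@firstname is null or firstname = @firstname)
and (@lastname is null or lastname = @lastname)
and (@age is null or age = @age)
and (@state is null or state = @state)
and (@birthday is null or birthday = @birthday)
and (@gender is null or gender = @gender)
End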
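The question describes a different approach, where filters that were not filled in are left out of the WHERE clause altogether. A minimal sketch of that idea in application code follows, in Python; the function `build_query`, the column names, and the `?` placeholder style are assumptions for illustration and would need adapting to your driver.

```python
# Columns the form may filter on, in the order the form lists them.
# A fixed allow-list keeps column names out of user control.
FILTER_COLUMNS = ("first_name", "last_name", "age", "state", "birthdate", "gender")


def build_query(filters):
    """Return (sql, params) using only the filters that were actually filled in."""
    clauses, params = [], []
    for column in FILTER_COLUMNS:
        value = filters.get(column)
        if value is not None:
            # Values travel as bound parameters, never as SQL text.
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT * FROM human"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query({"state": "OR", "age": 39, "gender": None})
print(sql, params)
```

Because each distinct combination of filters yields its own statement text, the database can plan every combination separately and pick whichever index suits it.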

One index can work for multiple WHERE clauses. So:

(firstname, lastname, age, state)

works for WHERE clauses that have equality conditions on:

firstname
firstname & lastname
firstname & lastname & age
firstname & lastname & age & state

I would suggest that you build a set of indexes for the common cases: three or four indexes. Add multiple keys to each index, so it can be used for more and more refined searches. Don't bother putting low-cardinality values, such as gender, as the first key in the index, because a query that filters only on gender is probably going to require a full table scan anyway.

If this doesn't meet your needs, you might need to think about other methods for accessing the data, such as full-text indexes.
