
mysql where clause on long strings makes the query too slow

I have created a MySQL table that holds the crime count, crime description, crime category, and address of each crime, and I have built some reports on top of it. The user wants a search-by-address filter in the report, so we will be using a WHERE clause on the table with a condition on the street address.

The problem is that the street address is quite a long string, and filtering on it when the table is already quite big takes a lot of time. I tried hashing with md5(streetaddress), but that did not help either. The query becomes very slow with this kind of WHERE clause.

Example:

select * from crimedata where streetaddress = "41 BENNETT RD Watertown  Massachusetts United States"

Will indexing streetaddress help in this case, or should I use some kind of hashing to make this kind of string search faster?

Shah

Adding an index on streetaddress will help a bit, but only to a limited extent.
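To make the index idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, which means the planner output is SQLite's and not what MySQL's EXPLAIN would print. The address_hash column and both index names are hypothetical, not part of the original table. It shows the equality lookup going from a full scan to an index search, plus a variant where an MD5 of the address is stored in its own indexed column at insert time (as opposed to calling md5() on the column inside the WHERE clause, which has to be evaluated for every row).

```python
import hashlib
import sqlite3

TARGET = "41 BENNETT RD Watertown  Massachusetts United States"

def md5_hex(text):
    """Hex MD5 digest used as a short, fixed-length lookup key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crimedata ("
    " id INTEGER PRIMARY KEY,"
    " category TEXT,"
    " streetaddress TEXT,"
    " address_hash TEXT)"  # hypothetical column, filled in at insert time
)
addresses = [f"{n} BENNETT RD Watertown  Massachusetts United States" for n in range(1, 1001)]
conn.executemany(
    "INSERT INTO crimedata (category, streetaddress, address_hash) VALUES (?, ?, ?)",
    [("THEFT", a, md5_hex(a)) for a in addresses],
)

def plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

by_address = "SELECT * FROM crimedata WHERE streetaddress = ?"
by_hash = "SELECT * FROM crimedata WHERE address_hash = ?"

plan_before = plan(by_address, (TARGET,))  # no index yet, so the whole table is scanned

conn.execute("CREATE INDEX idx_crimedata_streetaddress ON crimedata (streetaddress)")
conn.execute("CREATE INDEX idx_crimedata_address_hash ON crimedata (address_hash)")

plan_after = plan(by_address, (TARGET,))
plan_hash = plan(by_hash, (md5_hex(TARGET),))

# Recheck the full address too, since two different strings could share a hash.
matches = conn.execute(
    "SELECT id, streetaddress FROM crimedata"
    " WHERE address_hash = ? AND streetaddress = ?",
    (md5_hex(TARGET), TARGET),
).fetchall()

print(plan_before)
print(plan_after)
print(plan_hash)
print(matches)
```

Either index only helps exact matches like the one in the question; a pattern search with a leading wildcard would still have to read every row.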

You may want to consider changing your storage engine to one that supports full-text search. One example is Mroonga.

NOTE: I am not associated with Mroonga. I have used the library before and found that it does improve text search.

You could try properly normalizing your data, so that addresses are stored in one table and referenced by ID from another.

Your query would then look like:

SELECT ... FROM crimedata WHERE address_id=?

Here ? is a placeholder for the ID of the address, which you fetch from the other table.

As always, anything that shows up repeatedly as a condition in a WHERE clause is a strong candidate for an index.
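A rough sketch of that normalized layout, again using Python's sqlite3 as a self-contained stand-in for MySQL. The addresses table, its column names, the index name, and the helper functions are all assumptions made for illustration. The long string is stored once, and crimedata is filtered on a small integer key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (
    id INTEGER PRIMARY KEY,
    streetaddress TEXT NOT NULL UNIQUE  -- UNIQUE also gives this column an index
);
CREATE TABLE crimedata (
    id INTEGER PRIMARY KEY,
    category TEXT,
    description TEXT,
    address_id INTEGER NOT NULL REFERENCES addresses(id)
);
CREATE INDEX idx_crimedata_address_id ON crimedata (address_id);
""")

def find_address_id(streetaddress):
    """Look up the ID of an address, or None if it has never been stored."""
    row = conn.execute(
        "SELECT id FROM addresses WHERE streetaddress = ?", (streetaddress,)
    ).fetchone()
    return None if row is None else row[0]

def add_crime(category, description, streetaddress):
    """Store the address once, then reference it by ID from crimedata."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells this INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO addresses (streetaddress) VALUES (?)", (streetaddress,)
    )
    conn.execute(
        "INSERT INTO crimedata (category, description, address_id) VALUES (?, ?, ?)",
        (category, description, find_address_id(streetaddress)),
    )

def crimes_at(streetaddress):
    """Resolve the address to its ID once, then filter crimedata on the integer."""
    address_id = find_address_id(streetaddress)
    if address_id is None:
        return []
    return conn.execute(
        "SELECT category, description FROM crimedata WHERE address_id = ? ORDER BY id",
        (address_id,),
    ).fetchall()

BENNETT = "41 BENNETT RD Watertown  Massachusetts United States"
add_crime("THEFT", "Bicycle stolen", BENNETT)
add_crime("VANDALISM", "Broken window", BENNETT)
add_crime("THEFT", "Package stolen", "7 MAIN ST Watertown  Massachusetts United States")

print(crimes_at(BENNETT))
```

The string comparison still happens once, against the addresses table, but that table has one row per distinct address instead of one row per crime.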

I would take a step back and ask whether you are attacking the problem in a way that is going to scale.

I would look at using geospatial information for the queries and then use the street address only as an output display parameter.

If you use a GIS object to store something like a point, you will be able to do radius searches and bounding-box queries in the future.

Your code would change so that when someone enters a street address, it is converted to either lat/long or a point. Searches will then go much quicker, since you won't be doing full-text searches. It will also let you call a mapping API to show the address or place the location on public mapping services.
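As a rough illustration of the lat/long idea, the sketch below stores plain lat and lon columns and runs a bounding-box query followed by an exact distance check. It is not the MySQL GIS API: it uses Python's sqlite3 and an ordinary index so that it runs standalone, the coordinates and address labels are made up, and the geocoding step (turning a typed address into coordinates) is assumed to have happened elsewhere. It also assumes search points away from the poles and the antimeridian.

```python
import math
import sqlite3

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crimedata ("
    " id INTEGER PRIMARY KEY, category TEXT, streetaddress TEXT,"
    " lat REAL, lon REAL)"  # hypothetical columns holding the geocoded address
)
conn.execute("CREATE INDEX idx_crimedata_lat_lon ON crimedata (lat, lon)")
conn.executemany(
    "INSERT INTO crimedata (category, streetaddress, lat, lon) VALUES (?, ?, ?, ?)",
    [
        ("THEFT", "address A (made up)", 42.3700, -71.1800),
        ("VANDALISM", "address B (made up)", 42.3710, -71.1810),
        ("THEFT", "address C (made up)", 42.5000, -71.0000),
    ],
)

def crimes_within(lat, lon, radius_km):
    """Radius search: cheap bounding-box filter in SQL, exact distance check after."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))  # longitude degrees shrink with latitude
    rows = conn.execute(
        "SELECT category, streetaddress, lat, lon FROM crimedata"
        " WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
        " ORDER BY id",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    ).fetchall()
    return [
        (category, address)
        for category, address, row_lat, row_lon in rows
        if haversine_km(lat, lon, row_lat, row_lon) <= radius_km
    ]

nearby = crimes_within(42.3700, -71.1800, 1.0)
print(nearby)
```

The street address is only returned for display here; the filtering is done entirely on numbers.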

http://mysqlserverteam.com/mysql-5-7-and-gis-an-example/

[Yes, of course, scaling something like this out to a global scale would take it out of the realm of databases and into the big data world.]
