Speed up local MySQL to run normalization queries on big tables

I'm normalizing and "cleaning" a MySQL database in which the biggest table has ~3 million records.

What I have to do is rename some fields (very fast), change their order (quite fast), and do some trimming and string sanitization, plus extract some fields into other tables while keeping the foreign key id...

Is there a way to speed up the queries on my local machine?

I have MariaDB 10.1.21 (from XAMPP), running on a MacBook Air with 8 GB of RAM.

I've already put indexes on many fields, but it's still as slow as a turtle.

Any tip will be appreciated. Thanks!

Edit: as requested, here is more info and some of the optimizations I am performing.

Basically, I have a big table containing non-normalized columns that would normally be spread across three tables.

What I have:

companies ( id, name, street, city_name, category_name, subcategory_name )

What I want:

companies ( id, name, street, id_city, id_subcategory, ... )
cities( id, name, ... )
categories( id, name )
subcategories( id, name, id_category )
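A minimal sketch of that target schema with the foreign keys declared explicitly. It uses Python's built-in `sqlite3` module with an in-memory database as a self-contained stand-in for MariaDB. The column types, the `UNIQUE (id_category, name)` constraint (assuming subcategory names only need to be unique within a category) and the helper name `create_target_schema` are assumptions for illustration, and the `...` columns are left out. In MariaDB you would typically write `INT AUTO_INCREMENT PRIMARY KEY` and use InnoDB tables so that the table-level `FOREIGN KEY` clauses are enforced.

```python
import sqlite3

# Hypothetical DDL for the normalized target schema (SQLite dialect).
# The "..." columns from the description are intentionally omitted.
SCHEMA = """
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE subcategories (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    id_category INTEGER NOT NULL,
    UNIQUE (id_category, name),  -- assumption: unique per category only
    FOREIGN KEY (id_category) REFERENCES categories(id)
);
CREATE TABLE cities (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE companies (
    id             INTEGER PRIMARY KEY,
    name           TEXT,
    street         TEXT,
    id_city        INTEGER,
    id_subcategory INTEGER,
    FOREIGN KEY (id_city) REFERENCES cities(id),
    FOREIGN KEY (id_subcategory) REFERENCES subcategories(id)
);
"""

def create_target_schema(conn: sqlite3.Connection) -> list[str]:
    """Create the target tables and return their names, sorted."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    print(create_target_schema(sqlite3.connect(":memory:")))
    # → ['categories', 'cities', 'companies', 'subcategories']
```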

So I clean and extract the data as follows (in the queries, mac_cat and mic_cat are the category and subcategory columns).

Trim the "dirty" fields and strip carriage returns and newlines from them:

update companies set mic_cat = TRIM(REPLACE(REPLACE(mic_cat, '\r', ''), '\n', ''));
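A small sketch to check what that cleanup expression does on a few sample values, run against an in-memory SQLite database via Python's `sqlite3` (the sample data and the helper `clean_mic_cat` are made up). One dialect difference matters here: SQLite has no backslash escapes in string literals, so `char(13)` and `char(10)` stand in for MariaDB's `'\r'` and `'\n'`, which MariaDB interprets as CR and LF by default.

```python
import sqlite3

# Same expression as the UPDATE above, with char(13)/char(10) replacing '\r'/'\n'
# because SQLite treats '\r' literally as a backslash followed by 'r'.
CLEAN_SQL = """
UPDATE companies
SET mic_cat = TRIM(REPLACE(REPLACE(mic_cat, char(13), ''), char(10), ''))
"""

def clean_mic_cat(values: list) -> list:
    """Load sample mic_cat values, run the cleanup, return them in id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, mic_cat TEXT)")
    conn.executemany("INSERT INTO companies (mic_cat) VALUES (?)",
                     [(v,) for v in values])
    conn.execute(CLEAN_SQL)
    return [r[0] for r in conn.execute("SELECT mic_cat FROM companies ORDER BY id")]

if __name__ == "__main__":
    print(clean_mic_cat(["  Bakery\r\n", "Pub\n", None, "\r\n"]))
    # → ['Bakery', 'Pub', None, '']
```

NULL stays NULL and a value made only of line breaks becomes an empty string; those are exactly the rows the following DELETE is meant to catch.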

Delete companies that don't have a valid category:

delete from companies where mic_cat is null or mic_cat = '' or mac_cat is null or mac_cat = '';
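Before a bulk DELETE on millions of rows, it can help to count what the same predicate matches and run the delete inside a transaction. A hedged sketch in Python's `sqlite3` with made-up sample rows and a hypothetical helper `delete_uncategorized`; in MariaDB with InnoDB tables the equivalent is `SELECT COUNT(*) ... WHERE ...`, then `START TRANSACTION; DELETE ...; COMMIT;` (or `ROLLBACK` if the count looks wrong). MyISAM tables do not support transactions.

```python
import sqlite3

# Same predicate as the DELETE above.
BAD_ROW = "mic_cat IS NULL OR mic_cat = '' OR mac_cat IS NULL OR mac_cat = ''"

def delete_uncategorized(rows: list[tuple]) -> tuple[int, int, int]:
    """Return (rows counted by the preview, rows deleted, rows remaining)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, mac_cat TEXT, mic_cat TEXT)")
    conn.executemany("INSERT INTO companies (mac_cat, mic_cat) VALUES (?, ?)", rows)
    conn.commit()
    # 1. Preview: count what the DELETE would remove before removing anything.
    preview = conn.execute(f"SELECT COUNT(*) FROM companies WHERE {BAD_ROW}").fetchone()[0]
    # 2. Delete in a transaction: `with conn` commits on success, rolls back on error.
    with conn:
        deleted = conn.execute(f"DELETE FROM companies WHERE {BAD_ROW}").rowcount
    remaining = conn.execute("SELECT COUNT(*) FROM companies").fetchone()[0]
    return preview, deleted, remaining

if __name__ == "__main__":
    sample = [("Food", "Bakery"), ("Food", ""), (None, "Pub"), ("Bars", "Pub")]
    print(delete_uncategorized(sample))  # → (2, 2, 2)
```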

Extract the data from those fields and put it into the new tables:

insert into categories (name) select distinct mac_cat from companies;
insert into subcategories (name, id_category) select distinct mic_cat,categories.id from companies JOIN categories ON mac_cat = categories.name;
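A sketch of those two extraction queries on sample data, in Python's `sqlite3` (the data and the helper `extract_categories` are hypothetical). It shows one consequence worth checking on the real table: because `DISTINCT` applies to the (subcategory name, category id) pair, a subcategory name used under two categories produces two rows in `subcategories`, so a later lookup by subcategory name alone can be ambiguous.

```python
import sqlite3

def extract_categories(rows: list[tuple]) -> list[tuple]:
    """Run the two INSERT ... SELECT DISTINCT steps; return (category, subcategory) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE companies (id INTEGER PRIMARY KEY, mac_cat TEXT, mic_cat TEXT);
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE subcategories (id INTEGER PRIMARY KEY, name TEXT, id_category INTEGER);
    """)
    conn.executemany("INSERT INTO companies (mac_cat, mic_cat) VALUES (?, ?)", rows)
    conn.execute("INSERT INTO categories (name) SELECT DISTINCT mac_cat FROM companies")
    conn.execute("""
        INSERT INTO subcategories (name, id_category)
        SELECT DISTINCT c.mic_cat, cat.id
        FROM companies c JOIN categories cat ON c.mac_cat = cat.name
    """)
    return conn.execute("""
        SELECT cat.name, s.name FROM subcategories s
        JOIN categories cat ON cat.id = s.id_category
        ORDER BY cat.name, s.name
    """).fetchall()

if __name__ == "__main__":
    rows = [("Food", "Bakery"), ("Food", "Bakery"), ("Food", "Pizza"), ("Shops", "Bakery")]
    print(extract_categories(rows))
    # → [('Food', 'Bakery'), ('Food', 'Pizza'), ('Shops', 'Bakery')]
```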

Add the reference column:

ALTER TABLE companies ADD COLUMN id_subcategory int;

Fill in the keys:

UPDATE companies left join subcategories on companies.mic_cat = subcategories.name set id_subcategory = subcategories.id;
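If the same subcategory name can occur under more than one category, matching on `mic_cat` alone may pick the wrong id. A hedged alternative is to match on both the category and the subcategory name. The MariaDB multi-table form is shown in a comment; the runnable part is a SQLite stand-in (via Python's `sqlite3`) that applies the same matching rule with a correlated subquery. The sample rows and the helper `fill_subcategory_ids` are hypothetical.

```python
import sqlite3

# MariaDB multi-table form of the same update (for reference, not run here):
#   UPDATE companies
#   JOIN categories cat  ON cat.name = companies.mac_cat
#   JOIN subcategories s ON s.id_category = cat.id AND s.name = companies.mic_cat
#   SET companies.id_subcategory = s.id;
# SQLite stand-in: a correlated subquery with the same matching rule.
FILL_SQL = """
UPDATE companies SET id_subcategory = (
    SELECT s.id
    FROM subcategories s
    JOIN categories cat ON cat.id = s.id_category
    WHERE cat.name = companies.mac_cat AND s.name = companies.mic_cat
)
"""

def fill_subcategory_ids() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE subcategories (id INTEGER PRIMARY KEY, name TEXT, id_category INTEGER);
        CREATE TABLE companies (id INTEGER PRIMARY KEY, mac_cat TEXT, mic_cat TEXT,
                                id_subcategory INTEGER);
        INSERT INTO categories VALUES (1, 'Food'), (2, 'Shops');
        -- 'Bakery' exists under both categories
        INSERT INTO subcategories VALUES (1, 'Bakery', 1), (2, 'Pizza', 1), (3, 'Bakery', 2);
        INSERT INTO companies (mac_cat, mic_cat) VALUES
            ('Food', 'Bakery'), ('Shops', 'Bakery'), ('Food', 'Pizza');
    """)
    conn.execute(FILL_SQL)
    return [r[0] for r in conn.execute("SELECT id_subcategory FROM companies ORDER BY id")]

if __name__ == "__main__":
    print(fill_subcategory_ids())  # → [1, 3, 2]
```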

The last one was very slow, so I dropped all the indexes and then created just two, on companies.mic_cat and subcategories.name, which sped it up quite a bit.
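A quick way to confirm that an index on the lookup column is actually used is to ask for the query plan. In MariaDB that is `EXPLAIN` (MariaDB 10.0.5 and later also accept it on `UPDATE` and `DELETE`), whose `key` column names the chosen index. The sketch below shows the same check in SQLite through Python's `sqlite3`, using `EXPLAIN QUERY PLAN`; the index name `idx_sub_name` and the helper `lookup_plan` are hypothetical.

```python
import sqlite3

def lookup_plan(with_index: bool) -> str:
    """Return SQLite's query plan for looking up a subcategory by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subcategories (id INTEGER PRIMARY KEY, name TEXT, id_category INTEGER)")
    if with_index:
        conn.execute("CREATE INDEX idx_sub_name ON subcategories (name)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM subcategories WHERE name = ?", ("Bakery",))
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(str(r[-1]) for r in rows)

if __name__ == "__main__":
    print(lookup_plan(False))  # expected: a full SCAN of subcategories
    print(lookup_plan(True))   # expected: a SEARCH using idx_sub_name
```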

  • Do all updates in a single UPDATE statement.
  • If you need to modify columns that are in one or more indexes, DROP those indexes first and ADD them back later. (This may help.)
  • Do all ALTERs in a single ALTER statement. (This is not always the best advice.)
  • Think about doing the updates in chunks of rows.
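The first tip can be sketched as one UPDATE with several SET assignments, so the table is read once rather than once per column. A minimal illustration in Python's `sqlite3`; the list `DIRTY_COLUMNS` and the helpers are assumptions, and `char(13)`/`char(10)` replace MariaDB's `'\r'`/`'\n'` because SQLite has no backslash escapes. The column names are interpolated into SQL, so they must be trusted constants, never user input.

```python
import sqlite3

DIRTY_COLUMNS = ["mac_cat", "mic_cat", "city_name"]  # assumption: columns needing cleanup

def one_pass_cleanup_sql(columns: list[str]) -> str:
    """Build a single UPDATE that cleans every listed column in one table pass."""
    sets = ",\n    ".join(
        f"{c} = TRIM(REPLACE(REPLACE({c}, char(13), ''), char(10), ''))" for c in columns
    )
    return f"UPDATE companies SET\n    {sets}"

def run_demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, mac_cat TEXT, "
                 "mic_cat TEXT, city_name TEXT)")
    conn.execute("INSERT INTO companies (mac_cat, mic_cat, city_name) VALUES (?, ?, ?)",
                 (" Food\r\n", "Bakery\n", " Milan "))
    conn.execute(one_pass_cleanup_sql(DIRTY_COLUMNS))  # one statement, one scan
    return conn.execute("SELECT mac_cat, mic_cat, city_name FROM companies").fetchone()

if __name__ == "__main__":
    print(one_pass_cleanup_sql(DIRTY_COLUMNS))
    print(run_demo())  # → ('Food', 'Bakery', 'Milan')
```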

Some issues that the above tries to address:

  • UPDATE without a WHERE clause (and sometimes with one) scans the entire table, which is rather costly.
  • When an indexed column is modified, its entry must be removed from one place in the index and added in another. Think of it as a DELETE plus an INSERT, which is rather costly.
  • ALTER may or may not be able to do the work "in place". If several of your alters cannot be done that way, it is best to do a single copy (i.e., a single ALTER) that makes all the changes at once. Such an ALTER effectively creates a new empty table, alters it, copies all the data into it, recreates all the indexes, then renames it back into place.
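To make the "single copy" idea concrete, here is a hand-written version of roughly what a copying ALTER does, applied once for several changes (reordering columns and dropping `mic_cat`). It runs in SQLite through Python's `sqlite3` as an illustration only; in MariaDB you would normally let one `ALTER TABLE` with comma-separated clauses do this. The table layout, the index name `idx_companies_sub` and the helper `rebuild_companies` are hypothetical.

```python
import sqlite3

def rebuild_companies(conn: sqlite3.Connection) -> None:
    """Rebuild `companies` with a new column order and without mic_cat."""
    # 1. New, empty table with the final definition (reordered, mic_cat dropped).
    conn.execute("""CREATE TABLE companies_new (
        id INTEGER PRIMARY KEY, name TEXT, street TEXT, id_subcategory INTEGER)""")
    # 2. Copy every row exactly once.
    conn.execute("""INSERT INTO companies_new (id, name, street, id_subcategory)
                    SELECT id, name, street, id_subcategory FROM companies""")
    # 3. Swap the tables, then build the indexes on the final table.
    conn.execute("DROP TABLE companies")
    conn.execute("ALTER TABLE companies_new RENAME TO companies")
    conn.execute("CREATE INDEX idx_companies_sub ON companies (id_subcategory)")
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE companies (id INTEGER PRIMARY KEY, street TEXT,
                    name TEXT, mic_cat TEXT, id_subcategory INTEGER)""")
    conn.execute("INSERT INTO companies VALUES (1, 'Via Roma 1', 'Acme', 'Bakery', 3)")
    rebuild_companies(conn)
    print([c[1] for c in conn.execute("PRAGMA table_info(companies)")])
    # → ['id', 'name', 'street', 'id_subcategory']
```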

More on indexes...

  • Don't index flags (low-cardinality columns); the optimizer will usually shun such indexes.
  • Look at your WHERE clauses to see which indexes would be useful.
  • Learn about 'composite' indexes: for some queries, INDEX(a,b) may be much better than INDEX(a), INDEX(b).
  • Don't blindly index every column; it's a big waste.
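A small illustration of the composite-index point, using SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` (MariaDB's `EXPLAIN` gives the equivalent information). One index on `(mac_cat, mic_cat)` serves a filter on both columns and also a filter on the leftmost column alone. The index name `idx_cat_sub` and the queries are hypothetical examples, not a recommendation for the asker's exact workload.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, "
             "mac_cat TEXT, mic_cat TEXT)")
# One composite index instead of separate INDEX(mac_cat) and INDEX(mic_cat).
conn.execute("CREATE INDEX idx_cat_sub ON companies (mac_cat, mic_cat)")

both = plan(conn, "SELECT name FROM companies WHERE mac_cat = 'Food' AND mic_cat = 'Bakery'")
leading = plan(conn, "SELECT name FROM companies WHERE mac_cat = 'Food'")
print(both)     # expected to mention idx_cat_sub with both columns constrained
print(leading)  # the leftmost column alone can still use idx_cat_sub
```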

3M rows is possibly a lot. In many situations, it is better to UPDATE (or DELETE) in "chunks". See my blog.
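A simple form of chunking walks the primary key in fixed-size ranges and commits after each one, so no single statement holds locks or undo data for millions of rows. This is a sketch of one common variant, not necessarily the exact method described in the linked blog; it uses Python's `sqlite3` with a made-up `companies` table, and the helper `trim_in_chunks` and `chunk_size` are assumptions. Against MariaDB the same `WHERE id >= ? AND id < ?` predicate lets each UPDATE use a primary-key range scan.

```python
import sqlite3

def trim_in_chunks(conn: sqlite3.Connection, chunk_size: int = 1000) -> int:
    """TRIM mic_cat one primary-key range at a time; return the number of chunks."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM companies").fetchone()
    if lo is None:  # empty table
        return 0
    chunks = 0
    start = lo
    while start <= hi:
        end = start + chunk_size  # half-open range [start, end)
        conn.execute(
            "UPDATE companies SET mic_cat = TRIM(mic_cat) WHERE id >= ? AND id < ?",
            (start, end),
        )
        conn.commit()  # one short transaction per chunk
        chunks += 1
        start = end
    return chunks

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, mic_cat TEXT)")
    conn.executemany("INSERT INTO companies (mic_cat) VALUES (?)", [("  Bakery  ",)] * 2500)
    conn.commit()
    print(trim_in_chunks(conn))  # → 3 (ids 1-1000, 1001-2000, 2001-2500)
```

Gaps in the id sequence only make some chunks smaller; they do not cause rows to be skipped, since every id between MIN and MAX falls in exactly one range.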
