How to query huge MySQL databases?

I have 2 tables, a purchases table and a users table. Records in the purchases table look like this:

purchase_id | product_ids | customer_id
---------------------------------------
1           | (99)(34)(2) | 3 
2           | (45)(3)(74) | 75

The users table looks like this:

user_id  | email              | password
----------------------------------------
3        | joeShmoe@gmail.com | password 
75       | nolaHue@aol.com    | password

To get the purchase history of a user, I use a query like this:

mysql_query(" SELECT * FROM purchases WHERE customer_id = '$users_id' ");

The problem is: what will happen when tens of thousands of records are inserted into the purchases table? I feel like this will take a performance toll.

So I was thinking about storing the purchases in an additional field directly in the user's row:

user_id | email              | password  | purchases
------------------------------------------------------
1       | joeShmoe@gmail.com | password  | (99)(34)(2)
2       | nolaHue@aol.com    | password  | (45)(3)(74)

Then, when I query the users table for things like the username, I can just as easily grab their purchase history in that same query.

Is this a good idea? Will it improve performance, or will the benefit be insignificant and not worth making the database messier?

I really want to know what the pros do in these situations. For example, how does Amazon query its database for users' purchase histories when it has millions of customers? How come their queries don't take hours?

EDIT

OK, so I guess keeping them separate is the way to go. Now the question is a design one:

Should I keep using the "purchases" table I illustrated earlier? In that design, I separate the product IDs of each purchase with parentheses and use them as delimiters to tell the IDs apart when extracting them via PHP.
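For illustration, extracting the IDs from that parenthesis-delimited format could look like the sketch below. It is written in Python rather than the PHP used in the question, and `parse_product_ids` is a hypothetical helper name, not part of the original code.

```python
import re


def parse_product_ids(field: str) -> list[int]:
    """Split a string such as "(99)(34)(2)" into [99, 34, 2]."""
    # Each ID is a run of digits wrapped in parentheses.
    return [int(digits) for digits in re.findall(r"\((\d+)\)", field)]


print(parse_product_ids("(99)(34)(2)"))  # [99, 34, 2]
```

Because the IDs live inside a string, an ordinary index on that column cannot find the purchases that contain a given product; a lookup such as `WHERE product_ids LIKE '%(34)%'` has to check every row.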

Or should I instead store each product ID separately in the "purchases" table, so it looks like this?

purchase_id | product_ids | customer_id
---------------------------------------
1           | 99          | 3 
1           | 34          | 3
1           | 2           | 3
2           | 45          | 75
2           | 3           | 75
2           | 74          | 75

Nope, this is a very, very, very bad idea.

You'd be breaking first normal form just because you don't know how to page through a large data set.

Amazon, Yahoo!, and Google bring back (potentially) millions of records, but they only display them to you in chunks of 10, 25, or 50 at a time.
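A minimal sketch of returning records in fixed-size chunks, assuming an in-memory SQLite database as a stand-in for MySQL (both accept `LIMIT ... OFFSET ...`). The page size of 25 and the `fetch_purchase_page` helper are illustrative choices, not part of the question's code.

```python
import sqlite3

PAGE_SIZE = 25  # one of the chunk sizes mentioned above


def fetch_purchase_page(conn, customer_id, page, page_size=PAGE_SIZE):
    """Return page number `page` (0-based) of one customer's purchases."""
    return conn.execute(
        "SELECT purchase_id FROM purchases WHERE customer_id = ? "
        "ORDER BY purchase_id LIMIT ? OFFSET ?",
        (customer_id, page_size, page * page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)
# 60 hypothetical purchases for customer 3 (purchase_id 1..60).
conn.executemany("INSERT INTO purchases (customer_id) VALUES (?)", [(3,)] * 60)

print(len(fetch_purchase_page(conn, 3, 0)))  # 25
print(len(fetch_purchase_page(conn, 3, 2)))  # 10 (rows 51..60)
```

One design caveat: with large offsets the database still has to step over all the skipped rows, so for very deep pages a "keyset" approach (for example `WHERE purchase_id > last_seen_id`) is often used instead.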

They're also smart about guessing or calculating which ones are most likely to interest you, and they show you those first.

Which purchases in my history am I most likely to be interested in? The most recent ones, of course.
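Showing the most recent purchases first needs a date column, which the question's schema does not have; `purchased_at` below is a hypothetical addition. This sketch (again SQLite standing in for MySQL) adds a composite index on `(customer_id, purchased_at)` so that one customer's newest rows can be read in index order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# purchased_at is a hypothetical column; the question's schema has no date.
conn.execute(
    """CREATE TABLE purchases (
        purchase_id  INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL,
        purchased_at TEXT NOT NULL
    )"""
)
# The index keeps each customer's rows ordered by date, so the newest ones
# can be read without a separate sort step.
conn.execute("CREATE INDEX idx_customer_date ON purchases (customer_id, purchased_at)")

rows = [(3, f"2024-01-{day:02d}") for day in range(1, 31)] + [(75, "2024-02-01")]
conn.executemany("INSERT INTO purchases (customer_id, purchased_at) VALUES (?, ?)", rows)


def recent_purchases(conn, customer_id, limit=10):
    """Return a customer's newest purchases first."""
    return conn.execute(
        "SELECT purchase_id, purchased_at FROM purchases "
        "WHERE customer_id = ? ORDER BY purchased_at DESC LIMIT ?",
        (customer_id, limit),
    ).fetchall()


latest = recent_purchases(conn, 3, limit=3)
print(latest)  # [(30, '2024-01-30'), (29, '2024-01-29'), (28, '2024-01-28')]
```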

You should consider building these into your design before you violate relational database fundamentals.

By doing this to your schema, you will break the entity-relationship model of your database.

You might want to look into Memcached, NoSQL, and Redis. These are all tools that can help you improve query performance, mostly by storing data in RAM.

For example, run the query once and store the result in Memcached. If the user refreshes the page, you get the data from Memcached instead of MySQL, which avoids querying your database a second time.
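The pattern described above is usually called cache-aside. The sketch below uses a plain Python dictionary with expiry times as a stand-in for Memcached, so it runs without a Memcached server; `SimpleCache`, the `purchases:<id>` key format, and the 60-second TTL are all assumptions for illustration.

```python
import sqlite3
import time


class SimpleCache:
    """Hypothetical in-process stand-in for Memcached: a dict with per-key expiry."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


stats = {"db_queries": 0}  # counts how often the database is actually queried


def purchase_history(conn, cache, customer_id):
    key = f"purchases:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # served from the cache, no SQL executed
    stats["db_queries"] += 1
    rows = conn.execute(
        "SELECT purchase_id FROM purchases WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    cache.set(key, rows, ttl_seconds=60)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 3), (2, 75)])

cache = SimpleCache()
first = purchase_history(conn, cache, 3)   # first page load: queries the database
second = purchase_history(conn, cache, 3)  # refresh: answered from the cache
print(first, second, stats["db_queries"])  # [(1,)] [(1,)] 1
```

When the user makes a new purchase, the cached entry should be deleted or overwritten; otherwise the page can show stale history until the TTL expires.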

Hope this helps.

First off, tens of thousands of records is nothing. Unless you're running on a teensy-weensy machine with limited RAM and hard drive space, a database won't even blink at 100,000 records.

As for storing purchase details in the users table... what happens if a user makes more than one purchase?

MySQL is hugely scalable, and don't let the fact that it's free convince you otherwise. Keeping the two tables separate is probably best, not only because it keeps the database normalized, but also because having more indexes will speed up queries. A 10,000-record database is relatively small compared to health-record databases with hundreds of millions of records.

As for Amazon and Google, they hire hundreds of developers to write specialized query languages for their specific application needs... not something developers like us have the resources to fund.

Your database already looks messy, since you are storing multiple product_ids in a single field instead of creating an "association" table like this:

product_purchases
purchase_id | product_id
------------------------
1           | 99
1           | 34
1           | 2

You can still fetch it in one query:

SELECT * FROM purchases p LEFT JOIN product_purchases pp USING (purchase_id)
   WHERE p.customer_id = $user_id
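A runnable sketch of the association-table design and the join above, using an in-memory SQLite database as a stand-in for MySQL. It binds the user ID as a query parameter instead of interpolating `$user_id` into the SQL string, which also avoids SQL injection. The composite primary key on `product_purchases` is an assumption, not something the answer specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
    );
    -- One row per product in a purchase (the "association" table).
    CREATE TABLE product_purchases (
        purchase_id INTEGER NOT NULL REFERENCES purchases (purchase_id),
        product_id  INTEGER NOT NULL,
        PRIMARY KEY (purchase_id, product_id)
    );
    INSERT INTO purchases VALUES (1, 3), (2, 75);
    INSERT INTO product_purchases VALUES
        (1, 99), (1, 34), (1, 2), (2, 45), (2, 3), (2, 74);
    """
)


def purchase_history(conn, user_id):
    # Same join as above, with the user ID passed as a bound parameter.
    return conn.execute(
        "SELECT p.purchase_id, pp.product_id "
        "FROM purchases p LEFT JOIN product_purchases pp USING (purchase_id) "
        "WHERE p.customer_id = ? ORDER BY p.purchase_id, pp.product_id",
        (user_id,),
    ).fetchall()


print(purchase_history(conn, 3))  # [(1, 2), (1, 34), (1, 99)]
```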

But this also gives you more possibilities, like finding out how many units of product #99 were bought, getting a list of all customers who purchased product #34, etc.
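Both example questions become plain SQL once each product has its own row. This SQLite sketch uses a small hypothetical data set (one extra purchase of product #99 is added so the count is more than one). Since the schema has no quantity column, "how many were bought" here counts the purchases that included the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    CREATE TABLE product_purchases (
        purchase_id INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        PRIMARY KEY (purchase_id, product_id)
    );
    INSERT INTO purchases VALUES (1, 3), (2, 75), (3, 75);
    INSERT INTO product_purchases VALUES
        (1, 99), (1, 34), (1, 2), (2, 45), (2, 3), (2, 74), (3, 99);
    """
)

# How many purchases included product #99?
times_99_bought = conn.execute(
    "SELECT COUNT(*) FROM product_purchases WHERE product_id = ?", (99,)
).fetchone()[0]

# Which customers bought product #34?
buyers_of_34 = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT p.customer_id FROM purchases p "
        "JOIN product_purchases pp USING (purchase_id) "
        "WHERE pp.product_id = ? ORDER BY p.customer_id",
        (34,),
    )
]

print(times_99_bought, buyers_of_34)  # 2 [3]
```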

And of course, don't forget about indexes; they will make all of this much faster.
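One way to see what an index does is to ask the database for its query plan before and after creating it. This sketch uses SQLite's `EXPLAIN QUERY PLAN` (MySQL's rough equivalent is `EXPLAIN`); the table contents and the index name `idx_purchases_customer` are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)"
)
# 100,000 hypothetical purchases spread over 1,000 customers.
conn.executemany(
    "INSERT INTO purchases (customer_id) VALUES (?)",
    [(i % 1000,) for i in range(100_000)],
)

QUERY = "SELECT * FROM purchases WHERE customer_id = ?"


def plan(conn, sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


before = plan(conn, QUERY, (3,))
conn.execute("CREATE INDEX idx_purchases_customer ON purchases (customer_id)")
after = plan(conn, QUERY, (3,))
print(before)
print(after)
```

Without the index, the plan reports a full SCAN of the table; after `CREATE INDEX`, it reports a SEARCH using `idx_purchases_customer`. The exact wording of the plan text varies between SQLite versions.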

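Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.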

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM