MySQL table organisation for multi-dimensional categorisation

Existing System:

I have a MySQL database that stores category-related information for approximately 200 unique users. The information stored and retrieved for each user follows this hierarchy:

imageCategories
    > Parent Category 1
        > Child Category 1 : "45,19,3,4,8"
        > Child Category 2 : "17,1,99"
        > ... etc
    > Parent Category 2
        > Child Category 1 : "83,6"
        > Child Category 2 : "19,74,26"
        ... etc
    > etc

The string value of each child category is a series of comma-separated ids that reference descriptions (stored in a separate table) belonging to that child category. I store all of this as an array in a single column per user, as a json_encoded string of the form:

{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],"Child Category 2":["17,1,99"]},"Parent Category 2":{"Child Category 1":["83,6"],"Child Category 2":["19,74,26"]}}

The system works by retrieving this JSON string when a user logs in and decoding it to a session array. Whenever any changes are made to it, it is re-encoded to a JSON string and saved to the database, and the session array is updated to reflect this. This works fine. Although my research way back led me to this approach, I was never quite sure whether storing a multi-dimensional array in MySQL is best practice. What I do know is that it keeps organising the data fairly stress-free, and I haven't noticed it causing a lot of overhead, which is not to say that it doesn't.
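
The round trip described above can be sketched as follows. This is only an illustration, assuming Python's `json` module in place of PHP's `json_decode`/`json_encode`; the helper names `ids_for` and `add_id` are hypothetical, and the database read and write are left out.

```python
import json

# Stand-in for the value read from usersettings.imageCategories.
stored = ('{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],'
          '"Child Category 2":["17,1,99"]},'
          '"Parent Category 2":{"Child Category 1":["83,6"],'
          '"Child Category 2":["19,74,26"]}}')

# On login: decode once into an in-memory (session) structure.
session_categories = json.loads(stored)


def ids_for(categories, parent, child):
    """Return the description ids under parent > child as a list of ints."""
    csv_ids = categories[parent][child][0]  # each child holds a one-element list
    return [int(x) for x in csv_ids.split(",") if x]


def add_id(categories, parent, child, new_id):
    """Append an id to the child's csv string, keeping the stored shape."""
    ids = ids_for(categories, parent, child)
    ids.append(new_id)
    categories[parent][child] = [",".join(str(i) for i in ids)]


add_id(session_categories, "Parent Category 2", "Child Category 1", 12)

# On change: re-encode; this string is what would be written back to the column.
updated = json.dumps(session_categories, separators=(",", ":"))
```

Key order survives the round trip here because Python dictionaries keep insertion order; the order requirement stated later in the post depends on the same property in whatever language does the decoding.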


The conundrum:

What I want to do now is add a string description to each child category in the database. Potentially to each parent category later, but baby steps first.

I was initially going to add a third dimension to the overall array. Instead of:

"Child Category Key" : "id string"

I would change it to:

"Child Category Key" : ["id string", "description string"]

or:

"Child Category Key" : ["id string", id for description on another table]

I don't see an issue with either, but I'm wondering if I'm veering way off best practice. Should I be creating a new table for the entire category structure, rather than storing all of it as a JSON string in a column alongside other user settings? (It's never going to get too unwieldy in terms of character length.) The current structure is quite easy to get my head around, and I wouldn't necessarily jump to a solution that provides minimal overhead benefits if its structure makes managing the database unnecessarily complicated (keep in mind some of us aren't naturals at this and our brains process this kind of structure a little slower than others).
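
For reference, the first variant (an id string plus a description string per child) could be applied to an existing value roughly like this. It is a sketch in Python rather than PHP; `add_description_slot` is a hypothetical helper, and the description text is made up for illustration.

```python
import json


def add_description_slot(categories, default=""):
    """Return a copy in which every child value is [id_string, description]."""
    migrated = {}
    for parent, children in categories.items():
        migrated[parent] = {}
        for child, value in children.items():
            # The stored sample wraps the id string in a one-element list;
            # a bare string is accepted as well.
            id_string = value[0] if isinstance(value, list) else value
            migrated[parent][child] = [id_string, default]
    return migrated


old = json.loads(
    '{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],'
    '"Child Category 2":["17,1,99"]}}'
)
new = add_description_slot(old)
new["Parent Category 1"]["Child Category 1"][1] = "Hypothetical description text"

encoded = json.dumps(new, separators=(",", ":"))
```

The second variant would differ only in storing a numeric id in the second slot instead of the text itself.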


Design Requirements:

I may miss out on describing specifics needed, as I'm unsure which information is most pertinent. I can elaborate where needed. What seems the most important design requirement is that each user has unique category keys and values. They can only be in the form of parent > child > csv of ids, but each user will have custom key titles and a different number of each. The order of each is also essential.

I'm currently running on a server with an SSD, 1 GB of memory and a single 2 GHz core from an Intel hexa-core CPU. Requests to the database primarily retrieve the categories on both the front end and the back end. Most users generate little traffic, so nothing has been too taxing apart from occasional spikes. I will upgrade when I see a bottleneck approaching. I'm just trying to use what I have as efficiently as possible at the moment and keep best practices in play.


Database Structure:

Right now my table structure is of the following form (omitting other columns not relevant to the question):

Table usersettings:

+-----+----------------------+-----+
| id  |   imageCategories    | ... |
+-----+----------------------+-----+
|   1 | {"Parent Category... | ... |
|   2 | {"Parent Category... | ... |
|   3 | {"Parent Category... | ... |
| ... |                      |     |
+-----+----------------------+-----+

Table users:

+-----+----------------------+---------+--------+
| id  |   username           | cluster | server |
+-----+----------------------+---------+--------+
|   1 | johndoe              |       1 |      1 |
|   2 | katedoe              |       1 |      1 |
|   3 | ellendoe             |       1 |      1 |
| ... |                      |         |        |
+-----+----------------------+---------+--------+

Table descriptions_0001:

+-----+---------+---------------+-----+
| id  |  title  | descriptions  | ... |
+-----+---------+---------------+-----+
|  11 | Title 1 | Description 1 | ... |
|  56 | Title 2 | Description 2 | ... |
|  78 | Title 3 | Description 3 | ... |
| ... |         |               |     |
+-----+---------+---------------+-----+

Every usersettings entry has a corresponding row in users with a matching id, so the username etc. can always be referenced from usersettings by its id number. Currently I only have one database, but in an attempt to future-proof it to some degree I store descriptions in a table with an index in its name, and each user has a cluster number value as well as a server number value. Each user has, on average, about 100 description rows, so this comes to about 20,000 rows at the moment. When this creates a bottleneck I'll start a descriptions table 0002, and later a second server should it be needed. Perhaps I'm naive in my workflow, but it seems like it should help.


Summary:

So in summary, should I adapt my categories array to store a string description for child categories by:

  1. Giving each child category key an array value, containing the current id string and an additional description string, rather than the current plain string value.

  2. Like 1, but making the description an id number that references a string in a new table.

  3. Not using a JSON-encoded array at all, and moving the entire category structure into its own table.

  4. Creating a table for parent categories, one for child categories and one for the csv contents, with a description column (per the conundrum above) and an order column (essential, per the design requirements above) in each. Or is there a better method of storing order than retrieving and updating the order column for each relevant row, given that the table will contain unique category information for multiple users? It sounds like it may require a lot of overhead.
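
To make option 4 concrete, here is one possible layout, sketched with Python's `sqlite3` so it runs without a server (the post itself uses MySQL). All table names, column names and the `reorder_children` helper are assumptions made for illustration, not the schema the author ended up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent_categories (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    title       TEXT    NOT NULL,
    description TEXT    NOT NULL DEFAULT '',
    sort_order  INTEGER NOT NULL
);
CREATE TABLE child_categories (
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER NOT NULL REFERENCES parent_categories(id),
    title       TEXT    NOT NULL,
    description TEXT    NOT NULL DEFAULT '',
    sort_order  INTEGER NOT NULL
);
-- Replaces the csv string: one row per (child category, description id).
CREATE TABLE child_category_items (
    child_id       INTEGER NOT NULL REFERENCES child_categories(id),
    description_id INTEGER NOT NULL,
    sort_order     INTEGER NOT NULL,
    PRIMARY KEY (child_id, description_id)
);
""")

conn.execute(
    "INSERT INTO parent_categories VALUES (1, 1, 'Parent Category 1', '', 0)"
)
conn.executemany(
    "INSERT INTO child_categories (id, parent_id, title, sort_order) "
    "VALUES (?, 1, ?, ?)",
    [(1, "Child Category 1", 0),
     (2, "Child Category 2", 1),
     (3, "Child Category 3", 2)],
)


def reorder_children(conn, parent_id, ordered_child_ids):
    """Rewrite sort_order for one parent's children in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "UPDATE child_categories SET sort_order = ? "
            "WHERE id = ? AND parent_id = ?",
            [(pos, cid, parent_id) for pos, cid in enumerate(ordered_child_ids)],
        )


reorder_children(conn, 1, [3, 1, 2])
titles = [row[0] for row in conn.execute(
    "SELECT title FROM child_categories WHERE parent_id = 1 ORDER BY sort_order"
)]
```

In this layout a reorder only touches the sibling rows under one parent, so sharing the table between users does not make the update any larger; whether that is cheap enough in practice would need measuring on the real data.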

I ended up going for a solution somewhat similar to (4). I also better appreciate the importance of describing the design requirements now, as what led me to this decision was the realisation that working with select levels of the hierarchy at a time is more efficient to process (I believe) and simpler to comprehend.

For example, if I'm dealing with all descriptions under parent category 2, child category 1, I just fetch or insert all descriptions in a description table with a shared identifier, rather than dealing with a multidimensional array that contains all hierarchies. The latter made organising users in the db easier, but the categorisation was becoming large enough that I decided it did warrant separate tables for each level of the hierarchy. There are enough situations where I'm working with only an isolated level of the categorisation hierarchy that putting the entire categorisation into a single multidimensional array felt like the poorer choice.
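
A minimal sketch of that fetch-or-insert pattern, again using Python's `sqlite3` as a stand-in for MySQL. The `child_category_id` and `sort_order` columns, the id value 21 and both helper functions are hypothetical; the post does not show the actual columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE descriptions (
    id                INTEGER PRIMARY KEY,
    child_category_id INTEGER NOT NULL,  -- the shared identifier
    sort_order        INTEGER NOT NULL,
    title             TEXT    NOT NULL,
    description       TEXT    NOT NULL
);
CREATE INDEX idx_descriptions_child
    ON descriptions (child_category_id, sort_order);
-- 21 stands in for "parent category 2 > child category 1".
INSERT INTO descriptions VALUES
    (11, 21, 1, 'Title 1', 'Description 1'),
    (56, 22, 0, 'Title 2', 'Description 2'),
    (78, 21, 0, 'Title 3', 'Description 3');
""")


def descriptions_for_child(conn, child_category_id):
    """Fetch one child category's descriptions, in their stored order."""
    return conn.execute(
        "SELECT id, title, description FROM descriptions "
        "WHERE child_category_id = ? ORDER BY sort_order",
        (child_category_id,),
    ).fetchall()


def append_description(conn, child_category_id, title, description):
    """Insert a description at the end of one child category's list."""
    with conn:  # one transaction for the read and the insert
        next_pos = conn.execute(
            "SELECT COALESCE(MAX(sort_order) + 1, 0) FROM descriptions "
            "WHERE child_category_id = ?",
            (child_category_id,),
        ).fetchone()[0]
        cur = conn.execute(
            "INSERT INTO descriptions "
            "(child_category_id, sort_order, title, description) "
            "VALUES (?, ?, ?, ?)",
            (child_category_id, next_pos, title, description),
        )
    return cur.lastrowid


new_id = append_description(conn, 21, "Title 4", "Description 4")
rows = descriptions_for_child(conn, 21)
```

Each call works on a single level of the hierarchy and never loads the other categories, which is the trade-off against the single decoded array.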

In terms of overhead difference, I'm unsure for now. There's less sorting of arrays happening in PHP to isolate the data I need, but there are far more calls to the db.

My hesitation in understanding the design requirements (and I'm still not giving a thorough answer on this) is that I'm new to large user databases and am not good at forecasting the needs. I'm designing it in such a way that it feels scalable to me, and so, again, a table for each level of the hierarchy feels the least cumbersome (after the cumbersome setup: I'm currently redoing tonnes of code to make functions work with the new setup) and more scalable as needs change.
