
SQL Multi Table and Multi Column Select

I'm making a MySQL database that has one table for each student in a school, and each table holds that student's timetable. I need to be able to run a script that searches every table in the database, and every column, for two values. For example, it needs to search all tables and columns for teacher "x" where day_week = MondayA. Each table has 11 columns in total: one for day_week, then 5 for the lesson in each period (period 1 lesson, period 2 lesson, etc.), then another 5 for the teacher they have for each period.

Any help would be much appreciated.

Thanks.

First, it's worth noting this is probably not the best approach. A table per student sounds like a bad idea. You are going to be generating massive amounts of dynamic queries and will not be able to leverage indexing, so performance will suffer. I would highly recommend finding a way to get the per-student tables into one table and the time series (the timetable entries) into a join table. Or look at a NoSQL (non-relational) approach. A document database seems like it might be a fit here.

That said, to answer your question: you need to query the schema (the information_schema tables) for the lists of tables and columns, and then loop through them, querying each table.

Start with the MySQL docs on information_schema.
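The catalog-then-loop idea can be sketched as follows. This is only an illustration under stated assumptions: it runs on Python's built-in sqlite3 so it is self-contained, which means SQLite's `sqlite_master` catalog stands in for MySQL's `information_schema.tables`. The table names (`alice`, `bob`) and column names (`period1_lesson`, `period1_teacher`, ...) are hypothetical, since the question does not give them.

```python
import sqlite3

# Hypothetical column names; the question only says there are 5 teacher columns.
TEACHER_COLUMNS = [f"period{i}_teacher" for i in range(1, 6)]


def find_students_with_teacher(conn, teacher, day_week):
    """Return the names of per-student tables that list `teacher` on `day_week`."""
    # SQLite's catalog; on MySQL this would be a query on information_schema.tables.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # One "column = ?" test per teacher column, OR-ed together.
    teacher_match = " OR ".join(f"{col} = ?" for col in TEACHER_COLUMNS)
    matches = []
    for table in tables:
        # Table names cannot be bound as parameters, so the name is quoted and
        # spliced into the SQL text; the searched values are still bound with "?".
        sql = (f'SELECT 1 FROM "{table}" '
               f"WHERE day_week = ? AND ({teacher_match}) LIMIT 1")
        params = [day_week] + [teacher] * len(TEACHER_COLUMNS)
        if conn.execute(sql, params).fetchone():
            matches.append(table)
    return matches


def build_demo():
    """Create two per-student tables with the 11-column layout from the question."""
    conn = sqlite3.connect(":memory:")
    columns = "day_week TEXT, " + ", ".join(
        f"period{i}_lesson TEXT, period{i}_teacher TEXT" for i in range(1, 6))
    for student in ("alice", "bob"):
        conn.execute(f'CREATE TABLE "{student}" ({columns})')
    placeholders = ", ".join("?" * 11)
    conn.execute(f'INSERT INTO "alice" VALUES ({placeholders})',
                 ["MondayA", "Maths", "x", "Art", "y", "PE", "z",
                  "Music", "y", "French", "z"])
    conn.execute(f'INSERT INTO "bob" VALUES ({placeholders})',
                 ["MondayA", "Maths", "y", "Art", "y", "PE", "z",
                  "Music", "y", "French", "z"])
    return conn


if __name__ == "__main__":
    print(find_students_with_teacher(build_demo(), "x", "MondayA"))
```

Note that every search runs one query per table, which is exactly the cost the warning above is about.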

Fix your schema

First of all, your schema sounds very bad. Every time you add a new student, you have to change it (add a new table), and if this were for a real school, that would be an absolute disaster! Changing the schema is more expensive than simply inserting a row into a table, and if your web application can directly change the database, then any security exploit could potentially let people mess with your tables without you realizing it.

On top of that, it makes querying, say, the number of students an absolute pain. Ideally, your data should be laid out in a way that lets you answer any and all questions you might ever have for it. Not just the questions you have now, but the ones further down the road.

And if that's not bad enough, it makes querying a nightmare. You have to keep track of the number of tables somehow, and their names, so every time you query information you're running an entirely different query. Some queries, like 'List students that joined in the last year', grow in size, complexity, and time to run as the list of students (the number of tables) grows. This may be what you're running into already, though it's hard to tell simply from your question.

Normalization

Normalization is, put simply, 'designing the schema well'. It's a bit of a vague topic, but it's broken down into levels (the normal forms), and each level builds on the previous one.

To be perfectly honest, I don't understand the wording of the different levels, and I'm a bit of a newb at databases myself, but here is the gist of normalization, from what I've been taught:

Every value means one small, simple thing

Basically, don't go crazy and put a bunch of stuff in a single column. It's bad design to have a column like 'Categories' whose value is a long string that reads like "Programming, Databases, Web Development, MySQL, Cows".

First of all, parsing strings is time consuming, and the longer they are the worse it gets. Second of all, if those categories are associated with anything else (perhaps you have a table of categories for people to choose from), you're now checking larger strings for the contents of smaller strings. If you want to pull up every item of a certain category, you will be matching that string against every row in the table, which can be excruciatingly slow.

I'm not sure if this is part of normalization, but what I've learned to do is to make a numeric 'ID' for everything I refer to in more than one table. For example, instead of a database table that has the columns 'Name', 'Address', 'Birthday', I'll have 'ID', 'Name', 'Address', 'Birthday'. ID would be a unique number for every row, a primary key, and if at any time I wanted to refer to ANY of the people in it, I'd just use that number.

Numbers are generally quicker to compare and match, quicker to look up, and overall nicer for the database to deal with than long strings, so queries that join on them tend to run in a fraction of the time that string matching takes.

To complete the example, you could have three tables; say, 'Articles', 'Categories', and 'Article_Categories'.

'Articles' would hold all the actual articles and their properties. Something like 'ID', 'Title', 'Content'.

'Categories' would hold all of the individual categories available, with 'ID' and 'Category' fields.

'Article_Categories' would hold the combinations of articles and categories; each row is a unique combination of 'Article_ID' and 'Category_ID'.

What this might look like:

  • Articles
    • 1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.';
    • 2, 'Why to use MySQL', "It's free, duh!";
  • Categories
    • 1, Cows;
    • 2, Databases;
    • 3, MySQL;
    • 4, Programming;
    • 5, Web Development;
  • Article_Categories
    • 1, 1;
    • 1, 2;
    • 1, 3;
    • 1, 4;
    • 1, 5;
    • 2, 2;
    • 2, 3;

Notice that each combination in 'Article_Categories' is unique; you never see, for example, '1, 3' twice. But '1' is in the first column multiple times, and '3' is in the second column multiple times.

This is called a 'many to many' table. You use it when there is a relationship between two data sets where any number of items in one can correspond to any number of items in the other.
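Here is a minimal sketch of those three tables, again run through Python's built-in sqlite3 so it is self-contained (an assumption; the thread is about MySQL, where the same joins apply). The helper names `build_articles_db` and `titles_in_category` are made up for illustration.

```python
import sqlite3


def build_articles_db():
    """Create the three example tables and load the rows listed above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Articles (ID INTEGER PRIMARY KEY, Title TEXT, Content TEXT);
        CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Category TEXT UNIQUE);
        CREATE TABLE Article_Categories (
            Article_ID INTEGER REFERENCES Articles(ID),
            Category_ID INTEGER REFERENCES Categories(ID),
            PRIMARY KEY (Article_ID, Category_ID)  -- each pair may appear only once
        );
    """)
    conn.executemany("INSERT INTO Articles VALUES (?, ?, ?)", [
        (1, "Web Cow Geniuses",
         "Cows have been shown to know how to create great databases "
         "for websites using MySQL."),
        (2, "Why to use MySQL", "It's free, duh!"),
    ])
    conn.executemany("INSERT INTO Categories VALUES (?, ?)", [
        (1, "Cows"), (2, "Databases"), (3, "MySQL"),
        (4, "Programming"), (5, "Web Development"),
    ])
    conn.executemany("INSERT INTO Article_Categories VALUES (?, ?)", [
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3),
    ])
    return conn


def titles_in_category(conn, category):
    """Return the titles of all articles filed under `category`, by article ID."""
    rows = conn.execute("""
        SELECT Articles.Title
        FROM Articles
        JOIN Article_Categories ON Articles.ID = Article_Categories.Article_ID
        JOIN Categories ON Categories.ID = Article_Categories.Category_ID
        WHERE Categories.Category = ?
        ORDER BY Articles.ID
    """, (category,))
    return [title for (title,) in rows]


if __name__ == "__main__":
    print(titles_in_category(build_articles_db(), "MySQL"))
```

The lookup compares one short category name once and then matches integer IDs, instead of scanning a comma-separated string in every article row.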

Do not mix data and metadata

Basically, data is the content of the tables: the values inside the rows. Metadata is the tables themselves: the table names, the value types, and the relationships between two different sets of data.

Metadata inside data

Here's an example of putting metadata inside data:

  • A 'People' table that has, as columns, 'isStudent' and 'isTeacher'.

When data is put in 'People', you might have a row for someone who is both a teacher and a student, so you put in something like 'ID', 'Name', 'yes', 'yes'. This doesn't sound bad, and there may well be a teacher who's taking classes at the same school, so it is possible.

However, it takes up more space, since you have to store a value of some sort in both columns even for people who are only one or the other.

A better way to do this would be to split it out into three separate tables:

  • A 'People' table that has an ID, name, and other data that every person has.
  • A 'Students' table that holds only 'People.ID' values as data.
  • A 'Teachers' table that holds only 'People.ID' values as data.

This way, everybody who is a student gets referenced in 'Students', and everyone who's a teacher gets referenced in 'Teachers'. As mentioned previously, we use the 'ID' field because it's quicker to match up across tables. Now there are only as many Teachers referenced as there need to be, and the same goes for Students. This initially takes up more space due to the overhead of having separate tables, but as the database grows, this is more than made up for.

This also allows you to reference teachers directly. Say you have a table of 'Classes', and you only want Teachers to be capable of being the, well, Teacher. Your 'Classes' table can have a foreign key from its teacher column to 'Teachers.ID'. That way, if a Student hacks the database and tries to put themselves down as teaching a class somehow, the foreign key will reject the row unless they are also listed in 'Teachers'.
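The foreign key check can be tried out with a small sketch. Assumptions: it uses Python's built-in sqlite3 rather than MySQL (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; in MySQL the table engine must support them, as InnoDB does), the key column in 'Teachers' is named `People_ID`, and the names and the helper `can_teach` are made up.

```python
import sqlite3


def build_school_db():
    """People split into Teachers and Students, with Classes pointing at Teachers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript("""
        CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
        CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
        CREATE TABLE Students (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
        CREATE TABLE Classes (
            ID INTEGER PRIMARY KEY,
            Name TEXT UNIQUE,
            Teacher_ID INTEGER NOT NULL REFERENCES Teachers(People_ID)
        );
        INSERT INTO People VALUES (1, 'Ms. Teacher'), (2, 'Sam Student');
        INSERT INTO Teachers VALUES (1);
        INSERT INTO Students VALUES (2);
    """)
    return conn


def can_teach(conn, person_id, class_name):
    """Try to add a class taught by `person_id`; report whether the database allowed it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Classes (Name, Teacher_ID) VALUES (?, ?)",
                (class_name, person_id))
        return True
    except sqlite3.IntegrityError:
        # Raised because person_id has no row in Teachers.
        return False


if __name__ == "__main__":
    db = build_school_db()
    print(can_teach(db, 1, "Maths"), can_teach(db, 2, "Hacking 101"))
```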

Data inside metadata

This is quite similar to what you appear to be having problems with.

Data is, essentially, what we are trying to store: student names, teacher names, schedules for both, etc. However, sometimes we put data (like a student's name) inside of metadata (like the name of a table).

Whenever you see yourself regularly adding onto or changing the schema of a database, it is a HUGE sign that you are putting data inside of metadata. In your case, every student having their own table is essentially putting their name in the metadata.

Now, there are times where you kinda want to do this, when the number of tables will not change THAT often. It can make things simpler. For example, if you have a website selling underwear, you might have both 'Mens_Products' and 'Womens_Products' tables. Obviously the 'neater' solution would be to have a 'Product_Categories' table, in case you want to add transgender products or sell other products to both genders, but in this case it doesn't matter that much. It wouldn't be hard to add a 'Trans_Products' table, and it's not like you'd be adding new tables frequently.

Do not duplicate data

At first, this'll sound like I'm contradicting EVERYTHING I've just said. "How am I supposed to copy those IDs everywhere if I'm not supposed to duplicate data?!" But alas, that's not exactly what I mean. In fact, this is another reason for having a separate ID for each item you might refer to!

Essentially, you don't want to have to update more data than you need to. If, for example, you had a 'Birthday' column in both your 'Students' and your 'Teachers' tables in the above example, and you had someone who was both a Student and a Teacher, suddenly their birthday is recorded in two different spots! Now, what if the birthday was wrong, and you wanted to change it? You'd have to change it twice!

So instead, you put it in your 'People' table. That way, for each person, it only exists once.

This might seem like an obvious example, but you'd be surprised at how often it can occur by accident. Just be careful, and watch for anything that requires you to update the same value in two different locations.
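As a rough illustration of the single-update point, the sketch below keeps 'Birthday' only in 'People' and reads it through either role table. It assumes Python's built-in sqlite3 in place of MySQL, a `People_ID` column in the role tables, and invented names (`build_people_db`, `birthday_via`, the person 'Pat').

```python
import sqlite3


def build_people_db():
    """One person who is both a Teacher and a Student; Birthday lives only in People."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, Birthday TEXT);
        CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
        CREATE TABLE Students (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
        INSERT INTO People VALUES (1, 'Pat', '1990-01-01');
        INSERT INTO Teachers VALUES (1);
        INSERT INTO Students VALUES (1);
    """)
    return conn


def birthday_via(conn, role_table, person_id):
    """Look up a birthday by joining from a role table back to People."""
    if role_table not in ("Teachers", "Students"):
        raise ValueError("unknown role table")  # table names cannot be bound with "?"
    row = conn.execute(
        f"SELECT People.Birthday FROM {role_table} "
        f"JOIN People ON People.ID = {role_table}.People_ID "
        "WHERE People.ID = ?", (person_id,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = build_people_db()
    # A single UPDATE corrects the value for every role the person has.
    db.execute("UPDATE People SET Birthday = ? WHERE ID = ?", ("1990-02-02", 1))
    print(birthday_via(db, "Teachers", 1), birthday_via(db, "Students", 1))
```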

Queries

So, with all that out of the way, how should you query? What sort of SELECT statement should you use?

Let's say you have the following schema (primary key columns are marked 'Primary'):

  • People:
    • ID (Primary)
    • Name (Unique)
    • Birthday
  • Teachers:
    • People_ID (Primary; Foreign: People.ID)
  • Students:
    • People_ID (Primary; Foreign: People.ID)
  • Classes:
    • ID (Primary)
    • Name (Unique)
    • Teacher_ID (Foreign: Teachers.People_ID)
  • Class_Times:
    • Class_ID (Primary; Foreign: Classes.ID)
    • Day (Primary; Enum: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
    • Start_Time
  • Student_Classes:
    • Student_ID (Primary; Foreign: Students.People_ID)
    • Class_ID (Primary; Foreign: Classes.ID)

First note that 'Student_Classes' has two primary key columns. Together they form a composite primary key, which makes the combination of the two unique, not the individual values. This makes it a many-to-many table, as discussed earlier. I did the same for 'Class_ID' and 'Day' in 'Class_Times' so that you can't put the same class twice on the same day.

Also, it may be bad that we use an Enum for the days of the week. If we wanted to add Sunday classes, we'd have to change it, which is a change in the schema, which could potentially break things. However, I didn't feel like adding a 'Days' table and all that.

At any rate, if you wanted to find all of the teachers who teach on a Monday, you could just do this:

SELECT DISTINCT
    People.Name
FROM
    People
    JOIN
        Teachers
        ON
            People.ID = Teachers.People_ID
    JOIN
        Classes
        ON
            People.ID = Classes.Teacher_ID
    JOIN
        Class_Times
        ON
            Classes.ID = Class_Times.Class_ID
WHERE
    Class_Times.Day = 'Monday';

Or, formatted as one long string (as it will be when you put it in your other programming language):

SELECT DISTINCT People.Name FROM People JOIN Teachers ON People.ID = Teachers.People_ID JOIN Classes ON People.ID = Classes.Teacher_ID JOIN Class_Times ON Classes.ID = Class_Times.Class_ID WHERE Class_Times.Day = 'Monday';

Essentially, here is what we do:

  1. Select the main thing we want, the teacher's name. The name is stored in the 'People' table, so we select from that first. DISTINCT keeps a teacher with several Monday classes from being listed more than once.
  2. We then join it to the 'Teachers' table, which tells it that all of the People we select must be Teachers. (An inner JOIN drops People with no matching 'Teachers' row; a LEFT JOIN would keep them.)
  3. After that, we do the same with 'Classes', narrowing it down to only the Classes that each Teacher actually teaches.
  4. Then we also grab 'Class_Times' (important for the final step), but only for those Classes that the Teacher is teaching.
  5. Finally, we specify that the Day the Class takes place must be 'Monday'.
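To try the steps above end to end, here is a sketch that builds a cut-down version of the schema and runs a Monday query written with inner joins and DISTINCT. Assumptions: Python's built-in sqlite3 stands in for MySQL, so the Enum on 'Day' becomes a CHECK constraint; the sample people and classes are invented; 'Students' and 'Student_Classes' are left out because this query does not touch them.

```python
import sqlite3

# The day is bound as a parameter instead of being written into the SQL text.
TEACHERS_ON_DAY = """
    SELECT DISTINCT People.Name
    FROM People
    JOIN Teachers ON People.ID = Teachers.People_ID
    JOIN Classes ON People.ID = Classes.Teacher_ID
    JOIN Class_Times ON Classes.ID = Class_Times.Class_ID
    WHERE Class_Times.Day = ?
    ORDER BY People.Name
"""


def build_timetable_db():
    """Create the tables used by the query and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, Birthday TEXT);
        CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
        CREATE TABLE Classes (
            ID INTEGER PRIMARY KEY,
            Name TEXT UNIQUE,
            Teacher_ID INTEGER REFERENCES Teachers(People_ID)
        );
        CREATE TABLE Class_Times (
            Class_ID INTEGER REFERENCES Classes(ID),
            Day TEXT CHECK (Day IN ('Monday', 'Tuesday', 'Wednesday',
                                    'Thursday', 'Friday', 'Saturday')),
            Start_Time TEXT,
            PRIMARY KEY (Class_ID, Day)  -- a class appears at most once per day
        );
        INSERT INTO People (ID, Name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cat');
        INSERT INTO Teachers VALUES (1), (2);
        INSERT INTO Classes VALUES (1, 'Maths', 1), (2, 'Art', 1), (3, 'PE', 2);
        INSERT INTO Class_Times VALUES
            (1, 'Monday', '09:00'),
            (2, 'Monday', '10:00'),
            (3, 'Tuesday', '09:00');
    """)
    return conn


def teachers_on(conn, day):
    """Return the names of teachers with at least one class on `day`."""
    return [name for (name,) in conn.execute(TEACHERS_ON_DAY, (day,))]


if __name__ == "__main__":
    print(teachers_on(build_timetable_db(), "Monday"))
```

In the sample data Ann teaches two Monday classes, so without DISTINCT her name would come back twice.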

You need to create one table for students and one for the timetable, with a foreign key to the student in the timetable table. Use best practices: if you have 1000 students, you will end up creating 1000 tables, when the whole point of a database is to make life easier. Create one table and add as many entries as you want.

Secondly, ask your question more clearly using this structure so we may be able to help you.

Table 1: Student: id (PK), firstName, lastName

Table 2: Schedule: studentID (PK), day (PK), period (PK), classID

studentID (relates to Student.id)

classID (relates to Classes.id)

Table 3: Classes: id (PK), className, teacherName

(PK) marks the primary key columns.

This will gather all students that have that teacher:

SELECT S1.firstName, S1.lastName, C.teacherName
FROM Student AS S1
JOIN Schedule AS S2 ON S1.id = S2.studentID
JOIN Classes AS C ON S2.classID = C.id
WHERE C.teacherName = 'XXXX'

This will gather all students that are in a certain class:

SELECT S1.firstName, S1.lastName
FROM Student AS S1
JOIN Schedule AS S2 ON S1.id = S2.studentID
WHERE S2.classID = XXXX
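With this structure, the search from the question (teacher "x" where the day is MondayA) becomes a single query. The sketch below is one possible version, not a definitive one. It assumes Python's built-in sqlite3 in place of MySQL, a composite key on (studentID, day, period) in Schedule, and invented sample rows; `students_with_teacher_on` is a hypothetical helper name.

```python
import sqlite3


def build_schedule_db():
    """Create the Student, Classes and Schedule tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
        CREATE TABLE Classes (id INTEGER PRIMARY KEY, className TEXT, teacherName TEXT);
        CREATE TABLE Schedule (
            studentID INTEGER REFERENCES Student(id),
            day TEXT,
            period INTEGER,
            classID INTEGER REFERENCES Classes(id),
            PRIMARY KEY (studentID, day, period)  -- one class per student per slot
        );
        INSERT INTO Student VALUES (1, 'Ada', 'Smith'), (2, 'Alan', 'Jones');
        INSERT INTO Classes VALUES (10, 'Maths', 'x'), (11, 'Art', 'y');
        INSERT INTO Schedule VALUES
            (1, 'MondayA', 1, 10),
            (1, 'MondayA', 2, 11),
            (2, 'MondayA', 1, 11),
            (2, 'TuesdayA', 1, 10);
    """)
    return conn


def students_with_teacher_on(conn, teacher, day):
    """Return (firstName, lastName, period) for each lesson `teacher` gives on `day`."""
    rows = conn.execute("""
        SELECT S1.firstName, S1.lastName, S2.period
        FROM Student AS S1
        JOIN Schedule AS S2 ON S1.id = S2.studentID
        JOIN Classes AS C ON S2.classID = C.id
        WHERE C.teacherName = ? AND S2.day = ?
        ORDER BY S1.id, S2.period
    """, (teacher, day))
    return rows.fetchall()


if __name__ == "__main__":
    print(students_with_teacher_on(build_schedule_db(), "x", "MondayA"))
```

Adding a student is now an INSERT into Student and Schedule, and the query text stays the same however many students there are.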
