
What is the logic or procedure for combining multiple DB queries' results into one elegant table?

Basically, I have three tables that contain all the data I want, but I am having to write some crazy JOIN and WHERE statements that are not working for me. I have finally resorted to using temporary tables, but I was wondering if there is a more long-term solution.

The situation: we pull large amounts of data via SOAP into our database. We have no control over how that data is organized, combined, labeled, and so on, so we need to split it up as best we can so that it eventually becomes useful to us.

What I am asking is: how do the pros "prep" data so that it can quickly be inserted into a useful table via other tables, and how does it stay updated as new data flows in? What is the terminology? What should I research?

Thanks in advance!

The terminology I use for preparing the data and getting it ready for insertion is "staging" the data. We typically insert/update rows into temporary staging tables.

We massage and tweak the data in the staging tables, assigning foreign keys, fixing malformed data, splitting big multiuse fields into individual fields, and so on, to get the data cleaned up BEFORE the rows are inserted into the actual target tables.

(I don't know that this is standard terminology; others may refer to it differently.)
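For illustration, here is a minimal sketch of that staging flow in MySQL-flavored SQL. The table and column names (stg_orders, customers, orders, raw_customer, and so on) are made up for the example, not taken from the question:

    -- Load the raw SOAP payload rows into a temporary staging table.
    CREATE TEMPORARY TABLE stg_orders (
        raw_customer  VARCHAR(255),       -- multiuse field, e.g. 'ACME Corp|US-East'
        raw_amount    VARCHAR(50),        -- amount arrives as text
        customer_id   INT NULL,           -- foreign key assigned during cleanup
        order_amount  DECIMAL(12,2) NULL  -- cleaned numeric value
    );

    -- Cleanup pass: split the multiuse field, tidy the amount, assign foreign keys.
    UPDATE stg_orders s
    JOIN customers c
      ON c.name = SUBSTRING_INDEX(s.raw_customer, '|', 1)
    SET s.customer_id  = c.id,
        s.order_amount = CAST(NULLIF(TRIM(s.raw_amount), '') AS DECIMAL(12,2));

    -- Only rows that passed cleanup are moved into the real target table.
    INSERT INTO orders (customer_id, amount)
    SELECT customer_id, order_amount
    FROM stg_orders
    WHERE customer_id IS NOT NULL
      AND order_amount IS NOT NULL;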


FOLLOWUP

To improve query performance for complex data, we sometimes store pre-joined and pre-calculated results. Basically, we populate a table with "query ready" results, to make for much simpler queries of historic data. The big drawback to this is that we now have redundant data, which can become "out of sync" with the operational data. We use a scheduled (nightly) process to re-populate these tables.

(I'm unsure of the industry standard jargon for these types of tables.)
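A minimal sketch of that kind of pre-joined, "query ready" table and its scheduled rebuild, again with hypothetical names (daily_sales_summary, orders, customers):

    -- Reporting table holding pre-joined, pre-calculated results.
    CREATE TABLE daily_sales_summary (
        sales_date     DATE          NOT NULL,
        customer_name  VARCHAR(255)  NOT NULL,
        order_count    INT           NOT NULL,
        total_amount   DECIMAL(14,2) NOT NULL,
        PRIMARY KEY (sales_date, customer_name)
    );

    -- Scheduled (nightly) re-population: wipe and rebuild from the
    -- operational tables so the redundant copy does not drift out of sync.
    TRUNCATE TABLE daily_sales_summary;

    INSERT INTO daily_sales_summary (sales_date, customer_name, order_count, total_amount)
    SELECT DATE(o.created_at), c.name, COUNT(*), SUM(o.amount)
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    GROUP BY DATE(o.created_at), c.name;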

In terms of researching this, these approaches are described in articles and white papers on "data warehouse" and "data mart" design. The process is almost always described as "ETL", the three major steps: Extract, Transform, Load. There's also a lot of noise in the industry press about "data mining" and "big data".
