
Database Data Filtering Best Practice

Posted 2014-06-11 14:44:24. Tags: java, mysql, jdbc

I am currently using raw JDBC to query records in a MySQL database; each record in the resulting ResultSet is extracted, placed in a domain-specific model, and stored in a List instance.

My question is: when that data needs to be filtered further (based on columns that exist in the SAME table), which of the following approaches would generally be considered best practice?

1. Issuing further queries with additional WHERE clauses against the database. This offloads the filtering to the database, but it results in one or more additional queries when multiple filters are applied consecutively.

2. Explicitly filtering the already-loaded List at the application level, which removes the need to make an additional call into the database each time the records are filtered.

3. Some hybrid of the two approaches above, perhaps where all filtering is initially done by the database server, but the results are THEN mapped to an application-specific model and cached in a collection for some finite amount of time. Further filter requests received within this interval would be served from the cached data.
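To make option 3 concrete, here is a minimal sketch of a time-limited cache. Everything in it is an assumption made for illustration: the class name `TimedListCache`, the `loader` callback (which would run the JDBC query and map rows to the model), and the injectable `clock` are hypothetical and not part of the original setup.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical time-limited cache around a database load (illustration only). */
final class TimedListCache<T> {
    private final Supplier<List<T>> loader; // assumed to run the query and map the rows
    private final long ttlMillis;           // how long a loaded list stays valid
    private final LongSupplier clock;       // injectable so expiry can be tested
    private List<T> cached;
    private long loadedAt;

    TimedListCache(Supplier<List<T>> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached list, reloading it once the time-to-live has elapsed. */
    synchronized List<T> get() {
        long now = clock.getAsLong();
        if (cached == null || now - loadedAt >= ttlMillis) {
            cached = loader.get();
            loadedAt = now;
        }
        return cached;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. Filter requests arriving within the interval then operate on `get()` without touching the database, at the cost of possibly serving rows that are up to `ttlMillis` out of date.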

It is important to note that the database server in this scenario is located on an external machine, so the overhead and latency of sending query traffic over the local network also have to be factored into the approach we ultimately choose.

I am well aware of the age-old mantra, "The database server should be used to do what it's good at." However, in this scenario it seems less than adequate to make numerous calls into the database to filter data that I ALREADY HAVE at the application level.

Your thoughts and insights would be greatly appreciated.

2 Answers

I have used the hybrid approach on many applications with good results.

Database filtering works well, especially on indexed columns. It also reduces network overhead, since fewer rows are sent to the application.
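As a rough sketch of database-side filtering, several filters can be combined into one parameterized query rather than issued as consecutive queries. The table `orders` and its columns `id`, `status`, and `customer_id` are hypothetical, as is the `OrderQuery` helper; only the `java.sql` calls are standard JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; table and column names are assumptions for illustration. */
final class OrderQuery {
    /**
     * Builds one SELECT with an equality condition per filter column.
     * Column names must come from trusted code, never from user input;
     * only the values are bound as parameters.
     */
    static String buildSql(Map<String, Object> filters) {
        StringBuilder sql = new StringBuilder("SELECT id, status, customer_id FROM orders");
        String separator = " WHERE ";
        for (String column : filters.keySet()) {
            sql.append(separator).append(column).append(" = ?");
            separator = " AND ";
        }
        return sql.toString();
    }

    /** Runs the query and returns the matching ids; values are bound in map iteration order. */
    static List<Long> findIds(Connection conn, Map<String, Object> filters) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(buildSql(filters))) {
            int index = 1;
            for (Object value : filters.values()) {
                ps.setObject(index++, value);
            }
            try (ResultSet rs = ps.executeQuery()) {
                List<Long> ids = new ArrayList<>();
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
                return ids;
            }
        }
    }
}
```

Passing a `LinkedHashMap` keeps the placeholder order predictable. Whether this is faster than filtering in memory still depends on whether the filtered columns are indexed.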

Database filtering can be really slow for some columns, depending on the number of rows in the result and whether the columns are indexed. In that case the network overhead can be negligible compared to the database query time, so application filtering may be faster.

I also find application filtering in Java easier to write and understand than complex SQL.

I usually experiment manually to get the fewest rows in a reasonable time with plain SQL, then write Java to refine the result down to the desired rows.
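The Java refinement step might look like the sketch below. The `Order` model, its fields, and the `OrderFilters` helper are hypothetical stand-ins for whatever domain model the rows were mapped into.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical domain model; the field names are assumptions for illustration. */
final class Order {
    final long id;
    final String status;
    final double total;

    Order(long id, String status, double total) {
        this.id = id;
        this.status = status;
        this.total = total;
    }
}

final class OrderFilters {
    /** Returns a new list holding only the rows that satisfy the condition. */
    static List<Order> refine(List<Order> rows, Predicate<Order> condition) {
        return rows.stream().filter(condition).collect(Collectors.toList());
    }
}
```

Conditions compose with `Predicate.and`, so successive filters can be applied to the already-loaded list without another round trip:

```java
Predicate<Order> shipped = o -> "SHIPPED".equals(o.status);
Predicate<Order> large = o -> o.total > 100.0;
List<Order> result = OrderFilters.refine(rows, shipped.and(large));
```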

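First of all, I appreciate this question; I faced a similar situation a few days ago. You have already covered all the available options, and I prefer the second one: doing the processing at the application level rather than filtering at the database level.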