
Alternative to SQL limitation of result set

We query a relational database using standardized SQL. The result of a query is a two-dimensional table of rows and columns.

I really like the well-defined structure of an RDBMS (I honestly have never worked professionally with other database systems). But the query language, or more precisely the result set that SQL produces, is quite a limitation that affects performance in general.

Let's create a simple example: Customer - Order (1 - n). I want to query all customers whose name starts with the letter "A" and who have an order this year, and display each with all of his/her orders.

I have two options to query this data.

Option 1: Load the data with a single query that joins both tables. Downside: the result transferred to the client contains duplicated customer data, which is overhead.

Option 2: Query the customers, then run a second query to load their orders. Downsides: two queries mean twice the network latency; the WHERE ... IN term of the second query can become very large, which could violate query length limits; and performance is not optimal because both queries perform a join to, or filter on, the orders.

There would of course be an option three, where we start the query from the orders table.

So in general we have to estimate, based on the specific situation, which is the better trade-off: a single query with data overhead, or multiple queries with worse execution time. Both strategies can be bad in complex situations where a lot of data in well-normalized form has to be queried.
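To make the trade-off concrete, here is a minimal sketch of both options using Python's built-in sqlite3 module. The table names (`customer`, `orders`), the columns, the sample rows, and the fixed `year_start` value are all assumptions made up for illustration; they are not from a real schema.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customer VALUES (1, 'Alice', 'Berlin'), (2, 'Adam', 'Oslo'), (3, 'Bob', 'Rome');
    INSERT INTO orders VALUES
        (1, 1, '2024-01-10'), (2, 1, '2024-02-11'),
        (3, 2, '2024-03-05'), (4, 2, '2023-03-01'),
        (5, 3, '2024-04-01');
""")
year_start = "2024-01-01"  # stands in for "this year"

# Option 1: one round trip, but the customer columns repeat on every order row.
joined = conn.execute("""
    SELECT c.id, c.name, c.city, o.id, o.order_date
    FROM customer c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.name LIKE 'A%' AND o.order_date >= ?
    ORDER BY c.id, o.id
""", (year_start,)).fetchall()

# Option 2: two round trips, each customer is sent once,
# but the second query needs an IN list that grows with the first result.
customers = conn.execute("""
    SELECT DISTINCT c.id, c.name, c.city
    FROM customer c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.name LIKE 'A%' AND o.order_date >= ?
    ORDER BY c.id
""", (year_start,)).fetchall()
ids = [row[0] for row in customers]
placeholders = ",".join("?" * len(ids))
orders = conn.execute(
    f"SELECT id, customer_id, order_date FROM orders "
    f"WHERE customer_id IN ({placeholders}) AND order_date >= ? ORDER BY id",
    (*ids, year_start),
).fetchall()

# Customer cells sent over the wire: 3 columns per row in each case.
customer_cells_option1 = 3 * len(joined)
customer_cells_option2 = 3 * len(customers)
```

With this sample data the join repeats Alice's columns once per order, while option 2 sends each customer once at the cost of a second request.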

Ideally, SQL would let us specify the result of a query as an object structure. Imagine the result being structured as XML or JSON instead of a table. If you have ever worked with an ORM like Entity Framework, you may know the "Include" command. With support for an "include"-like command in SQL that returns the result not as a join but structured like an object, the world would be a better place. Another scenario would be an include-like query without duplicates, so basically two tables in one result. To visualize it, the results could look like:

{
  { customer 1 { order 1 order 2} }
  { customer 2 { order 3 order 4} }
}

or

{
  { customer1, customer2 }
  { order1, order2, order3, order4 }
}

MS SQL Server has a feature called "Multiple Result Sets", which I think comes quite close. But it is not part of standard SQL. I am also unsure whether ORMs really use such a feature. And I assume it is still two queries being executed (but in one client-to-server request), instead of something like "select customers include orders From customers join orders where customers starts with 'A' and orders...".

Do you generally face the same problem? If so, how do you solve it? Do you know of a database query language that can do this, maybe even with an existing ORM that supports it (probably not)? I have no real working experience with other database systems, but I don't think all the new database systems address this problem (though they address other problems, of course). What is interesting is that in graph databases joins are basically free, as far as I understand.

I think you can alter your application workflow to solve this issue. New application workflow:

  1. Query the Customer table for customers whose name starts with the letter 'A'. Send the result to the client for display.
  2. The user selects a customer on the client, which sends the customer ID back to the server.
  3. Query the Order table by that customer ID and send the result to the client for display.
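The three steps above could be sketched roughly like this, using Python's sqlite3 module as a stand-in for the server side. The function names `find_customers` and `orders_for`, the schema, and the sample rows are hypothetical and only illustrate the flow.

```python
import sqlite3

def find_customers(conn, prefix):
    """Step 1: list customers whose name starts with the given prefix."""
    return conn.execute(
        "SELECT id, name FROM customer WHERE name LIKE ? ORDER BY id",
        (prefix + "%",),
    ).fetchall()

def orders_for(conn, customer_id):
    """Step 3: load the orders of the single customer the user picked."""
    return conn.execute(
        "SELECT id, order_date FROM orders WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    ).fetchall()

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Adam'), (3, 'Bob');
    INSERT INTO orders VALUES
        (1, 1, '2024-01-10'), (2, 1, '2024-02-11'), (3, 2, '2024-03-05');
""")

customer_list = find_customers(conn, "A")        # step 1
selected_id = customer_list[0][0]                # step 2: stands in for the user's choice
selected_orders = orders_for(conn, selected_id)  # step 3
```

No customer data is duplicated and no large IN list is needed, but the orders are only loaded for one customer at a time, so this fits a master-detail screen rather than a report that shows every customer with all orders at once.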

On some SQL servers it is possible to return JSON. If table A is related to table B and every entry in table B points to at most one entry in table A, you can reduce the traffic overhead you described. One example could be addresses and their contacts.

SELECT * FROM Address
JOIN Contact ON Address.AddressId = Contact.AddressId
FOR JSON AUTO

The result returned by the SQL query would be smaller:

{
    "AddressId": "3396B2F8",
    "Contact": [{
            "ContactId": "05E41746",
            ... some other information
        }, {
            "ContactId": "025417A5",
            ... some other information
        }, {
            "ContactId": "15E417D5",
            ... some other information
        }
    ]
}

But actually, I don't know of any ORM that processes JSON to reduce traffic. If you had contacts shared between different addresses, it could be counterproductive.

Don't forget that JSON also has some overhead, and it needs to be serialized and deserialized.
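If you want to experiment with the nested-result idea without a SQL Server instance, here is a rough sketch that uses SQLite's JSON functions (`json_object`, `json_group_array`) from Python. This is a different engine and a different syntax from `FOR JSON AUTO`, swapped in only because it runs locally, and it assumes your SQLite build includes the JSON functions. The `Street` and `Name` columns and the sample values are made up for illustration.

```python
import json
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (AddressId TEXT PRIMARY KEY, Street TEXT);
    CREATE TABLE Contact (ContactId TEXT PRIMARY KEY, AddressId TEXT, Name TEXT);
    INSERT INTO Address VALUES ('3396B2F8', 'Main Street 1');
    INSERT INTO Contact VALUES
        ('05E41746', '3396B2F8', 'Ann'),
        ('025417A5', '3396B2F8', 'Ben'),
        ('15E417D5', '3396B2F8', 'Cid');
""")

# One row per address; its contacts are aggregated into a JSON array,
# so the address columns are sent once instead of once per contact.
rows = conn.execute("""
    SELECT json_object(
        'AddressId', a.AddressId,
        'Street', a.Street,
        'Contact', json_group_array(
            json_object('ContactId', c.ContactId, 'Name', c.Name)
        )
    )
    FROM Address a
    JOIN Contact c ON c.AddressId = a.AddressId
    GROUP BY a.AddressId
""").fetchall()

# The client still has to deserialize each JSON document.
addresses = [json.loads(row[0]) for row in rows]
```

Note that every contact object repeats its key names, which is part of the JSON overhead mentioned above, so whether this actually saves traffic depends on how wide the parent row is compared to the child rows.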

The optimum for traffic reduction would be for the SQL server to split the joined result into multiple result sets, and for the client, or rather the object-relational mapper, to map them back together. I would be interested to hear if you find a solution to your problem.
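The client-side half of that idea, mapping two flat result sets back into objects, is simple to sketch. In the example below the two lists are hard-coded stand-ins for what a driver would return (the Python DB-API, for instance, defines an optional `cursor.nextset()` for stepping through multiple result sets); the `stitch` helper and the column layout are assumptions for illustration.

```python
from collections import defaultdict

# Stand-ins for two result sets returned by a single request:
# each customer and each order appears exactly once.
customers = [(1, "Alice"), (2, "Adam")]
orders = [(1, 1, "2024-01-10"), (2, 1, "2024-02-11"), (3, 2, "2024-03-05")]

def stitch(customers, orders):
    """Attach each order to its customer using the foreign key column."""
    orders_by_customer = defaultdict(list)
    for order_id, customer_id, order_date in orders:
        orders_by_customer[customer_id].append({"id": order_id, "date": order_date})
    return [
        {"id": cid, "name": name, "orders": orders_by_customer.get(cid, [])}
        for cid, name in customers
    ]

result = stitch(customers, orders)
```

The grouping is a single pass over each list, so the mapping cost on the client is small compared to what is saved by not sending duplicated customer columns.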

Another train of thought would be to use a graph database.
