Monday, November 25, 2013

Disorderly Conduct



A friend recently got burned when a report he had written “suddenly” started coming out wrong in production. The data was good, but the report was no longer in the original sequence. He came to ask me if I had any idea what it could be and without looking at a line of his code I told him he did not have an Order By clause. He started explaining why it should still come out in order and I stopped him and said, “Whenever you are expecting an order from SQL, specify it explicitly.” He started explaining that the report had been working and he thought it was related to a SQL upgrade, not his code. I said again “Whenever you are expecting an order from SQL, specify it explicitly.” I see many SQL developers who find it hard to really, really swallow this. And I see many SQL developers burned when “something changes”.

Think of the SQL statement as a contract between you and the database engine. If the engine accepts your statement the contract specifies what is expected of the result. Everything specified in the statement must be adhered to. And everything that is not specified is up for grabs. Put another way for this situation, if you don't have an Order By in your statement the database is engine is completely free to deliver the results to you in any arbitrary order it chooses. The reason it will choose for ordering are its own and it does not care about what you might think would be an obvious order.

Where I think developers get screwed up is that sometimes it does seem as though you can in fact predict the order of a result set without having an Order By. For example, if you do a Group By on FieldA it “makes sense” that the grouped results come back in ascending FieldA order. If you select from the database with an Order By into a temporary table, then select from that temporary table without an Order By, it “makes sense” that the second result set comes back in the same order as the original result set. If you select from a small table with no Where clause it “makes sense” that the results come back in primary key sequence.

In all the examples above the outcome probably matches the expected and this leads you to believe that it works. But in all those cases there is no “contract” that you've made with SQL for that. It can change it's mind at any time and it's not “wrong” in doing so, it's not “acting weird”.

Think of it from the viewpoint of the database engine. You've made a request. In that request, you've specified certain requirements. Those requirements it has to honor. But now anything you have not specified, it is free to do as it wishes. “As it wishes” generally translates to however it can do it as fast as possible and with as few resources as possible.

Some of the things I've seen change the order of a result set unexpectedly over the years.
  1. Changing indexes on a table.
  2. Updating statistics in SQL Server.
  3. DB tuning changes.
  4. Changes in table partition structure.
  5. DB updates.
Why? Well for instance, if a result set was sequenced in the order of an index used for access and a new index was introduced, the engine may decide that new index is more advantageous and now your results come in the native order of the new index. If the engine needs to store results in a temporary table as part of its execution plan, it may find it advantageous to store the entries in a certain structure and a DB update may change that algorithm.
The why's are fascinating and worth reviewing with a DBA. But consider that an exercise in learning. You don't want to do this study in production with angry users. The rule is simple. If you’re expecting a specific sequence to your result set, always specify the Order By.

No comments:

Post a Comment