A friend recently got burned
when a report he had written “suddenly” started coming out wrong
in production. The data was good, but the report was no longer in the
original sequence. He came to ask me if I had any idea what it could
be and without looking at a line of his code I told him he did not
have an Order By clause. He started explaining why it should still
come out in order and I stopped him and said, “Whenever you are
expecting an order from SQL, specify it explicitly.” He
started explaining that the report had been working and he thought it
was related to a SQL upgrade, not his code. I said again “Whenever
you are expecting an order from SQL, specify it explicitly.” I see
many SQL developers who find
it hard to really, really swallow this. And I see
many SQL developers burned when “something changes”.
Think
of the SQL statement as a contract between you and the
database engine. If the engine accepts your statement the contract
specifies
what is expected of the result. Everything specified in the statement
must be adhered to. And everything that is not specified
is up for grabs. Put another way for this situation, if you don't
have an Order By in your statement the database is engine is
completely free to deliver the results to you in any arbitrary order
it chooses. The reason it will choose for ordering are its own and it
does not care about what you might think would be an obvious order.
Where
I think developers get screwed up is that sometimes it does seem as
though you can in fact predict the order of a result set without
having an Order By. For example, if you do a Group By on FieldA it
“makes sense” that the grouped results come back in ascending
FieldA order. If you select from the database with an Order By into
a temporary table, then select from that temporary table without an
Order By,
it “makes sense” that the second result set comes
back in the same order as the original result set. If you select from
a small table with no Where clause it “makes sense” that the
results come back in primary key sequence.
In
all the examples above the outcome probably matches the expected and
this leads you to believe that it works. But in all
those cases there is no
“contract” that you've made with SQL for that. It can change
it's mind at any time and it's not “wrong” in doing so, it's
not
“acting
weird”.
Think
of it from the viewpoint of the database engine. You've made a
request. In
that request, you've specified certain requirements. Those
requirements it has to honor. But now anything you have not
specified, it is free to do as it wishes. “As it wishes”
generally translates to however it can do it as fast as possible and
with as few resources as possible.
Some of the things
I've seen change the order of a result set unexpectedly over the
years.
- Changing indexes on a table.
- Updating statistics in SQL Server.
- DB tuning changes.
- Changes in table partition structure.
- DB updates.
Why?
Well for instance, if a result set was sequenced in the order of an
index used for access and a new index was introduced,
the engine may decide that new index is more advantageous
and now your results come in the native order of the new index. If
the engine needs to store results in a temporary table as part of its
execution plan, it may find it advantageous to store the entries in a
certain structure and a DB update may change that algorithm.
The why's are
fascinating and worth reviewing with a DBA. But consider that an
exercise in learning. You don't want to do this study in production
with angry users. The rule is simple. If you’re expecting a
specific sequence to your result set, always specify the Order By.

No comments:
Post a Comment