
Using LOG ERRORS as a testing mechanism

I recently heard a very interesting story about a use for LOG ERRORS: namely, to verify the correctness (and improve the performance) of a problematic SQL statement.

I thought you might enjoy hearing about it.

Greg Belliveau of Digital Motorworks buttonholed me at ODTUG's Kscope14 conference with this report:

As part of a data mart refresh, we had an insert statement that refreshed a FACT table, and over the years had gotten to the point where it took well over 3 hours to complete. 

There was a NOT IN clause in the statement, and we were pretty sure that was the cause of the degenerating performance. Why was it there? Our best guess was that it was there so the statement would not fail in case someone ran the refresh for a date range that had already been run...even though there was code that checked and prevented that from happening.

[Steven: In other words, programmer's insurance. "Just in case, let's do this." That's always problematic in software. Is that a real "case"? Will it ever happen? What are the consequences of including this insurance? Will we document it so that others coming along later can figure out why this weird code is here? Usually not.]

Not wanting to lose the "safety net" that was in place, we decided to try the LOG ERRORS feature you had mentioned in your session at Oracle Open World in 2010. We removed the NOT IN clause and added LOG ERRORS. The insert statement then ran (and continues to run to this day) in roughly one minute (down from three hours!).

Oh, and there's never been a single row inserted in the error table!

Nice, nice, very nice.
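In case you haven't used DML error logging before, here is a rough sketch of the kind of change Greg describes. I don't have his actual code, so the table and column names below (fact_sales and friends) are invented for illustration: create the error log table once with DBMS_ERRLOG, then drop the defensive NOT IN and add the LOG ERRORS clause to the insert.

-- One-time setup: create an error log table for the fact table
-- (by default it is named ERR$_FACT_SALES).
BEGIN
   DBMS_ERRLOG.create_error_log (dml_table_name => 'FACT_SALES');
END;
/

-- Before: the defensive NOT IN that was dragging the refresh down.
INSERT INTO fact_sales (sale_id, sale_date, amount)
   SELECT s.sale_id, s.sale_date, s.amount
     FROM staging_sales s
    WHERE s.sale_id NOT IN (SELECT f.sale_id FROM fact_sales f);

-- After: no NOT IN. Any row that would have failed (say, on the
-- primary key) is written to ERR$_FACT_SALES instead of killing
-- the whole statement.
INSERT INTO fact_sales (sale_id, sale_date, amount)
   SELECT s.sale_id, s.sale_date, s.amount
     FROM staging_sales s
   LOG ERRORS INTO err$_fact_sales ('fact refresh')
      REJECT LIMIT UNLIMITED;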

It's always a bit scary to mess with code that's been around for years and (since it's been around for years) is a gnarly bunch of logic. SQL, with its set orientation, can be even more off-putting than, say, a PL/SQL function.

So, of course, the solution is to build a comprehensive automated regression test script so that you can compare before and after results.

Yes, but almost no one will (ever) do that. So we do what we can. 

And in this case, Greg and his team came up with a very creative solution:

We are pretty sure NOT IN is causing the performance problem, but we also cannot afford to remove the clause and have the statement fail.

So we'll take it out, but add LOG ERRORS to ensure that all inserts are at least attempted.

Then we can check the error log table afterwards to see if the NOT IN really was excluding data that would have caused errors.
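With my made-up names from the sketch above, that post-run check could be as simple as a query against the error log table:

SELECT ora_err_number$,
       ora_err_mesg$,
       sale_id
  FROM err$_fact_sales
 WHERE ora_err_tag$ = 'fact refresh';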

In Greg's case, the answer was "Nope". So far as they can tell, the NOT IN was unnecessary.

OK, but maybe it will be necessary a week or month or year from now. What then?

Well, the bad data that should have been excluded will still be excluded (the insert will fail), and then the job can check the log table and issue whatever kind of report (or alarm) is needed.
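Here is one way the refresh job might automate that check (again, purely a sketch built on my invented error log table); swap the RAISE for whatever notification mechanism you already use:

DECLARE
   l_rejected   PLS_INTEGER;
BEGIN
   SELECT COUNT (*)
     INTO l_rejected
     FROM err$_fact_sales
    WHERE ora_err_tag$ = 'fact refresh';

   IF l_rejected > 0
   THEN
      -- Or: send an email, write to an alert table, etc.
      raise_application_error (
         -20001,
            'Fact refresh rejected '
         || l_rejected
         || ' rows; see ERR$_FACT_SALES for details.');
   END IF;
END;
/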

So LOG ERRORS as part regression test, part performance monster.

Nice work, Greg.

Nice work, Oracle.


Comments

  1. Hi Steven.
    I've recently discovered that Data Pump uses LOG ERRORS under the hood to implement the feature of "skip constraint errors" during import: http://www.db-oriented.com/2014/07/19/impdp-which-rows-failed
    It's nice that Oracle uses its own features internally.
    Thanks,
    Oren.

  2. Hello Steven & Oren,

    Very nice, thanks to Oren for his comment :) :)

    What a small world ... to find Oren's post on a blog that we both follow ...
    and we also know each other from the last Israeli Oracle Week 2013 ...
    the only one missing to join us there was Steven, am I right, Oren?

    Regarding the automatic error logging performed during IMPDP ...
    I read the post on your blog and saw the nice trick for temporarily preserving the
    error logging table ... but, I wonder, why doesn't Oracle always preserve this table
    as a default behavior ... in fact ... it is NOT of much use to fill a table with data
    and then just drop it before anyone has the chance to use that data ...

    I wonder whether other tricks are possible, like, for example,
    using a DDL event trigger to catch the creation of the ERRDP$ table,
    and then create another "shadow" table for it and a trigger to copy the data
    from ERRDP$ into the shadow table which will remain ...

    I cannot but express my hope that we all will meet one day :) :)

    Thanks a lot & Best Regards,
    Iudith

