Tuesday, May 9, 2017

Use records to improve readability and flexibility of your code

Suppose I've created a table to keep track of hominids:

   hominid_name     VARCHAR2 (100),
   home_territory   VARCHAR2 (100),
   brain_size_cm    INTEGER

I might then write code like this:

   l_b_hominid_name    VARCHAR2 (100) := 'Bonobo';
   l_b_brain_size_cm   INTEGER := 500;
   l_g_hominid_name    VARCHAR2 (100) := 'Gorilla';
   l_g_brain_size_cm   INTEGER := 750;
   l_n_hominid_name    VARCHAR2 (100) := 'Neanderthal';
   l_n_brain_size_cm   INTEGER := 1800;

What do you think?

I find the little voice of Relational Theory inside my head rebelling.

"All that repetition! All that denormalization! All that typing (or copy-pasting, which is even worse)!"

Surely if I should avoid having redundant data in rows of my tables, I should avoid redundant code, too?

Yes, I should.  I don't like to see long lists of declarations, especially when the names are very similar and follow a pattern. 

A fine way to avoid this kind of code is to use record types to group related variables together within a named context: the record variable. So I could rewrite the declaration section above to:

   l_bonobo        hominids%ROWTYPE;
   l_gorilla       hominids%ROWTYPE;
   l_neanderthal   hominids%ROWTYPE;
   l_bonobo.hominid_name := 'Bonobo';
   l_bonobo.brain_size_cm := 500;
   l_gorilla.hominid_name := 'Gorilla';
   l_gorilla.brain_size_cm := 750;
   l_neanderthal.hominid_name := 'Neanderthal';
   l_neanderthal.brain_size_cm := 1800;

Notice that I now move the initializations of the variable (well, record.field) values to the executable section. That's because PL/SQL does not yet offer a built-in function (in object-oriented lingo, a constructor method) for record types.

So I no longer have six declarations - just three. And, of course, if my table had 15 columns and I had declared a separate variable for each of those, I would have been able to shrink down my declarations from 45 to 3!

Still, I don't like putting all that initialization code in the main body of my block. How about if I create my own "record constructor" function, and then call that:

   name_in IN hominids.hominid_name%TYPE,
   home_territory_in IN hominids.home_territory%TYPE,
   brain_size_cm_in IN hominids.brain_size_cm%TYPE)
   RETURN hominids%ROWTYPE
   l_return hominids%ROWTYPE;
   l_return.hominid_name := name_in;
   l_return.home_territory := home_territory_in;
   l_return.brain_size_cm := brain_size_cm_in;
   RETURN l_return;

   l_bonobo        hominids%ROWTYPE := new_hominid ('Bonobo', NULL, 500);
   l_gorilla       hominids%ROWTYPE := new_hominid ('Gorilla', NULL, 750);
   l_neanderthal   hominids%ROWTYPE := new_hominid ('Neanderthal', NULL, 1800);
   DBMS_OUTPUT.put_line (l_neanderthal.brain_size_cm);

Ahhhhh. Just three declarations. Default values assigned in the declaration section. All the details of the assignments hidden away behind the function header.

And when I add a new column to the table (or generally a field to a record), I can add a parameter to my new_hominid function, along with a default value of NULL, and none of my existing code needs to change (unless that new column or field is needed).

Yes, I like that better.

How about you?

Wednesday, May 3, 2017

Getting my Oracle Database 12c Release 2 up and running on Mac via Docker

I love to follow in the footsteps of people who are braver, smarter and more knowledgeable than me.

So I was happy to wait till SQL Maria (Maria Colgan) published her blog post on Oracle Database 12c now available on Docker, with step-by-step instructions for taking advantage of the new Docker image for 12.2 now available (specifically, 12.2 via Docker on Github, 12.2 via Docker at the Docker Store).

I am happy to report that I can now connect SQL Developer to my containerized 12.2 database. Thank you, Maria, for a very helpful post!

Now, I am not going to repeat everything Maria already wrote. That would be silly. I will simply point out some things you might find helpful as you do the same thing I did (follow in Maria's footsteps - which, literally, meant lots of copy-pasting rather dumbly).

1. Watch out for those dashes when you copy/paste.

Docker was not responding as expected to my commands and I (well, actually, Gerald) eventually noticed that the dash, copied from the blog post, was too long - it had been translated into a different character. So watch out for that! You might need to retype the command yourself.

I hate that.


2. Create your own folder for your Oracle Database files. I know it should be obvious. But I am a copy-paste sorta guy, and probably the only one in the world who would copy this command into my terminal and expect it to work:

docker run --name oracle -p 1521:1521 -p 5500:5500 
-v /Users/mcolgan-mac/oradata:/opt/oracle/oradata 

And it did - once I created my own folder for the files, and replaced that in the command.

Oh and by the way, that entire command (once you swap out mcolgan-mac for your own foler) needs to be one one line.

After that, everything went very smoothly and, again following Maria's wonderfully clear steps, I had my database up and running.

Then I set up my connection in SQL Developer:

and voila! My own 12.2 database running in a Docker container, on my Mac.

Thanks, Maria!
Thanks, Gerald!
Thanks, Docker!
Thanks, Oracle!

Monday, May 1, 2017

Deterministic functions, caching, and worries about consistent data

A developer contacted me with the following questions last week:

We have created a function that returns a single row column value form a query. When we call this function with the same input values it takes to long to return. Example:

select max (det_function('A2')) from dual connect by rownum <= 1000000

But when we change the function to a deterministic function the statement returns really fast. The only thing where we are unsure is what happens when the tables has changed to which the statement of the function selects? Do we need a to commit this table to bring oracle to re-execute the statement in the function and not use the cache or what should we do to get a consistent return value?

   v_ident   NUMBER;
     FROM my_table
    WHERE id = v_in_id;

   RETURN v_ident;

A function is deterministic if the value returned by the function is determined entirely by its input(s).

The following function, for example, is deterministic:

FUNCTION betwnstr (
   string_in      IN   VARCHAR2
 , start_in       IN   INTEGER
 , end_in         IN   INTEGER
   RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END betwnstr;

You can also quickly see, I hope, that any function that contains SQL (like the first function defined above) cannot possibly be deterministic: it depends on the contents of one or more tables to do its job, and those datasets are not passed as IN parameters.

Does that mean the compiler will complain? No! But it does mean that you could create real problems for yourself if you are not careful about your use of this keyword.

So the rule should be: Only add the DETERMINISTIC keyword to truly deterministic functions.

Why? Why should it matter? Because under certain circumstances (such as the one identified by the developer above), Oracle Database will not execute your function, but instead simply use a previously cached return value.

Within the scope of a single server call (e.g., execution of a PL/SQL block), Oracle Database will keep track of input and return values for your deterministic functions. If in that same server call, you pass the same input values to the function, the database engine may choose to not actually execute the function, but instead simply pass back the previously-computed return value (for those same inputs).

That's why this developer saw such a great leap forward in performance.

Once that SELECT statement finishes, though, memory for the cache is released. When and if that same query is run again, the engine will start rebuilding and using that cache.

While that statement executing, though, no matter what sort of changes are made to the table, no matter if a commit is issued or not, those changes will not be visible to the statement that called the function.

That's why I will repeat The Rule again:

Only add the DETERMINISTIC keyword to truly deterministic functions.

If your function contains a SELECT statement and you want to call it from a SELECT statement, the best thing to do is take the SQL out of the function and "merge" it into your SQL - in other words, no user-defined functions. Just SQL.

Rob van Wijk offers lots more details on the behavior and performance of deterministic functions here. You will also be well-served to read Bryn Llewellyn's in-depth exploration of How to write a safe result-cached function.

Rather than repeat all those findings, I will simply conclude with:

1. Use the DETERMINISTIC function primarily as a way to document to future developers that your function is currently free of side effects, and should stay that way.

2. If you are looking for ways to improve the performance of functions executed inside SQL, learn more about the UDF pragma (new in Oracle Database 12c Release 1).

3. See if the function result cache feature (also explored in Bryn's blog post) might be applicable to your situation.

4. Do not call user-defined functions from SQL statements that in turn contain SQL statements (or at least do so with extreme caution). That SQL inside the function is not part of the same read-consistent image as the data set identified by the "outer" SQL.