This blog has moved here.

Wednesday, November 07, 2007

Playing with Flashback Archives

Today I've decided to play a little bit with a new Oracle 11g feature called "Flashback Archive". The "Flashback" term is not new at all. I heard about it, for the very first time, in Oracle 9i. At that time, the "flashback" feature it was on its very beginning and allowed just to query a table "AS OF" a previous moment in time, taking into consideration, especially, UNDO segments constraints. As far as I remember there was no possibility to guarantee that the UNDO will really deliver the requested information as there was no GUARANTEE option for the UNDO tablespaces. However, Oracle 10g continued to refine this feature and brought a plenty of improvements: guarantee option for UNDOs, flashback database and flashback table, flashback transactions and others. Now, Oracle 11g adds a new component to the "flashback suite" which is this "flashback archive".
The "flashback archive" is nothing more than a repository which, of course, requires storage provided through one or more tablespaces and it has a well defined retention policy. This archive may be used to keep track of all changes performed on several configured tables. It is important to notice that you can setup more than one flashback archives which, interestingly, may share the same underlying tablespaces.

For example:
SQL> create tablespace test_arch_tbs datafile size 100M;

Tablespace created

SQL> create flashback archive test_archive tablespace test_arch_tbs retention 1 month;

Flashback archive created.

SQL> create flashback archive test_archive2 tablespace test_arch_tbs retention 1 month;

Flashback archive created.

Please notice that both "test_archive" and "test_archive2" share the same "test_arch_tbs" tablespace. In addition, you can configure a quota for every flashback archive you create, otherwise unlimited is supposed to be the default option.
Another important thing to notice is that you can configure a default flashback archive. You can do this at the archive creation time by specifying the "DEFAULT" option or afterwards using an "ALTER FLASHBACK ARCHIVE... SET DEFAULT". For both operations you need to be logged in as SYSDBA. Of course, you cannot have more than one default flashback archive.
Now, lets create a table with the flashback archive option enabled:
SQL> create table customers (
2 cust_id integer,
3 cust_name varchar2(100),
4 country_id varchar2(2)
5 ) flashback archive test_archive;

Table created

If the archive name is not explicitly specified then the default archive (if previously setup) will be used. It's time now to insert some records into our customers table. The question which arises is which time-stamp is used when flashback query the table: the one which corresponds to the INSERT itself or the one corresponding to the COMMIT statement. We'll figure out shortly:
SQL> insert into customers values (1, 'synapstec', 'ro');

1 row inserted

SQL> select systimestamp from dual;

SYSTIMESTAMP
-------------------------------------------------
07-NOV-07 05.50.01.383321 PM +02:00

SQL> commit;

Commit complete

SQL> select systimestamp from dual;

SYSTIMESTAMP
-------------------------------------------------
07-NOV-07 05.50.15.130301 PM +02:00

SQL> select /* INSERT TIME */ count(1)
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 05.50.01.383321 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNT(1)
----------
0

SQL> select /* COMMIT TIME */ count(1)
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 05.50.15.130301 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNT(1)
----------
1

As you can see the COMMIT time is taken into consideration when the table is flashback queried, which is the expected behavior. This means that, for example, several UPDATES onto the same column will not be stored into the archive but the last committed one.
Likewise, you may encounter problems if you need to flashback query at very precise timestamps. Please have a look to the above example:
6:21:38 PM SQL> update customers set country_id = 'fr';

1 row updated

6:21:38 PM SQL> commit;

Commit complete

6:21:38 PM SQL> select systimestamp from dual;

SYSTIMESTAMP
-------------------------------------------------
07-NOV-07 06.16.16.506341 PM +02:00

6:21:38 PM SQL> update customers set country_id = 'en';

1 row updated

6:21:38 PM SQL> commit;

Commit complete

6:21:38 PM SQL> select systimestamp from dual;

SYSTIMESTAMP
-------------------------------------------------
07-NOV-07 06.16.16.669808 PM +02:00

6:23:09 PM SQL> select /* FIRST COMMIT */ country_id
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 06.16.16.506341 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNTRY_ID
----------
ro

6:23:10 PM SQL>
6:24:18 PM SQL> select /* SECOND COMMIT */ country_id
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 06.16.16.669808 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNTRY_ID
----------
ro

6:27:56 PM SQL> select /* AFTER 3 SECONDS */ country_id
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 06.16.19.669808 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNTRY_ID
----------
ro

6:29:27 PM SQL> select /* AFTER SOME SECONDS */ country_id
2 from customers
3 as of timestamp to_timestamp('07-NOV-07 06.16.21.669808 PM',
4 'dd-mon-yy hh.mi.ss.ff PM');

COUNTRY_ID
----------
en

As you can see the rough time of each of the above commits doesn't return the correct flashback data. There is a delay between the COMMIT and the relative time to which the flashback archive may be queried. This drawback can be alleviated by using SCNs. So, we can replace timestamps with their corresponding, more precise, SCNs:
8:58:50 PM SQL> update customers set country_id = 'fr';

1 row updated

8:58:50 PM SQL> commit;

Commit complete

8:58:50 PM SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
611944

8:58:51 PM SQL> update customers set country_id = 'en';

1 row updated

8:58:51 PM SQL> commit;

Commit complete

8:58:51 PM SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
611947

8:58:51 PM SQL>
9:00:13 PM SQL> select /* FIRST COMMIT */ country_id
2 from customers
3 as of scn 611944;

COUNTRY_ID
----------
fr

9:00:13 PM SQL> select /* SECOND COMMIT */ country_id
2 from customers
3 as of scn 611947;

COUNTRY_ID
----------
en

Nice!
There are however some limitations as far as this feature is concerned. The ugliest ones, at least from my point of view, are:
1. the fact that I cannot modify columns owned by a table which has the flashback archive option enabled along with
2. the fact that we cannot perform partition operations for those tables.
These, put together with a long retention period, become a little bit scary because it is quite likely that, in 5 years for example, the business logic to change leading to modifying the type of a column, for instance. These changes are difficult to predict therefore extreme care should be given for the design of those tables which require flashback archiving. Again, if you have a big transaction table which needs flashback archiving but it is also suitable for partitioning, which feature should we elect? Hard choice, right?
If you ask me, I would, at least, prefer to have the possibility to temporary store the flashback archive records for a particular table into a temporary stage, with something like: "create table temp_stage as select * from {table} ... as of ...", then to disable the flashback archive, to do the maintenance stuff and then to have the possibility to reload the archive from the temporary stage. Well, it's just a thought... of course I have no idea about internal stuff and what this might imply.

Okey, that's it for the moment. The feature is nice and fits to a lot of projects.

No comments: