Redundant data saving is usual for almost every database nowadays, and it does not matter if it is a RAID-1 (mirroring) or RAID-5 or anything like that. Nevertheless critical databases must be saved by a copy. This is why:
- By corruption of data the mirroring could be damaged as well.
- A data backup is not sufficient, because the restore would take too long in case of an error.
- Logical mistakes ([inlinecode]DROP[/inlinecode] or [inlinecode]TRUNCATE TABLE[/inlinecode]) could not be recognized or eliminated by a disk mirroring.
To create such a copy there are at least two different methods:
- Creating an identical copy and ongoing recovery (physical standby database, e.g. Oracle Data Guard)
- Creating an independant database and replication of the SQL tasks (replication, e.g. DELL SharePlex for Oracle)
The advantage of this physical standby is, that you do not need to take care about this copy. As it is an identical copy, it is unlikely that SQL tasks would not work here that are performed on the primary database. The only weak point would be an uneven storage property, i.e. while the primary database is still able to allocate further storage, this is depleted on the standby database. After all the disadvantage is that this database cannot be used or only in slight ranges. As it is set in recovery mode, SQL tasks cannot be performed. Although Oracle is offering a so called Active Data Guard configuration, in which a recovery is performed while the database read-only is launched, the read-only access is sometimes not sufficient.
The advantage of the replicated database is that this is an independent database. This means it is opened read-write and it can be a different Oracle software version. It is not necessary to replicate the entire database, but also critical tables can be transmitted. Beside these advantages there is the disadvantage of being an independent database. This means it has to be audited and managed just like the primary database. And there are – meanwhile very uncommon – restrictions in the to be replicated data types or objects.
Which of these to choose depends on the requirements and on the charged license fees.
Why copy
How about the reasons for the copy?
- For the avoidance and recognition of data corruption your chance is 50:50 with the physical standby database. As it is at least always a physical copy of the primary database, a corruption that is already existing would be copied as well – so no actual recognition. If the data corruption occurs later, it would be recognized and fixed most likely. The replication could be a physical copy as well, but in many cases the replication is launched by the DataPump export/import. In those cases the corruption would most likely be recognized. In the following replication it is likely as well, that the corruption can be recognized and also fixed.
- If the primary database is not available due to any reason, you can switch quickly with the standby database and the replicate (only if the replicate owns all necessary data, of course). While usually in the standby solution the database must be prepared first (taking it out of recovery mode and restarting the instance), with a replicate you can get started immediately, as the database is already launched read-write. Nevertheless you can say, that both methods are well applicable as disaster recovery.
- For logical errors none of these methods offers a solution. A “DROP TABLE” would be transmitted onto the target database 1:1 and performed.
Mitigate Risks of logical errors
How can you mitigate the risk of logical errors?
- The most efficient method is to eliminate DDL operations from the start is by separating schema and user from each other and lock the schema account. In this case schema is the user, who owns the objects. The “user” is the one working with the applications. This user only gets DML rights for the objects belonging to scheme. By this method many logical errors can be avoided for sure – but not all. A DELETE FOM TABLE or UPDATE TABLE is still possible and leads to a loss of data, too. But a DELETE or UPDATE can be undone (e.g. by FLASHBACK TABLE or with the Oracle LogMiner).
- In the standby database and also in the replicate there is the possibility to have a time lag built in, this means the change is transmitted immediately but performed on the database after a certain time. This method is mainly preferred with batch operations or upgrades and patches. You just stop the replication or the redo apply prior to the batch processing and if this was successful, it is started. A disadvantage is, that it cannot be switched immediately if there is a physical error on the primary database, but a “follow-on” of the open transaction restrains the switch.
- With the replication of the database there is the additional possibility to only allow DML operations. As already mentioned in 1. by this, logical errors become much less likely, but with the disadvantage, that DDL operations must be performed on the replicate explicitly. That means DDL changes from a release switch or a patch of the application must be performed on both databases. If this does not happen, the replication is cancelled and high availability is lost.
Flashback Database
There is one more possibility to fix logical errors: Flashback database. But this is confined to the Oracle Enterprise Edition.
By Flashback database the secondary database, no matter if standby database or replicate, can be set back to a former point. Afterwards it is launched again in read-only or read-write mode, the broken data is read and the database is shut down again. Of course, you would have gotten here with a backup, too. But the main point for Flashback database is, that you can “scroll” and the replication or the standby database is simply getting back to functionality.
This shall be described with the example Oracle Data Guard: Imagine, at 12:30 p.m. on March 25th 2015 a user was highly motivated and cleared a table by TRUNCATE TABLE. Luckily only a minor part of the application is effected, still the data must be restored.
First the redo apply on the standby database is turned off, so the database can actually be used.
DGMGRL> EDIT DATABASE 'RINGO_SB' SET STATE ='APPLY-OFF';
Next a restore point on the standby database is set, so you can start over the redo apply from here:
SQL> CREATE RESTORE POINT before_flashback;
Afterwards the actual Flashback can be done and the database is launched to read and write:
SQL> FLASHBACK DATABASE TO TIMESTAMP to_date('2015-03-25 12:30:00','YYYY-MM-DD HH24:MI:SS'); SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE; SQL> ALTER DATABASE OPEN;
The database is now available for every activity, which means you can first have a look if the deleted table contained the “right” data. Afterwards the table can be exported, e.g. with the DataPump.
The table is rescued and the database can be included in Data Guard configuration as standby database again. Therefore it is first shut down and then “scrolled” to the restore point “before_flashback”:
SQL> SHUTDOWN IMMEDIATE SQL> STARTUP MOUNT SQL> FLASHBACK DATABASE TO RESTORE POINT before_flashback; SQL> ALTER DATABASE CONVERT TO PHYICAL STANDBY;
By starting the redo apply every redo log information, that came up in the meanwhile, is loaded and after some time Data Guard should announce “SUCCESS” for the configuration.
DGMGRL> EDIT DATABASE 'RINGO_SB' SET STATE ='APPLY-ON'; DGMGRL> SHOW CONFIGURATION ... SUCCESS