Restore Database after fatal User Error

Even though Oracle enhances the restore and fault-tolerance capabilities with every release, there are still fatal errors that lead to a major outage of the database. Those errors are mainly produced by human beings; sorry guys, but that's how it is.

In this blog I'll show you how to restore the database to a specific SCN after a major incident. There is actually a true story behind this: some years ago I had a customer whose production database was named "TEST". Even though I told them to rename the database to something meaningful, the DBA was totally convinced that this was not a problem...

As time went by, he wanted to test a new version of the application on a development database, so he dropped the application user...

You can probably guess what happened.

He had dropped the application user on the "TEST" database, which was in fact production. The entire production line came to a full stop (no wonder), and the DBA became more than a little nervous. He was shocked and called me to help restore his database.

Find out when the Problem occurred

My first question was: "When did you execute the drop?" Silence. I don't think he even knew what day it was, so his answer was: "about one hour ago, or so".

That is not much help when you need to restore a database to a very specific point in time, because even though production had stopped, we did not want to lose too much data. So I used Toad and its LogMiner wizard to find out when he had executed the fatal script.

To give you an impression of how to look for the last DML and the first DDL (the drop), I inserted one row into the customer table and noted the time:


Picture 1: Insert Customer Data

As you can see, the time of the insert was 16:31:58 on November 27th, 2014. This is how the fatal SQL looked:

SQL> DROP USER demoeng CASCADE;

User dropped.

Now we can start LogMiner and give it an estimate of where to look for the data: Menu → Database → Diagnostic → LogMiner.


Picture 2: Dictionary select in LogMiner

Even though I'm not on the server (the database JOHANN is running on a Linux box), I'm able to use the Online Data Dictionary directly and check the most recent redo log files.


Picture 3: Find Files for LogMiner Session


Picture 4: Select online RedoLogs

At this point it doesn't matter whether you use the online redo logs or the archived ones, as long as they are still available. That's why it makes sense to keep the archived redo logs on disk as long as you can.
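
Behind the scenes, the Toad wizard issues standard DBMS_LOGMNR calls, so the same file selection can be done by hand in SQL*Plus. A minimal sketch; the file paths below are examples only, query V$LOGFILE (or V$ARCHIVED_LOG) for the real names on your system:

```sql
-- Identify candidate redo log files (paths are illustrative)
SELECT member FROM v$logfile;

-- Register the first file, then append any further ones
BEGIN
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/oradata/JOHANN/redo01.log',
    options     => DBMS_LOGMNR.NEW);
  DBMS_LOGMNR.ADD_LOGFILE(
    logfilename => '/u01/oradata/JOHANN/redo02.log',
    options     => DBMS_LOGMNR.ADDFILE);
END;
/
```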

In the next step the content of the redo logs is analyzed. Unfortunately, the timestamp shown on the screen is wrong; the only value that matters is the SCN, where a very high number indicates that the SCN is invalid. So I read until the end of the log files.

I changed the "From" and "To" date fields to the approximate time window I was looking for, and I excluded uncommitted data.
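
In plain SQL, narrowing the window and excluding uncommitted data corresponds roughly to this DBMS_LOGMNR.START_LOGMNR call. The timestamps are assumptions based on the story above; adjust them to your own incident window:

```sql
-- Start the LogMiner session using the online dictionary,
-- limited to the suspected time window, committed data only
BEGIN
  DBMS_LOGMNR.START_LOGMNR(
    starttime => TO_DATE('2014-11-27 16:25:00', 'YYYY-MM-DD HH24:MI:SS'),
    endtime   => TO_DATE('2014-11-27 16:45:00', 'YYYY-MM-DD HH24:MI:SS'),
    options   => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG
                 + DBMS_LOGMNR.COMMITTED_DATA_ONLY);
END;
/
```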


Picture 5: Narrow Date for LogMiner Session

Before pressing the green triangle to execute the LogMiner query, I add some more columns to the list because they might help me find the right SCN: "Segment Owner", "Segment Name" and "Operation".


Picture 6: Select additional Columns

So let’s do the analysis:


Picture 7: Query Operations of DEMOENG

Because I dropped the schema "DEMOENG", I filter the output to that "Schema Owner" only. I had expected to see the "INSERT INTO CUSTOMER ..." statement first, but only the DROP statements occurring right after the INSERT were shown. The reason is simple: because the user has been dropped, there is no longer a relationship between the tables and the owner (except for DDL).

So a new filter on the SCN gave me an indication of the DML statements occurring right before the fatal DROP.
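
If you prefer to query V$LOGMNR_CONTENTS directly instead of the Toad grid, a filter along these lines shows the operations around the drop (DEMOENG is the schema from this example):

```sql
-- Show everything LogMiner recorded for the dropped schema,
-- plus all DDL, ordered by SCN so the last good DML is easy to spot
SELECT scn, timestamp, seg_owner, seg_name, operation, sql_redo
  FROM v$logmnr_contents
 WHERE seg_owner = 'DEMOENG'
    OR operation = 'DDL'
 ORDER BY scn;
```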


Picture 8: Get SCN before fatal Drop

As you can see (and can verify against the first screenshot), the INSERT INTO CUSTOMER still exists (highlighted), but it no longer carries meaningful data, as there are no corresponding column or object names. That doesn't matter, though. Looking at the details, I can see that I probably logged in before the fatal error at approximately 16:35:00, so I can recover until that time or SCN.

Restoring the Database

In a real-life environment I would have opened the database in restricted mode to prevent any other application from continuing to work, and I would have taken a second backup just to ensure I had more attempts in case I miscalculated the SCN. In my example I now shut down the instance and use RMAN to restore the database.
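
For completeness, restricting access before such a restore is a one-liner; a sketch of the two common variants:

```sql
-- Variant 1: restrict a running instance
-- (note: already connected sessions keep working)
ALTER SYSTEM ENABLE RESTRICTED SESSION;

-- Variant 2: bounce the instance and open it restricted from the start
SHUTDOWN IMMEDIATE
STARTUP RESTRICT
```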

RMAN> SHUTDOWN IMMEDIATE
RMAN> STARTUP MOUNT
RMAN> RESTORE DATABASE UNTIL SCN 859045;
RMAN> RECOVER DATABASE UNTIL SCN 859045;
RMAN> ALTER DATABASE OPEN RESETLOGS;

A successful login to the schema "DEMOENG" shows that the recovery worked. And a query for custid=200000 lists the row I inserted right before the DROP USER.

Conclusion

The LogMiner wizard is easy to use if you are able to read the content of V$LOGMNR_CONTENTS (that's the view behind the result set). It doesn't matter that, as in this case, you are on Oracle Standard Edition and have to restore the database using RMAN; on Enterprise Edition you might want to use Flashback Database instead, but the behavior is similar.

During my tests I realized that the INSERT command for the single row I added did not show up in the LogMiner session at first: I had to enable supplemental logging. So I would argue it makes sense to enable supplemental logging even if you don't use replication. In case of a fatal user error it will probably help identify the correct SCN.
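
Enabling minimal supplemental logging, as suggested above, is a single statement, and the setting can be verified in V$DATABASE:

```sql
-- Record the extra redo information LogMiner needs for row-level DML
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- Should now return YES
SELECT supplemental_log_data_min FROM v$database;
```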
