Sunday, December 19, 2010

Recovery Fundamentals: SCN

This post is to give you information about various recovery fundamental details and how recovery works.

We will start by looking at various SCNs and where they are stored.

There are 3 SCNs basically in control file

1. Checkpoint SCN
2. Stop SCN
3. Thread checkpoint SCN

Checkpoint SCN is the datafile checkpoint SCN when checkpoint happens for datafile. This checkpoint SCN is recorded in datafile header as well.

Stop SCN is the SCN which gets recoreded in control file when datafile is taken in begin backup mode or when datafile is taken offline. This is the checkpoint at a point when datafile header is freezed.

Thread Checkpoint SCN is the one related to online redo log files. This SCN gets generated when ever transaction get recoreded in online redo log file.

When we shut down database with normal or immediate option, all these SCN are synchronized and made equal.

Lets take a quick example:

1) System checkpoint SCN in controlfile

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3700901
2) Datafile checkpoint SCN in controlfile

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

3) Stop SCN in control file

SQL> select name, last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
LAST_CHANGE#
————
/dy/oracle/product/db10g/dbf/htmldb01.dbf

4) Start SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile_header
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

Shut down the database now and start in mount mode

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722204

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204 3722204

All these SCN values are coming from control file. Here you can see that last_change# from v$datafile was showing NULL. But when we shut down the database this value got updated to same as checkpoint_change#. This last_change# is the stop SCN and checkpoint_change# is the start SCN. So when we shutdown the database it run a checkpoint and makes start SCN = stop SCN.

Lets check the SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204
So here we see that datafile header is having same checkpoint # as system checkpoint number.

How oracle decides whethere recovery is required?

When database is started, Oracle checks the system SCN stored in control file and datafiles header. It compared system SCN which each datafile header and it those matches, then next it checks the start SCN and stop SCN in datafile headers, if those are also same then it will open the database else it as for recovery.
Also as soon as we open the database the last_change# in v$datafile_header will be set to NULL again.

Now shutdown the database with abort option.

SQL> shut abort
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722206

So we can see thet system checkpoint # is 3722206

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722206
Here you can see that datafile header checkpoint SCN is also 3722206, but stop SCN # in controlfile is NULL. If shutdown checkpoint would have happened, then it would have updated the stop SCN for controlfile. But since we used “shut abort”, no checkpoint happened during shutdown. This situation is called “crash recovery”. Here the start SCN of datafile header and stop SCN of datafile header are not matching. This kind of situation is automatically taken care by Oracle. When you open the database, oracle automatically applies the transaction from redo log files and undo tablespace and it will recover the database. Problem happens when system SCN # does not match with datafile header start SCN. This is called “instance recovery”.

During start of database Stop SCN = NULL => Needs crash recovery
During Start of database DATAFILE HEADER START SCN != SYSTEM SCN in control file => Media recovery

When doing media recover we can have 2 situations

1) Datafile header SCN is less then datafile SCN stored in control file.

So when you open the database, Oracle checks the SCN number of datafile present in datafile header and control file. If the SCN matches it will open the datafile, else it will ask for recovery. Now when it ask for recovery, it will check the start SCN of datafile in datafile header. From this SCN onwards it needs recovery. So all the logs having this SCN number and beyond is required for recovery.

2) Datafile header SCN is more then datafile SCN stored in control file.

This kind of situation happens when you use backup control file or when you are recovering using “Backup controlfile”. In such situation since datafile header SCN is higher then control file, Oracle really doesn’t know till what SCN to recover. So you tell Oracle that you are using a “backup controlfile” and that you will tell it when to stop applying redo by replying “cancel.” When Oracle starts recovery, it looks at the datafiles to know the last time a checkpoint was performed on the datafile. Oracle now knows to start applying recovery to the datafile for all SCNs after the SCN in the datafile header. But Oracle does not know when to stop, and eventually, Oracle applies recovery in all of your archived redo logs. You can then tell Oracle to use the redo in the online redo logs. Oracle will ask you where to find more redo. At this point, you tell it to quit applying redo by replying CANCEL.

Once we open in reset logs mode, SCN numbers are synchronized in datafiles and controlfiles and redo sequence numbers are reset to 1.

No comments:

Post a Comment

Followers