Sunday, December 19, 2010

Recovery Fundamentals: SCN

This post is to give you information about various recovery fundamental details and how recovery works.

We will start by looking at various SCNs and where they are stored.

There are 3 SCNs basically in control file

1. Checkpoint SCN
2. Stop SCN
3. Thread checkpoint SCN

Checkpoint SCN is the datafile checkpoint SCN when checkpoint happens for datafile. This checkpoint SCN is recorded in datafile header as well.

Stop SCN is the SCN which gets recoreded in control file when datafile is taken in begin backup mode or when datafile is taken offline. This is the checkpoint at a point when datafile header is freezed.

Thread Checkpoint SCN is the one related to online redo log files. This SCN gets generated when ever transaction get recoreded in online redo log file.

When we shut down database with normal or immediate option, all these SCN are synchronized and made equal.

Lets take a quick example:

1) System checkpoint SCN in controlfile

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3700901
2) Datafile checkpoint SCN in controlfile

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

3) Stop SCN in control file

SQL> select name, last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
LAST_CHANGE#
————
/dy/oracle/product/db10g/dbf/htmldb01.dbf

4) Start SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile_header
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

Shut down the database now and start in mount mode

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722204

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204 3722204

All these SCN values are coming from control file. Here you can see that last_change# from v$datafile was showing NULL. But when we shut down the database this value got updated to same as checkpoint_change#. This last_change# is the stop SCN and checkpoint_change# is the start SCN. So when we shutdown the database it run a checkpoint and makes start SCN = stop SCN.

Lets check the SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204
So here we see that datafile header is having same checkpoint # as system checkpoint number.

How oracle decides whethere recovery is required?

When database is started, Oracle checks the system SCN stored in control file and datafiles header. It compared system SCN which each datafile header and it those matches, then next it checks the start SCN and stop SCN in datafile headers, if those are also same then it will open the database else it as for recovery.
Also as soon as we open the database the last_change# in v$datafile_header will be set to NULL again.

Now shutdown the database with abort option.

SQL> shut abort
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722206

So we can see thet system checkpoint # is 3722206

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722206
Here you can see that datafile header checkpoint SCN is also 3722206, but stop SCN # in controlfile is NULL. If shutdown checkpoint would have happened, then it would have updated the stop SCN for controlfile. But since we used “shut abort”, no checkpoint happened during shutdown. This situation is called “crash recovery”. Here the start SCN of datafile header and stop SCN of datafile header are not matching. This kind of situation is automatically taken care by Oracle. When you open the database, oracle automatically applies the transaction from redo log files and undo tablespace and it will recover the database. Problem happens when system SCN # does not match with datafile header start SCN. This is called “instance recovery”.

During start of database Stop SCN = NULL => Needs crash recovery
During Start of database DATAFILE HEADER START SCN != SYSTEM SCN in control file => Media recovery

When doing media recover we can have 2 situations

1) Datafile header SCN is less then datafile SCN stored in control file.

So when you open the database, Oracle checks the SCN number of datafile present in datafile header and control file. If the SCN matches it will open the datafile, else it will ask for recovery. Now when it ask for recovery, it will check the start SCN of datafile in datafile header. From this SCN onwards it needs recovery. So all the logs having this SCN number and beyond is required for recovery.

2) Datafile header SCN is more then datafile SCN stored in control file.

This kind of situation happens when you use backup control file or when you are recovering using “Backup controlfile”. In such situation since datafile header SCN is higher then control file, Oracle really doesn’t know till what SCN to recover. So you tell Oracle that you are using a “backup controlfile” and that you will tell it when to stop applying redo by replying “cancel.” When Oracle starts recovery, it looks at the datafiles to know the last time a checkpoint was performed on the datafile. Oracle now knows to start applying recovery to the datafile for all SCNs after the SCN in the datafile header. But Oracle does not know when to stop, and eventually, Oracle applies recovery in all of your archived redo logs. You can then tell Oracle to use the redo in the online redo logs. Oracle will ask you where to find more redo. At this point, you tell it to quit applying redo by replying CANCEL.

Once we open in reset logs mode, SCN numbers are synchronized in datafiles and controlfiles and redo sequence numbers are reset to 1.

archive Generation huge in begin backup mode why ?

Many of you must have heard or experienced that while taking hot backup of database LGWR process writes aggressively. Meaning that more redo data has been written to redo log file and consecutively more archive logs gets generated.

Here is the common misconception we have in our mind. If some one ask, why excessive redo logs and archive logs are getting generated when we start a hot backup of database ?? Quickly we answer .. Its simple, when we put tablespace in hot backup mode, Oracle will take a check point of tablespace and data files belonging to this tablespace will be freezed. Any user activity happening on objects belonging to this tablespace wont write data to these datafiles, instead it will write data to redo log files. So obviously there will be more redo log file generation.

Well, to some extent this is COMPLETELY WRONG !!!

I will straight way come to the point and explain you what happens when we put the tablespace in hot backup mode.

Your first assumption that datafiles belonging to the tablespace in hot backup mode is freezed is wrong. Datafiles are not freezed, only the datafile headers will be freezed !! So simply imagine that when you put the tablespace in backup mode, Oracle will take a checkpoint and update the datafile headers with checkpoint SCN and there after it is freezed until we take tablespace out of backup mode.

Other datafile (other then header part) remains as normal and data changes happens continuously to this datafile.

Now you may want to ask me “do I mean to say that datafiles gets updated continuously even when we are coping the same to backup location ?”. The answer is YES. Never think that the datafile you are coping is “Consistent”. No, datafiles gets changed continuously !!!

You might want to ask couple of more questions then.

1) If we say that backup file is not consistent and changes continuously, then how come Oracle is able to recover the database when we restore that datafile?

2) If the data changes are anyway happening continuously on data files, then why there is excess redo log generation ?

Thats it !! don’t ask me more then this. Let me explain answers to these questions.

Consider a typical case, where an Oracle database is installed on Linux platform. The standard Oracle block size if 8K and lets say that OS level data block size is 512K. Now when we put the tablespace in “Begin Backup” mode checkpoint has happened and datafile header is freezed. You found which are the files related to this tablespace and started copying using OS command. Now when you copy a datafile using OS command it is going to copy as per OS block size. Lets say when you start copying it gave 8 blocks to you to copy – that means you are copying 4K (512K X 4) to backup location. That means you are copying half of Oracle block to backup location. Now this process of copy can be preempted by Server CPU depending on load. Lets say when you started copying after copy of those 8 block (4K, half of Oracle block), your process get preempted by CPU and it has allocated CPU time to some other important process. Mean while DBWR process changes that block that you have copied halfway (since datafile is not freezed and only header is freezed, continuous updates can happen to datafile).

After a while CPU returns back and gives you next 8 blocks to copy (rest of the halk Oracle block). Now here is the problem !!! we copied half of the oracle block taken at time T0 and another half taken at time T1 and in-between the data block got changed. Does this sounds consistent ? Not to me !! Such type of block is called “Fractured Block”.

Well, since Oracle copies files like this it should do some thing, so that during recovery it wont face any problem.

Usually in case of a normal tablespace (which is not in begin backup mode), when a transaction happens oracle generates redo information and puts in redo log file. This is the bare minimum information that oracle generates in order to redo the information. It does not copy the complete block. Where as in case of begin backup mode, if a transaction happens and changes any block FOR THE FIST TIME, oracle copies the complete block to redo log file. This happens only during first time. If subsequent transaction updates the same block again, oracle will not copy the complete block to redo, instead it will generate minimum information to redo the changes. Now because oracle has to copy the complete block when it changes for the first time in begin backup mode, we say that excess redo gets generated when we put tablespace in begin backup mode.

Question arises, why Oracle has to copy the complete block to redo log files. As you have seen above that during copy of datafile, there can be many fractured blocks, and during restore and recovery its going to put those block back and try to recover. Now assume that block is fractured and oracle has minimum information that it generates in the redo. Under such condition it wont be possible for Oracle to recover such blocks. So instead Oracle just copies the entire block back from redo log files to datafiles during recovery process. This will make the datafile consistent. So recovery process is very important which takes care of all fractured blocks and makes it possible to recover a database.

I hope this explains above 2 questions.

Now you can easily explain why hot backup is not possible if database is in NOARCHIVELOG mode.

When you take a backup using RMAN, it does not generate excessive redo logs. The reason is simple. RMAN is intelligent. It does not use OS block for copying, instead it uses oracle blocks for copying datafiles so the files are consistent.

Hope this helps !!

Monday, December 13, 2010

Shrinking listener logfile online.

% cd /u01/app/oracle/product/9.2.0/network/log
% lsnrctl set log_status off
% mv listener.log listener.old
% lsnrctl set log_status on


Followers