Fahd Mirza on AI, Cloud, DevOps and Databases: April 2011

Thursday, April 28, 2011

Optimizing the Shared Pool Reserved Area

Managing the shared pool is not just about setting the sga_target or memory_target or even shared_pool parameter. More often than not, you see ORA-04031: unable to allocate <> bytes of shared memory error. It is the indicator that there is space problem with the shared pool. Either the shared pool needs more space, or it is fragmented.

If you see that there is plenty of space in shared pool and yet you are getting the ORA-04031 error, then it is likely that you are suffering from the shared pool fragmentation. If the objects don't manage to get space from the shared pool, and if their requested space is higher than the _shared_pool_reserved_min_alloc, then they get it from the shared_pool_reserved_area, which is reserved for free space in case space is needed in shared pool. But ORA-04031 can happen even when there is plenty of space in the shared_pool_reserved_area, because the requested space by the objects is less than the _shared_pool_reserved_min_alloc.

So even when you have enough space in reserverd area, you don't get to use it because requested space is less than the min alloc. If that is happening frequently, you can decrease the size of _shared_pool_reserved_min_alloc, so that the reserved space can be used by starving sessions.

You can check how much shared pool space was requested and failures:

SQL> select REQUEST_FAILURES, LAST_FAILURE_SIZE from v$shared_pool_reserved ;

REQUEST_FAILURES LAST_FAILURE_SIZE
---------------- -----------------
2036 4160

and the size of _shared_pool_reserved_min_alloc is:

SQL> select a.ksppinm name, b.ksppstvl value from x$ksppi a, x$ksppcv b where a.indx = b.indx and a.ksppinm like '%shared%pool%min%alloc%' order by name;

NAME VALUE
------------------------------- ------
_shared_pool_reserved_min_alloc 4400

So 2036 requests for the space failed because the size they requested was 4160, and they need to be higher than 4400 to get space from shared_pool_reserved_area. If you see that frequently, and you have free space in shared_pool_reserved_area, then decrease the size of _shared_pool_reserved_min_alloc below to 4160 to service those requests.

Check the free space in shared_pool_reserved_area as:

select KSMCHCOM, count(*), sum(KSMCHSIZ) from sys.x$ksmspr group by KSMCHCOM;

SQL> select KSMCHCOM, count(*), sum(KSMCHSIZ) from sys.x$ksmspr group by KSMCHCOM;

KSMCHCOM COUNT(*) SUM(KSMCHSIZ)
---------------- ---------- -------------
free memory 295 239619456

Saturday, April 16, 2011

Oracle Career in Pakistan , DBA or Developer?

I routinely receive questions through blog comments and email which go something like this:

I am a student/internee/newbie in Oracle. Should I be a Oracle Developer or DBA? Which certification should I pursue? Any Institute which offers good Oracle education at nominal rates?

Let me answer those questions one by one:

I am a student/internee/newbie in Oracle. Should I be a Oracle Developer or DBA?

Nobody can answer that question except you. Don't look at the market trends or the salaries earned by the Oracle developers or DBAs you know. Look inside you. Analyze yourself. Do you like programming? Do you like to convert business problems into well-written computer programs? Does SQL, PL/SQL, JAVA fascinates you a lot? If yes then you are a Developer and you should work towards becoming Oracle Developer.

If you are more geared towards trouble shooting, installing, configuring and managing the mission critical systems then you are a DBA and that is your career path.

In addition to above points, there are somethings which must be kept in perspective. Oracle is not just confined to database or SQL, PL/SQL. Oracle covers whole suite of ERP, hundreds of products in middleware, many computer languages including JAVA, operating systems like Oracle Linux and Solaris, many many products in vertical industries, all the hardware of former Sun, and there is that database machine Exadata and I am sure I have missed many things Oracle.

So just confine yourself to Oracle database and Oracle Forms/Reports, explore the oracle.com for Oracle products, and see what matches your aptitude and inclination. There are lots of variations in Oracle development and DBA tools.

Which certification should I pursue?

Certification, well realistically that is good for resume and that is good for getting a job. But don't think for a moment that it will prepare you for the job or would make you technically superior. I would even suggest to forget about the certification earlier in your career. Nobody is going to hand over their critical system having data of organization to you on the basis of a certification. You will be judged by the knowledge you would show during your interview and during initial days of your job.

Any Institute which offers good Oracle education at nominal rates?

NO. Sorry but that is what I have found out so far. And frankly speaking you don't need any course in any of institution. Except if you are attending some course from Oracle University Pakistan as prerequisite for the certification. If you want to learn Oracle, then do this:

Download the software for the product you want to learn.
Download its documentation from otn.oracle.com.
You can also look at Oracle By Example (OBE) on otn.oracle.com.
Follow the docs and start playing with that product.
Join forums at OTN and read the threads.
Read lots of blogs about your Oracle products.
Try getting a job as internee and get hands-on experience.

Got more questions? Ask in comments.

Monday, April 11, 2011

RMAN is Smart, But...

Restoration and recovery are the basic tasks which a DBA might perform regularly. The time of restoration and recovery is highly tensed and critical. Despite of good, available, and tested backups, it's easy to get carried away and mix up the things with RMAN. Trying different restores and recovers with SCN, and other option can lead to confusion or worst towards an un-successful recovery.

The best way is to cool down, and plan out as what exactly needs to be recovered, and then only ask that from RMAN. RMAN is smart, but its extra smart and we need to be sure to ask what exactly we need, avoiding ocean of information.

This short simple demo presents the full restore/recover of a database through RMAN to just show that its always clean and helpful to know that until how long
we can recover and then ordering just that to RMAN.

[oracle@stage dbs]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sat Feb 12 22:01:16 2011

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> startup
ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora'
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@stage dbs]$ rman target /

Recovery Manager: Release 11.2.0.1.0 - Production on Sat Feb 12 22:08:19 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database (not started)

RMAN> startup nomount

startup failed: ORA-01078: failure in processing system parameters
LRM-00109: could not open parameter file '/u01/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora'

starting Oracle instance without parameter file for retrieval of spfile
Oracle instance started

Total System Global Area 159019008 bytes

Fixed Size 1335192 bytes
Variable Size 75497576 bytes
Database Buffers 79691776 bytes
Redo Buffers 2494464 bytes

RMAN> restore spfile to pfile '/u01/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora' from '/d01/backup/ORCL_1';

Starting restore at 12-FEB-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK

channel ORA_DISK_1: restoring spfile from AUTOBACKUP /d01/backup/ORCL_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 02/12/2011 22:17:57
ORA-19687: SPFILE not found in backup set

RMAN> restore spfile to pfile '/u01/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora' from '/d01/backup/ORCL_2';

Starting restore at 12-FEB-11
using channel ORA_DISK_1

channel ORA_DISK_1: restoring spfile from AUTOBACKUP /d01/backup/ORCL_2
channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete
Finished restore at 12-FEB-11

RMAN> startup force nomount pfile='/u01/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora';

Oracle instance started

Total System Global Area 564957184 bytes

Fixed Size 1337916 bytes
Variable Size 360711620 bytes
Database Buffers 197132288 bytes
Redo Buffers 5775360 bytes

RMAN> restore controlfile from '/d01/backup/ORCL_2';

Starting restore at 12-FEB-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
output file name=/u01/app/oracle/oradata/orcl/control01.ctl
output file name=/u01/app/oracle/flash_recovery_area/orcl/control02.ctl
Finished restore at 12-FEB-11

RMAN> alter database mount;

database mounted
released channel: ORA_DISK_1

RMAN>

RMAN> list backup;

List of Backup Sets
===================

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
1 Full 1.03G DISK 00:01:17 12-FEB-11
   BP Key: 1 Status: AVAILABLE Compressed: NO Tag: TAG20110212T215413
   Piece Name: /d01/backup/ORCL_1
  List of Datafiles in backup set 1
  File LV Type Ckp SCN Ckp Time Name
  ---- -- ---- ---------- --------- ----
  1 Full 806024 12-FEB-11 /u01/app/oracle/oradata/orcl/system01.dbf
  2 Full 806024 12-FEB-11 /u01/app/oracle/oradata/orcl/sysaux01.dbf
  3 Full 806024 12-FEB-11 /u01/app/oracle/oradata/orcl/undotbs01.dbf
  4 Full 806024 12-FEB-11 /u01/app/oracle/oradata/orcl/users01.dbf
  5 Full 806024 12-FEB-11 /u01/app/oracle/oradata/orcl/example01.dbf

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1 B F A DISK 12-FEB-11 1 1 NO TAG20110212T215413

RMAN> restore database;

Starting restore at 12-FEB-11
Starting implicit crosscheck backup at 12-FEB-11
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=19 device type=DISK
Crosschecked 1 objects
Finished implicit crosscheck backup at 12-FEB-11

Starting implicit crosscheck copy at 12-FEB-11
using channel ORA_DISK_1
Finished implicit crosscheck copy at 12-FEB-11

searching for all files in the recovery area
cataloging files...
no files cataloged

using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00001 to /u01/app/oracle/oradata/orcl/system01.dbf
channel ORA_DISK_1: restoring datafile 00002 to /u01/app/oracle/oradata/orcl/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/orcl/undotbs01.dbf
channel ORA_DISK_1: restoring datafile 00004 to /u01/app/oracle/oradata/orcl/users01.dbf
channel ORA_DISK_1: restoring datafile 00005 to /u01/app/oracle/oradata/orcl/example01.dbf
channel ORA_DISK_1: reading from backup piece /d01/backup/ORCL_1
channel ORA_DISK_1: piece handle=/d01/backup/ORCL_1 tag=TAG20110212T215413
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:01:06
Finished restore at 12-FEB-11

RMAN> recover database;

Starting recover at 12-FEB-11
using channel ORA_DISK_1

starting media recovery

unable to find archived log
archived log thread=1 sequence=4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 02/12/2011 22:29:55
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 4 and starting SCN of 806024

RMAN> run
2> {
3> set until scn 806023;
4> recover database;
5> }

executing command: SET until clause

Starting recover at 12-FEB-11
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 02/12/2011 22:31:23
RMAN-06556: datafile 1 must be restored from backup older than SCN 806023

RMAN> run
2> {
3> set until scn 806024;
4> recover database;
5> }

executing command: SET until clause

Starting recover at 12-FEB-11
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 12-FEB-11

RMAN> alter database open resetlogs;

database opened

RMAN> exit

Recovery Manager complete.
[oracle@stage dbs]$ sqlplus '/ as sysdba'

SQL*Plus: Release 11.2.0.1.0 Production on Sat Feb 12 22:50:54 2011

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select instance_name,status from v$instance;

INSTANCE_NAME STATUS
---------------- ------------
orcl OPEN

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@stage dbs]$

So, the better way is to fisrt get the SCN at the mount stage by following query, and then restore/recover together in RUN block after setting until that SCN:

select group#, first_change#, status, archived from v$log;

Thursday, April 7, 2011

Monitoring Exadata cell offload via SQL Monitoring

The storage server in the cell nodes of Exadata database machine very well understands various predicates of SQL and take decisions to return data accordingly.

In its bid to return data intelligently to make the process faster, the storage server software utilize smart scan, storage indexes, rman block scanning and other cool features. Generally speaking cell offload means that database server at the compute node offloads its task of finding the exact data to be returned to user to the storage server at cell node.

If you would like to see that how much data related to your query has been offloaded to the cell node then you can utilize the SQL monitor:

SQL> set lines 100
SQL> set pages 100
SQL> set long 10000000
set longchunksize 10000000
set linesize 200
select dbms_sqltune.report_sql_monitor from dual;
SQL> SQL> SQL>
REPORT_SQL_MONITOR
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------

Global Information
------------------------------
Status : EXECUTING
Instance ID : 1
Session : test (944:5510)
SQL ID : 55555555555
SQL Execution ID : 444
Execution Started : 04/04/2011 07:45:37
First Refresh Time : 04/04/2011 07:45:38
Last Refresh Time : 04/04/2011 07:46:11
Duration : 35s
Module/Action : test
Service : test
Program : test
DOP Downgrade : 88%

SQL Plan Monitoring Details (Plan Hash Value=560085029)
========================================================================================================================================================================================================
=
| Id | Operation | Name | Rows | Cost | Time | Start | Execs | Rows | Read | Read | Cell | Mem | Activity | Activity Detail
|
| | | | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes | Offload | | (%) | (# samples)
|
========================================================================================================================================================================================================
=
| 0 | SELECT STATEMENT | | | | | | 17 | | | | | | |
|
| 1 | PX COORDINATOR | | | | | | 17 | | | | | | |
|
| 2 | PX SEND QC (RANDOM) | :TQ10001 | 1 | 27335 | | | 16 | | | | | | |
|
| -> 3 | HASH JOIN | | 1 | 27335 | 33 | +3 | 16 | 0 | | | | 22M | |
|
| 4 | BUFFER SORT | | | | 1 | +3 | 16 | 26112 | | | | | |
|
| 5 | PX RECEIVE | | 5311 | 102 | 1 | +3 | 16 | 26112 | | | | | |
|
| 6 | PX SEND BROADCAST | :TQ10000 | 5311 | 102 | 1 | +2 | 1 | 26112 | | | | | |
|
| 7 | TABLE ACCESS STORAGE FULL | test1 | 5311 | 102 | 1 | +2 | 1 | 1632 | | | | | |
|
| -> 8 | PX BLOCK ITERATOR | | 511M | 27216 | 33 | +3 | 16 | 3M | | | | | 0.18 | Cpu (1)
|
| -> 9 | TABLE ACCESS STORAGE FULL | test2 | 511M | 27216 | 35 | +1 | 7774 | 3M | 79030 | 73GB | 84.78% | | 99.82 | Cpu (125)
|
| | | | | | | | | | | | | | | cell smart table scan (434)
|
=======================================================================================================================================================================

Wednesday, April 6, 2011

PX Deq Credit: send blkd Tuning

Parallelism is near to silver bullet for many of the database operation on huge volume of data. But as is the case with bullets, you have to handle them with care otherwise they could backfire.

PX Deq Credit: send blkd is a wait event you might see in your AWR report if you are running parallel queries. The event PX Deq Credit: send blkd happens when the parallel slaves compete for sending results back to the QC process. The competing slaves time out after 2 seconds. From the AWR reprot, get the avg wait time for PX Deq Credit: send blkd for the parallel slaves and see if it is less than 2 seconds? If it is then probably there is nothing you can much except controlling your parallelism in your application because heavy chances are that you are maxing out on your CPU.

If that is not the case, and your average wait time is higher than 2 seconds, and that is happening frequently and you are watching same patterns again and again in AWR or ASH reports during the high load on the system when the performance is not acceptable, and you believe that you have ample CPU then increase the size of PARALLEL_EXECUTION_MESSAGE_SIZE parameter and by also keeping the space in large_pool in perspective.

Tuesday, April 5, 2011

11g SQL Monitoring , A Must Monitoring Tool for Exadata

Exadata is not for the faint-heart databases. When there are some dense, complicated, parallel and long executions of SQLs upon huge volumes of data are required, Exadata is the way to go. It's compute nodes, cell nodes, IB switches, storage server software, database server software create an astounding synergy to provide a unique fastest experience.

Most of the queries which run on the Exadata are long and parallel in nature. Oracle 11g provides a sumptuous feature of SQL Monitoring to monitor any statements which are parallel or consume more than 5 secs in IO or at the CPU. Given the workload of Exadata, SQL Monitoring is so natural a tool to monitor your application in real time.

SQL Monitoring can be savored through OEM or through the SQL*Plus. For example to generate SQL Monitoring report in SQL Plus do the following:

set long 10000000
set longchunksize 10000000
set linesize 200
select dbms_sqltune.report_sql_monitor from dual;

Query Example Ref: Oracle Docs

You may also generate a cool graphical ACTIVE HTML report of your query:

set long 10000000
set longchunksize 10000000
set linesize 200

spool /tmp/sql_ACTIVE.html
select dbms_sqltune.report_sql_monitor(report_level=>'+histogram', type=>'ACTIVE', sql_id=>'') monitor_report from dual;
spool off

You can put the above script in scheduler or cron job and monitor the query by

scheduling the job at regular intervals. This way you can get a nice, cute, and

genuine life history of any SQL, as SQL monitoring shows how many parallel slaves

were used, and how much time was spent at which steps in the execution plan, and

it also shows the workload of the parallel slaves.

Monday, April 4, 2011

Expect Exadata Performance Decline After Cell Reboot

Well, just a note, that if you experience dead slow performance after the reboot of Exadata database machine, then it is most likely that the storage indexes are being rebuilt by the cell software upon the first access of the data.

Instead of logging an SR, it would be good idea to wait a little and then check later. Or make it part of SOP to run smoke test queries on the tables after the restart, to let cell software rebuild storage indexes, and then put the Exadata into the production.

Sunday, April 3, 2011

How to Gracefully Reboot Exadata Database Machine

Restarting Oracle Exadata database machine involves rebooting first the cell nodes and then the compute nodes. But before that you have to bring down the RDBMS instances, ASM instances, and CRS.

Steps to Restart Exadata:

1- With oracle user, Shutdown the RDBMS and ASM instances by doing From any compute node:

srvctl stop database -d

2- With root user, from $ORA_CRS_HOME, shutdown the CRS:

./crsctl stop cluster

3- Now go to all the cell nodes, and perform following to reboot the cell nodes:

$ reboot

If you have ILOM, you can view the progress of restart through its interface.

ssh

/SP/start console

4- After reboot of cell nodes, go to every compute node and perform following with root user:

$ reboot

You can check the status of the cell node by doing:

cellcli> list cell detail
cellcli> list celldisk
cellcli> list griddisk

5- As the compute nodes come up, the CRS would start automatically and you just have to start the database, if its not started by the CRS with oracle user.

srvctl start database -d

6- You may check the status of cluster resources from root user:

./crsctl stat res -t

Fahd Mirza on AI, Cloud, DevOps and Databases