Getting started with tuning your SOA/BPM database using AWR

Update:  When I initially published this post, I was relying on information from a single source inside Oracle; however, since publishing it, I have been discussing the content further with other sources in the Oracle community, and in the course of doing so, have identified some improvements and updated the post to reflect them.  I will continue to update this post as better information comes to hand, and to make it as clear, balanced and accurate as possible.

Special thanks to Jacco H. Landlust, Software Architect, Web-DBA and Oracle ACE for his highly valuable input.

In order to continue to get good performance from your SOA or BPM 11g server, you will want to periodically check your database – the one you are storing your SOAINFRA schema in – to see if there are any performance issues there.  You need to keep doing this, as the workload changes and the usage of space in the database changes.  Depending on the volume of traffic going through your system, you might want to think about tuning the database every week or every month for example.

Tuning an Oracle database is a specialist task.  It requires a highly trained and experienced DBA to do it well.  It is not the kind of thing that you can learn from a short blog post, like this one for example.  This post is not intended to teach you how to tune an Oracle database, but rather to just give a few pointers that might help your DBA, or that you can experiment with in your own environment if you don’t have the services of a good DBA.

If you are lucky enough to have a good DBA running your SOAINFRA database, then they will probably already know how to use AWR to tune an Oracle database.  If this is the case, you should just let them know that common issues in SOA/BPM databases are SGA sizing, statistics, missing indexes and high watermark contention.  They should know what to do with that information.

If, however, you do not have a good DBA managing your database – perhaps you only have the database because it is needed for SOA/BPM, and it is being managed by a middleware-style systems administrator – then you might want to read on…  but please keep in mind that this advice is not intended to replace the need for a well trained specialist.  You should probably try to get a DBA on staff, or on contract, to keep your database performing well.

This article provides a very brief introduction to the use of the Automatic Workload Repository (AWR) in the Oracle Database and what to look for in the reports for your SOA/BPM environment.

Before you start playing with AWR, it is a good idea to go and read a bit about it.  A good place to start would be Overview of the Automatic Workload Repository and Managing the Automatic Workload Repository.  You should pay particular attention to making sure you develop an understanding of the concept of ‘DB TIME,’ without which extracting much meaning from AWR reports will be difficult.

AWR is a built-in feature of the Oracle Database.  Your database will automatically collect performance information and create snapshots every hour.  It will also automatically age out and remove these snapshots over time.

You can also tell the database to take a snapshot manually using this command, which you will need to issue as a SYSDBA user in SQL*Plus:

SELECT DBMS_WORKLOAD_REPOSITORY.Create_Snapshot FROM DUAL;

So the process is as follows:

  1. Create a snapshot (using the SQL above),
  2. Run your tests,
  3. Create another snapshot.

Your tests should be some kind of representative and repeatable workload if you are doing this in a test environment.

It is also safe to run these reports against your production environment.  In this case, you need not create the snapshots manually, you can just use the hourly ones that the database creates for you automatically.

Once you have your snapshots, you are ready to create a report.  You use the following command, again as a SYSDBA, to create the report:

@?/rdbms/admin/awrrpt.sql

This will ask you to select the start and end snapshots and for other details like the format and file name for the output.
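
If you are not sure which snapshot IDs to choose, you can list the available snapshots first.  Here is a minimal sketch using the standard DBA_HIST_SNAPSHOT view, run as the same privileged user:

-- List the snapshots AWR currently has, oldest first
SELECT snap_id, begin_interval_time, end_interval_time
FROM   dba_hist_snapshot
ORDER  BY snap_id;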

After you have done this, open up your report and take a look.  Be warned – it is a pretty big report.

Here is an example of the first page of the report.  This one is from a VM with BPM 11.1.1.5 plus the Feature Pack, running on Oracle Linux 5, with 10GB of memory and everything in the one VM – so not an ideal production environment, which is good, because we should be able to see some issues in the report.


WORKLOAD REPOSITORY report for

DB Name         DB Id    Instance     Inst Num Startup Time    Release     RAC
------------ ----------- ------------ -------- --------------- ----------- ---
ORCL          1292287891 orcl                1 24-Nov-11 11:03 11.2.0.1.0  NO

Host Name        Platform                         CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
bpmfp.mark.oracl Linux x86 64-bit                    8     8       2       9.78

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:       468 24-Nov-11 11:27:20        41       5.0
  End Snap:       469 24-Nov-11 11:33:44        42       6.8
   Elapsed:                6.41 (mins)
   DB Time:                0.42 (mins)

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:       252M       252M  Std Block Size:         8K
           Shared Pool Size:       396M       396M      Log Buffer:     5,424K

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):                0.1                0.0       0.00       0.00
       DB CPU(s):                0.0                0.0       0.00       0.00
       Redo size:           36,700.2            7,761.1
   Logical reads:              244.9               51.8
   Block changes:              158.5               33.5
  Physical reads:                1.2                0.3
 Physical writes:                3.6                0.8
      User calls:              242.8               51.3
          Parses:               33.9                7.2
     Hard parses:                0.5                0.1
W/A MB processed:                0.1                0.0
          Logons:                0.1                0.0
        Executes:               69.1               14.6
       Rollbacks:                0.8                0.2
    Transactions:                4.7

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.82       Redo NoWait %:  100.00
            Buffer  Hit   %:   99.52    In-memory Sort %:  100.00
            Library Hit   %:   98.63        Soft Parse %:   98.60
         Execute to Parse %:   50.96         Latch Hit %:   98.16
Parse CPU to Parse Elapsd %:   66.67     % Non-Parse CPU:   97.75

 Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   41.73   43.63
    % SQL with executions>1:   85.59   85.23
  % Memory for SQL w/exec>1:   78.53   80.89

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                               15          59.9
log file sync                         1,592           8      5   32.3 Commit
sort segment request                      1           1   1001    4.0 Configurat
db file sequential read                 216           1      4    3.6 User I/O
db file scattered read                   64           0      6    1.5 User I/O

Host CPU (CPUs:    8 Cores:    8 Sockets:    2)
~~~~~~~~         Load Average
               Begin       End     %User   %System      %WIO     %Idle
           --------- --------- --------- --------- --------- ---------
                0.11      0.13       3.3       0.5       0.4      95.8

Instance CPU
~~~~~~~~~~~~
              % of total CPU for Instance:       0.5
              % of busy  CPU for Instance:      12.9
  %DB time waiting for CPU - Resource Mgr:       0.0

Memory Statistics
~~~~~~~~~~~~~~~~~                       Begin          End
                  Host Mem (MB):     10,017.1     10,017.1
                   SGA use (MB):        668.0        668.0
                   PGA use (MB):         87.9         94.0
    % Host Mem used for SGA+PGA:         7.55         7.61

Tips for SOA/BPM database tuning

Here are some specific areas to check.  Please keep in mind that these are specifically for the SOAINFRA database, and would not necessarily apply to any other workloads.  Also, remember that there is not really any globally applicable set of settings that will work for everyone.  These are just some guidelines – if you are serious about tuning your database, you need to get a good DBA to do it.

Redo logs

There will normally be a lot of redo activity on the SOA database.  You need to make sure your redo logs are ‘large enough.’  One (simplistic) way to do this is to check the number of log switches.  When the system is running at peak workload, one log switch every twenty minutes is ideal; more frequent switching than this suggests the logs are too small, and you should make the redo logs larger to reduce the number of switches.  Your DBA will know better ways to tune the redo log size.

If you are using ‘plain old-fashioned’ disks in your server, as opposed to a SAN or ASM, you should place your redo logs on a different disk to the database files.  You should probably also consider moving to ASM and SAN storage if your workload justifies it.

You can find the log switches in the Instance Activity Stats part of the report, here is an example:

Instance Activity Stats - Thread Activity   DB/Inst: ORCL/orcl  Snaps: 468-469
-> Statistics identified by '(derived)' come from sources other than SYSSTAT

Statistic                                     Total  per Hour
-------------------------------- ------------------ ---------
log switches (derived)                            0       .00
-------------------------------------------------------------

You can see that in this system there are no log switches, which is good.  So this tells us either that the redo logs are large enough, or that we did not run for long enough to get any meaningful result – this report comes from a six-minute test run.
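
If you want to look at switch frequency over a longer period than a single AWR report covers, you can also query the database directly.  Here is a minimal sketch using V$LOG and V$LOG_HISTORY, run as a suitably privileged user:

-- Current redo log groups and their sizes
SELECT group#, bytes/1024/1024 AS size_mb, members, status
FROM   v$log;

-- Log switches per hour over the retained history
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour, COUNT(*) AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;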

Parsing

Check the hard parse rate.  Ideally it should be close to zero.  If it is not, this could indicate that your SGA is too small.  (It could also indicate other things.)  You should try increasing the size of the SGA and testing again.  Hard parsing can also be caused by the use of literals in SQL (as opposed to bind variables).
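
To illustrate the difference, here is a hypothetical example (the table and column names are made up).  The first form looks like a brand new statement for every distinct value, so each one can trigger a hard parse; the second keeps the statement text the same, so the parsed cursor can be reused:

-- Literal value: every distinct order_id produces a different statement
SELECT * FROM order_items WHERE order_id = 1234;

-- Bind variable: the statement text never changes, so it can be soft parsed
SELECT * FROM order_items WHERE order_id = :order_id;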

If the queries in question are your own, e.g. in a database adapter, then you should consider changing them to use bind variables and retesting.  Note that there are other approaches to addressing this issue; your DBA will be able to advise you.  Also, you probably should not have your own queries running in the same database that is hosting SOAINFRA, except perhaps if you are in a development environment.

You can find this information on the first page.

Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
...
          Parses:               33.9                7.2
     Hard parses:                0.5                0.1
...

You can see that in this system the hard parse rate is almost zero, which is good.
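
If you want to watch the parse counters outside of AWR, you can also check the cumulative instance statistics directly.  This is a minimal sketch using V$SYSSTAT; the values are cumulative since instance startup, so look at how they change over time rather than at the absolute numbers:

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('parse count (total)', 'parse count (hard)');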

SGA

Check the buffer hit and library hit percentages.  We want them to be close to 100%; if they are not, you should consider increasing the size of the SGA.  This is also on the first page:

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   99.82       Redo NoWait %:  100.00
            Buffer  Hit   %:   99.52    In-memory Sort %:  100.00
            Library Hit   %:   98.63        Soft Parse %:   98.60
         Execute to Parse %:   50.96         Latch Hit %:   98.16
Parse CPU to Parse Elapsd %:   66.67     % Non-Parse CPU:   97.75

In this case they are also good.

You should be aware that the usefulness or otherwise of the buffer hit ratio is a matter of some debate in Oracle circles.  For an overview of the pros and cons, please see this article by Richard Foote.
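
If you prefer not to rely on the hit ratios alone, the buffer cache advisory gives a more direct estimate of what a larger (or smaller) buffer cache would do to physical reads.  Here is a minimal sketch using V$DB_CACHE_ADVICE, which is populated by default when STATISTICS_LEVEL is TYPICAL or ALL:

-- Estimated physical reads at various buffer cache sizes
SELECT size_for_estimate AS cache_mb,
       estd_physical_read_factor,
       estd_physical_reads
FROM   v$db_cache_advice
WHERE  name = 'DEFAULT'
ORDER  BY size_for_estimate;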

Top 5

Check the average wait times.  Anything over 5ms indicates a problem.  If you see database CPU events in the Top 5, this could potentially indicate that SGA is too small in some circumstances, but it may not be a problem at all.  You may also be missing indexes.  Check the optimizer statistics.

Here are the Top 5 from my environment:

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                               15          59.9
log file sync                         1,592           8      5   32.3 Commit
sort segment request                      1           1   1001    4.0 Configurat
db file sequential read                 216           1      4    3.6 User I/O
db file scattered read                   64           0      6    1.5 User I/O

You can see here that the top event is DB CPU, which could potentially indicate that SGA is too small.  However, in this case it does not.  It is high because this report was run on a VM with the database and BPM sharing the CPU and disk, so the CPU was busy doing ‘other stuff’ like running BPM and WebLogic.  Database activities like sorting and logical I/O (reading memory) also show up as DB CPU.
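
If you suspect stale or missing optimizer statistics are contributing (as mentioned above), your DBA can refresh the statistics for the SOAINFRA schema.  Here is a minimal sketch using DBMS_STATS; the schema name is only an example – yours will have whatever prefix you chose when running RCU:

BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(
    ownname          => 'DEV_SOAINFRA',  -- example schema name
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);           -- also gather index statistics
END;
/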

Database file sequential/scattered read

These indicate time spent on single-block reads (db file sequential read, typically index access) and multi-block reads (db file scattered read, typically full table scans).  If the average waits are high (over 5ms), you should consider moving your data files to reduce disk I/O contention, or moving them to faster disks.  You can see these values in the previous example too.
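
To see which data files are actually doing the I/O (and whether one disk is the bottleneck), you can look at the per-file statistics.  This is a minimal sketch joining V$FILESTAT to V$DATAFILE; the timing columns are in hundredths of a second and are cumulative since instance startup:

-- Physical reads/writes and read/write time per data file
SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
FROM   v$filestat f
JOIN   v$datafile d ON d.file# = f.file#
ORDER  BY f.phyrds DESC;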

Enqueue high watermark

This indicates enqueue high watermark contention, which occurs when there are multiple users inserting into LOB segments at once while the database is trying to reclaim unused space.  You should consider enabling SecureFiles to improve LOB performance (by setting the DB_SECUREFILE initialization parameter to ALWAYS).  Note that you would have to do this before you run RCU to create the schemas.  It is possible to move LOBs after creation, but this is not a procedure that a novice DBA should attempt (unless they are confident with backup and restore first).  The procedure involves the use of the DBMS_REDEFINITION package.
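
To check whether the LOBs in your SOAINFRA schema were created as SecureFiles, you can query the LOB dictionary views.  Here is a minimal sketch; again, substitute your own schema prefix for the example name:

SELECT table_name, column_name, securefile
FROM   dba_lobs
WHERE  owner = 'DEV_SOAINFRA'   -- example schema name
ORDER  BY table_name;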

You cannot see enqueue high watermark contention in my example report, because this was not a problem in my environment, so it did not make it into the Top 5.  If it had, you would see an event called:

enq: HW - contention

Some other considerations…

There are some database configuration parameters that can have an impact on performance.  The use or otherwise of these parameters is a matter of much debate.

If you are doing a performance benchmark, where your goal is to get the best possible performance, then you might want to consider not using MEMORY_TARGET and AUDIT_TRAIL.  However, keep in mind that running a performance benchmark is a lot different to running a production system.

MEMORY_TARGET

This setting allows the database to automatically tune its own memory usage.  If you do not use this setting, you will need to have your DBA tune the memory usage manually.  There is an argument that a DBA manually tuning the database will result in a better tuned database.  There is a counter-argument though, that not many DBAs have the time to sit around constantly tuning the database, and you might be better off letting the database do it itself.  If you do not use this setting, you should start with 60% of physical memory allocated to the SGA and 20% to the PGA.
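
As a very rough illustration of the manual option, here is a sketch of what the 60%/20% starting point might look like on the 10GB machine from the example report.  The exact values are assumptions only, and a change like this (plus the instance restart it requires) should be made by someone who understands the consequences:

-- Disable automatic memory management and size SGA/PGA manually
ALTER SYSTEM SET memory_target = 0 SCOPE = SPFILE;
ALTER SYSTEM SET sga_target = 6G SCOPE = SPFILE;
ALTER SYSTEM SET pga_aggregate_target = 2G SCOPE = SPFILE;
-- Restart the instance for these changes to take effect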

AUDIT_TRAIL

There is an argument that you should not use this setting if you are going for absolute best performance.  However, the overhead is very low, and the benefit of having the audit trail will most likely outweigh the slight performance cost in almost all situations.
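
If you do want to check or change it, the relevant initialization parameter is AUDIT_TRAIL, and changing it requires an instance restart.  For example, in SQL*Plus:

SHOW PARAMETER audit_trail

-- Only if you have decided the trade-off is worth it in your environment:
ALTER SYSTEM SET audit_trail = NONE SCOPE = SPFILE;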
