Daily DBA Monitoring Tasks

Introduction

Database monitoring is a critical part of any application’s maintenance. Finding database issues in time can help the application remain healthy and accessible. Without solid monitoring in place, database outages can go unnoticed until it’s too late and the business is losing money and customers.

Like any operational monitoring, databases can be proactively or reactively monitored, with proactive monitoring favored by most people. Proactive monitoring seeks to identify any issues before they become major problems. It’s done by looking at database metrics and alerting teams or individuals when values are abnormal.

Reactive database monitoring is done after an incident occurs. It’s usually done for security breach investigation, performance troubleshooting, or major incident reporting.

Monitoring Tools

There are plenty of monitoring tools out there on the market both free and commercial.

Some of the database monitoring tools includes , but not limited to:

Below, We shall see the list of daily tasks that the DBA should monitor in oracle database.

Instance Availability

  • Create an SSH connection using either Putty, Xshell or Mobaxterm into all database servers (Production, Standby, UAT, SIT).
Daily DBA Monitoring Tasks
  • Check if process monitor is running:
oracle@dbsrv1:~$ ps -ef | grep pmon
    grid  2370     1   0   Nov 01 ?           7:59 asm_pmon_+ASM1
  oracle 15682     1   0   Nov 01 ?          13:29 ora_pmon_ProdDB
  oracle 14265 13802   0 11:10:51 pts/2       0:00 grep pmon
oracle@dbsrv1:~$
  • Check if system monitor is running:
oracle@dbsrv1:~$ ps -ef | grep smon
    grid  2411     1   0   Nov 01 ?           1:25 asm_smon_+ASM1
    root  2449     1   0   Nov 01 ?         470:42 /u01/app/12.1.0/grid/bin/osysmond.bin
  oracle 15744     1   0   Nov 01 ?           0:56 ora_smon_ProdDB
  oracle 14338 13802   0 11:13:15 pts/2       0:00 grep smon
oracle@dbsrv1:~$
  • Check if TNS process is running:
oracle@dbsrv1:~$ ps -ef | grep tns
    grid  2765     1   0   Nov 01 ?           1:55 /u01/app/12.1.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
    grid  3021     1   0   Nov 01 ?          12:06 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
    grid  3031     1   0   Nov 01 ?           1:53 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit
    grid  3010     1   0   Nov 01 ?           1:54 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
    grid  3040     1   0   Nov 01 ?           1:51 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit
  oracle 15303 13802   0 11:27:18 pts/2       0:00 grep tns
oracle@dbsrv1:~$
  • Check Cluster status, if the database is based on ASM storage/RAC database:
grid@dbsrv1:~$ crsctl status res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       dbsrv1                STABLE
ora.FRA.dg
               ONLINE  ONLINE       dbsrv1                STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       dbsrv1                STABLE
ora.asm
               ONLINE  ONLINE       dbsrv1                Started,STABLE
ora.net1.network
               ONLINE  ONLINE       dbsrv1                STABLE
ora.ons
               ONLINE  ONLINE       dbsrv1                STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       dbsrv1                159.234.34.203,STABL
                                                             E
ora.cvu
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.mgmtdb
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
ora.dbsrv1.vip
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.dbsrv2.vip
      1        ONLINE  INTERMEDIATE dbsrv1                FAILED OVER,STABLE
ora.oc4j
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.scan2.vip
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.scan3.vip
      1        ONLINE  ONLINE       dbsrv1                STABLE
ora.test.db
      1        OFFLINE OFFLINE                               Instance Shutdown,ST
                                                             ABLE
      2        OFFLINE OFFLINE                               STABLE
ora.ProdDB.db
      1        OFFLINE OFFLINE                               STABLE
      2        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------
grid@dbsrv1:~$ 

  • Check  ASM status:
grid@dbsrv1:~$ srvctl status ASM
ASM is running on dbsrv1
grid@dbsrv1:~$
  • Check Listener status:
grid@dbsrv1:~$ srvctl status listener
Listener LISTENER is enabled
Listener LISTENER is running on node(s): dbsrv1
grid@dbsrv1:~$
  • Check database status:
grid@dbsrv1:~$ srvctl status database -d ProdDB
Instance ProdDB1 is running on node dbsrv1
Instance ProdDB2 is running on node dbsrv2
grid@dbsrv1:~$

OR

oracle@dbsrv1:~$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 21 11:46:56 2022

Copyright (c) 1982, 2014, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management,                                                                                         OLAP,
Advanced Analytics and Real Application Testing options

SQL> select status, database_status from v$instance;

STATUS       DATABASE_STATUS
------------ -----------------
OPEN         ACTIVE

SQL>

Monitoring alert log entries

Chronological log of messages and errors are written out in alert log file. Typical messages found in this file are: database startup, shutdown, log switches, space errors etc.

oracle@dbsrv1:/u01/app/oracle/diag/rdbms/Prodb/Prodb1/trace$ tail -f alert__Prodb1.log
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 145
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.54.27)(PORT=38873))

If there is any error or issues in the database, please refer to the alert log files.

Checking Available Space

The DBA as part of the daily monitoring is required to check if there are enough Disk Space, FRA, Tablespace, and Tape Size.

  • Disk Space:  use a df –h Solaris command to check if there is enough space in root pool, u01, share, var, and backup disk in all database (production instance1, production instance2, standby, card db, Recon).
oracle@dbsrv1:~$ df -h
Filesystem             Size   Used  Available Capacity  Mounted on
rpool/ROOT/solaris-1   294G    16G       173G     9%    /
/devices                 0K     0K         0K     0%    /devices
/dev                     0K     0K         0K     0%    /dev
ctfs                     0K     0K         0K     0%    /system/contract
proc                     0K     0K         0K     0%    /proc
mnttab                   0K     0K         0K     0%    /etc/mnttab
swap                    39G   2.6M        39G     1%    /system/volatile
objfs                    0K     0K         0K     0%    /system/object
sharefs                  0K     0K         0K     0%    /etc/dfs/sharetab
fd                       0K     0K         0K     0%    /dev/fd
rpool/ROOT/solaris-1/var
                       294G   497M       173G     1%    /var
swap                    39G    12M        39G     1%    /tmp
rpool/VARSHARE         294G   6.1G       173G     4%    /var/share
rpool/export           294G   320K       173G     1%    /export
rpool/export/home      294G   336K       173G     1%    /export/home
rpool/export/home/grid
                       294G   400K       173G     1%    /export/home/grid
rpool/export/home/oracle
                       294G    13G       173G     8%    /export/home/oracle
rpool/export/home/sol
                       294G   307M       173G     1%    /export/home/sol
rpool                  294G   384K       173G     1%    /rpool
rpool/VARSHARE/zones   294G   288K       173G     1%    /system/zones
rpool/VARSHARE/pkg     294G   304K       173G     1%    /var/share/pkg
rpool/VARSHARE/pkg/repositories
                       294G   288K       173G     1%    /var/share/pkg/repositories
u01                    147G   111G        35G    76%    /u01
oracle@dbsrv1:~$
  • ASMCMD: Is a command-line utility that you can use to easily view and manipulate files and directories within Automatic Storage Management (ASM) disk groups. It can list the contents of disk groups, perform searches, create and remove directories and aliases, display space utilization, and more.
grid@dbsrv1:~$ asmcmd
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   6289446  1189784                0         1189784              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576    511591   298274                0          298274              0             Y  FRA/
ASMCMD> ls
DATA/
FRA/
ASMCMD>
  • v$asm_diskgroup: we can also use v$asm_diskgroup view object to manage ASM disk spaces.
Daily DBA Monitoring Tasks
  • Tablespace: To get the amount of free space in table space uses the DBA_DATA_FILES and DBA_FREE_SPACE. If the amount of table space used has passed the threshold add space. To decide the threshold value you have to make get the daily growth of the database. This helps to proactively manage and monitor the space of tablespaces. 
Daily DBA Monitoring Tasks

Check CPU and Memory Usage

On oracle Solaris use the TOP Command to monitor CPU Usage and free memory.

oracle@dbsrv1:~$ top
load averages:  39.8,  34.9,  32.2;               up 39+02:24:19                                                                                               14:32:33
753 processes: 712 sleeping, 1 running, 1 stopped, 39 on cpu
CPU states: 17.9% idle, 75.5% user,  6.6% kernel,  0.0% iowait,  0.0% swap
Kernel: 182784 ctxsw, 50804 trap, 135011 intr, 90514 syscall, 1229 flt
Memory: 64G phys mem, 2350M free mem, 48G total swap, 43G free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  7172 oracle      1  30    0   38G   38G cpu/40   3:48  2.11% oracle
  7180 oracle      1   0    0   38G   38G cpu/36   3:45  2.11% oracle
  7201 oracle      1   0    0   38G   38G cpu/4    3:09  2.09% oracle
  7205 oracle      1   0    0   38G   38G cpu/23   3:05  2.09% oracle
  7141 oracle      1  40    0   38G   38G cpu/34   4:28  2.09% oracle
  7167 oracle      1  50    0   38G   38G cpu/47   4:06  2.08% oracle

User Monitoring

Daily the DBA must check all user account that should be open must be and account that should be locked must be locked. It’s the job of the DBA to change System Users before they are expired.

We can use the SQL script below to monitor the status of database users:

Select Username,account_status,expiry_date,default_tablespace  
from dba_users  where default_tablespace !='USERS';

Invalid Objects

The DBA should look up if any invalid objects exists and take proper measure to validate the objects founds.

  • To check invalid objects:
SQL> select object_name, status from dba_objects where status’INVADLID’;

And use the ALTER…COMPILE SQL command to fix invalid objects.

Clearing File system and FRA files

Before the server runs low on disk space, the DBA must clear disk spaces by removing files that are no longer necessary.

  • Remove old FRA archive logs:
---Connect through RMAN
oracle@dbsrv1:~$ rman

Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 21 14:44:02 2022

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target /

connected to target database: ProdDB(DBID=66272031)

--- List archive logs older than three days.
RMAN> list archivelog all completed before 'sysdate-3';

---Delete archive log older than three days.
RMAN> delete archivelog all completed before 'sysdate-3';

---Crosscheck archive log 
RMAN> crosscheck archivelog all;

---Remove expired archive log 
RMAN> delete expired archivelog all;