Introduction
Database monitoring is a critical part of any application’s maintenance. Finding database issues in time can help the application remain healthy and accessible. Without solid monitoring in place, database outages can go unnoticed until it’s too late and the business is losing money and customers.
Like any operational monitoring, databases can be proactively or reactively monitored, with proactive monitoring favored by most people. Proactive monitoring seeks to identify any issues before they become major problems. It’s done by looking at database metrics and alerting teams or individuals when values are abnormal.
Reactive database monitoring is done after an incident occurs. It’s usually done for security breach investigation, performance troubleshooting, or major incident reporting.
Monitoring Tools
There are plenty of monitoring tools out there on the market both free and commercial.
Some of the database monitoring tools includes , but not limited to:
- Oracle Enterprise Manager Cloud Control 13C
- SQL*Plus
- PL/SQL Developer
- SQL Developer
- WinSCP
Below, We shall see the list of daily tasks that the DBA should monitor in oracle database.
Instance Availability
- Create an SSH connection using either Putty, Xshell or Mobaxterm into all database servers (Production, Standby, UAT, SIT).
- Check if process monitor is running:
oracle@dbsrv1:~$ ps -ef | grep pmon grid 2370 1 0 Nov 01 ? 7:59 asm_pmon_+ASM1 oracle 15682 1 0 Nov 01 ? 13:29 ora_pmon_ProdDB oracle 14265 13802 0 11:10:51 pts/2 0:00 grep pmon oracle@dbsrv1:~$
- Check if system monitor is running:
oracle@dbsrv1:~$ ps -ef | grep smon grid 2411 1 0 Nov 01 ? 1:25 asm_smon_+ASM1 root 2449 1 0 Nov 01 ? 470:42 /u01/app/12.1.0/grid/bin/osysmond.bin oracle 15744 1 0 Nov 01 ? 0:56 ora_smon_ProdDB oracle 14338 13802 0 11:13:15 pts/2 0:00 grep smon oracle@dbsrv1:~$
- Check if TNS process is running:
oracle@dbsrv1:~$ ps -ef | grep tns grid 2765 1 0 Nov 01 ? 1:55 /u01/app/12.1.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit grid 3021 1 0 Nov 01 ? 12:06 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit grid 3031 1 0 Nov 01 ? 1:53 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit grid 3010 1 0 Nov 01 ? 1:54 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit grid 3040 1 0 Nov 01 ? 1:51 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit oracle 15303 13802 0 11:27:18 pts/2 0:00 grep tns oracle@dbsrv1:~$
- Check Cluster status, if the database is based on ASM storage/RAC database:
grid@dbsrv1:~$ crsctl status res -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.DATA.dg ONLINE ONLINE dbsrv1 STABLE ora.FRA.dg ONLINE ONLINE dbsrv1 STABLE ora.LISTENER.lsnr ONLINE ONLINE dbsrv1 STABLE ora.asm ONLINE ONLINE dbsrv1 Started,STABLE ora.net1.network ONLINE ONLINE dbsrv1 STABLE ora.ons ONLINE ONLINE dbsrv1 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE dbsrv1 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE dbsrv1 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE dbsrv1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE dbsrv1 159.234.34.203,STABL E ora.cvu 1 ONLINE ONLINE dbsrv1 STABLE ora.mgmtdb 1 OFFLINE OFFLINE Instance Shutdown,ST ABLE ora.dbsrv1.vip 1 ONLINE ONLINE dbsrv1 STABLE ora.dbsrv2.vip 1 ONLINE INTERMEDIATE dbsrv1 FAILED OVER,STABLE ora.oc4j 1 ONLINE ONLINE dbsrv1 STABLE ora.scan1.vip 1 ONLINE ONLINE dbsrv1 STABLE ora.scan2.vip 1 ONLINE ONLINE dbsrv1 STABLE ora.scan3.vip 1 ONLINE ONLINE dbsrv1 STABLE ora.test.db 1 OFFLINE OFFLINE Instance Shutdown,ST ABLE 2 OFFLINE OFFLINE STABLE ora.ProdDB.db 1 OFFLINE OFFLINE STABLE 2 OFFLINE OFFLINE STABLE -------------------------------------------------------------------------------- grid@dbsrv1:~$
- Check ASM status:
grid@dbsrv1:~$ srvctl status ASM ASM is running on dbsrv1 grid@dbsrv1:~$
- Check Listener status:
grid@dbsrv1:~$ srvctl status listener Listener LISTENER is enabled Listener LISTENER is running on node(s): dbsrv1 grid@dbsrv1:~$
- Check database status:
grid@dbsrv1:~$ srvctl status database -d ProdDB Instance ProdDB1 is running on node dbsrv1 Instance ProdDB2 is running on node dbsrv2 grid@dbsrv1:~$ OR oracle@dbsrv1:~$ sqlplus / as sysdba SQL*Plus: Release 12.1.0.2.0 Production on Wed Dec 21 11:46:56 2022 Copyright (c) 1982, 2014, Oracle. All rights reserved. Connected to: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Advanced Analytics and Real Application Testing options SQL> select status, database_status from v$instance; STATUS DATABASE_STATUS ------------ ----------------- OPEN ACTIVE SQL>
Monitoring alert log entries
Chronological log of messages and errors are written out in alert log file. Typical messages found in this file are: database startup, shutdown, log switches, space errors etc.
oracle@dbsrv1:/u01/app/oracle/diag/rdbms/Prodb/Prodb1/trace$ tail -f alert__Prodb1.log ns main err code: 12535 TNS-12535: TNS:operation timed out ns secondary err code: 12560 nt main err code: 505 TNS-00505: Operation timed out nt secondary err code: 145 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.10.54.27)(PORT=38873))
If there is any error or issues in the database, please refer to the alert log files.
Checking Available Space
The DBA as part of the daily monitoring is required to check if there are enough Disk Space, FRA, Tablespace, and Tape Size.
- Disk Space: use a df –h Solaris command to check if there is enough space in root pool, u01, share, var, and backup disk in all database (production instance1, production instance2, standby, card db, Recon).
oracle@dbsrv1:~$ df -h Filesystem Size Used Available Capacity Mounted on rpool/ROOT/solaris-1 294G 16G 173G 9% / /devices 0K 0K 0K 0% /devices /dev 0K 0K 0K 0% /dev ctfs 0K 0K 0K 0% /system/contract proc 0K 0K 0K 0% /proc mnttab 0K 0K 0K 0% /etc/mnttab swap 39G 2.6M 39G 1% /system/volatile objfs 0K 0K 0K 0% /system/object sharefs 0K 0K 0K 0% /etc/dfs/sharetab fd 0K 0K 0K 0% /dev/fd rpool/ROOT/solaris-1/var 294G 497M 173G 1% /var swap 39G 12M 39G 1% /tmp rpool/VARSHARE 294G 6.1G 173G 4% /var/share rpool/export 294G 320K 173G 1% /export rpool/export/home 294G 336K 173G 1% /export/home rpool/export/home/grid 294G 400K 173G 1% /export/home/grid rpool/export/home/oracle 294G 13G 173G 8% /export/home/oracle rpool/export/home/sol 294G 307M 173G 1% /export/home/sol rpool 294G 384K 173G 1% /rpool rpool/VARSHARE/zones 294G 288K 173G 1% /system/zones rpool/VARSHARE/pkg 294G 304K 173G 1% /var/share/pkg rpool/VARSHARE/pkg/repositories 294G 288K 173G 1% /var/share/pkg/repositories u01 147G 111G 35G 76% /u01 oracle@dbsrv1:~$
- ASMCMD: Is a command-line utility that you can use to easily view and manipulate files and directories within Automatic Storage Management (ASM) disk groups. It can list the contents of disk groups, perform searches, create and remove directories and aliases, display space utilization, and more.
grid@dbsrv1:~$ asmcmd ASMCMD> lsdg State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name MOUNTED EXTERN N 512 4096 1048576 6289446 1189784 0 1189784 0 N DATA/ MOUNTED EXTERN N 512 4096 1048576 511591 298274 0 298274 0 Y FRA/ ASMCMD> ls DATA/ FRA/ ASMCMD>
- v$asm_diskgroup: we can also use v$asm_diskgroup view object to manage ASM disk spaces.
- Tablespace: To get the amount of free space in table space uses the DBA_DATA_FILES and DBA_FREE_SPACE. If the amount of table space used has passed the threshold add space. To decide the threshold value you have to make get the daily growth of the database. This helps to proactively manage and monitor the space of tablespaces.
Check CPU and Memory Usage
On oracle Solaris use the TOP Command to monitor CPU Usage and free memory.
oracle@dbsrv1:~$ top load averages: 39.8, 34.9, 32.2; up 39+02:24:19 14:32:33 753 processes: 712 sleeping, 1 running, 1 stopped, 39 on cpu CPU states: 17.9% idle, 75.5% user, 6.6% kernel, 0.0% iowait, 0.0% swap Kernel: 182784 ctxsw, 50804 trap, 135011 intr, 90514 syscall, 1229 flt Memory: 64G phys mem, 2350M free mem, 48G total swap, 43G free swap PID USERNAME NLWP PRI NICE SIZE RES STATE TIME CPU COMMAND 7172 oracle 1 30 0 38G 38G cpu/40 3:48 2.11% oracle 7180 oracle 1 0 0 38G 38G cpu/36 3:45 2.11% oracle 7201 oracle 1 0 0 38G 38G cpu/4 3:09 2.09% oracle 7205 oracle 1 0 0 38G 38G cpu/23 3:05 2.09% oracle 7141 oracle 1 40 0 38G 38G cpu/34 4:28 2.09% oracle 7167 oracle 1 50 0 38G 38G cpu/47 4:06 2.08% oracle
User Monitoring
Daily the DBA must check all user account that should be open must be and account that should be locked must be locked. It’s the job of the DBA to change System Users before they are expired.
We can use the SQL script below to monitor the status of database users:
Select Username,account_status,expiry_date,default_tablespace from dba_users where default_tablespace !='USERS';
Invalid Objects
The DBA should look up if any invalid objects exists and take proper measure to validate the objects founds.
- To check invalid objects:
SQL> select object_name, status from dba_objects where status’INVADLID’;
And use the ALTER…COMPILE SQL command to fix invalid objects.
Clearing File system and FRA files
Before the server runs low on disk space, the DBA must clear disk spaces by removing files that are no longer necessary.
- Remove old FRA archive logs:
---Connect through RMAN oracle@dbsrv1:~$ rman Recovery Manager: Release 12.1.0.2.0 - Production on Wed Dec 21 14:44:02 2022 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. RMAN> connect target / connected to target database: ProdDB(DBID=66272031) --- List archive logs older than three days. RMAN> list archivelog all completed before 'sysdate-3'; ---Delete archive log older than three days. RMAN> delete archivelog all completed before 'sysdate-3'; ---Crosscheck archive log RMAN> crosscheck archivelog all; ---Remove expired archive log RMAN> delete expired archivelog all;