Using Jumbo Frames on VMware NSX for Oracle Workloads

 

 

This blog is not a deep dive into VMware NSX or VXLAN concepts; instead, it focuses on Oracle Real Application Clusters (RAC) interconnect MTU sizing using Jumbo Frames on VMware NSX.

 

VMware NSX

 

VMware NSX Data Center is the network virtualization and security platform that enables the virtual cloud network, a software-defined approach to networking that extends across data centers, clouds, and application frameworks.

With NSX Data Center, networking and security are brought closer to the application wherever it’s running, from virtual machines (VMs) to containers to bare metal. Like the operational model of VMs, networks can be provisioned and managed independent of underlying hardware.

More details on VMware NSX can be found here.

 

 

Virtual eXtensible Local Area Network (VXLAN)

 

A blog by Vyenkatesh Deshpande describes the different components of VMware's VXLAN implementation.

Important concepts about Unicast, Broadcast and Multicast can be found here.

VXLAN is an overlay network technology. Overlay network can be defined as any logical network that is created on top of the existing physical networks. VXLAN creates Layer 2 logical networks on top of the IP network.

VXLAN encapsulation adds another 50 bytes of overhead to each frame.

In addition, the packet size specified to ICMP/ping does not include the 28-byte ICMP (8 bytes) + IP (20 bytes) header, so those 28 bytes must also be accounted for.

 

 

VMware Cloud on AWS

 

VMware Cloud on AWS is an on-demand service that enables customers to run applications across vSphere-based cloud environments with access to a broad range of AWS services.

Powered by VMware Cloud Foundation, this service integrates vSphere, vSAN and NSX along with VMware vCenter management, and is optimized to run on dedicated, elastic, bare-metal AWS infrastructure. ESXi hosts in VMware Cloud on AWS reside in an AWS availability Zone (AZ) and are protected by vSphere HA.

The use cases for deploying VMware Cloud on AWS are multi-fold, namely:

  • Data Center Extension & DR
  • Cloud Migration
  • Application modernization & Next-Generation Apps build out

More details on VMware Cloud on AWS can be found here.

 

 

 

Oracle Net Services

 

Oracle Net, a component of Oracle Net Services, enables a network session from a client application to an Oracle Database server. When a network session is established, Oracle Net acts as the data courier for both the client application and the database.

Oracle Net communicates with TCP/IP to enable computer-level connectivity and data transfer between the client and the database.

More details on Oracle Net Services can be found here.

  

 

 

Oracle Real Application Cluster (RAC)

 

Non-cluster Oracle databases have a one-to-one relationship between the Oracle database and the instance. Oracle RAC environments, however, have a one-to-many relationship between the database and instances. An Oracle RAC database can have several instances, all of which access one database. All database instances must use the same interconnect, which can also be used by Oracle Clusterware.

Oracle Clusterware is a portable cluster management solution that is integrated with Oracle Database. Oracle Clusterware is a required component for using Oracle RAC that provides the infrastructure necessary to run Oracle RAC.

More details on Oracle RAC can be found here.

 

 

Oracle Real Application Cluster (RAC) Interconnect

 

All nodes in an Oracle RAC environment must connect to at least one Local Area Network (LAN) (commonly referred to as the public network) to enable users and applications to access the database.

In addition to the public network, Oracle RAC requires private network connectivity used exclusively for communication between the nodes and database instances running on those nodes. This network is commonly referred to as the interconnect. The interconnect network is a private network that connects all the servers in the cluster.

More details on Oracle Real Application Cluster (RAC) Interconnect can be found here.

 

 

Oracle Real Application Cluster (RAC) Networking requirements

 

The Oracle documentation for RAC lists two main requirements, among others, with respect to broadcast and multicast traffic:

  • Broadcast Requirements
    • Broadcast communications (ARP and UDP) must work properly across all the public and private interfaces configured for use by Oracle Grid Infrastructure.
    • The broadcast must work across any configured VLANs as used by the public or private interfaces.
    • When configuring public and private network interfaces for Oracle RAC, you must enable Address Resolution Protocol (ARP). Highly Available IP (HAIP) addresses do not require ARP on the public network, but for VIP failover, you need to enable ARP. Do not configure NOARP.

 

  • Multicast Requirements
    • For each cluster member node, the Oracle mDNS daemon uses multicasting on all interfaces to communicate with other nodes in the cluster.
    • Multicasting is required on the private interconnect. For this reason, at a minimum, you must enable multicasting for the cluster:
      • Across the broadcast domain as defined for the private interconnect
      • On the IP address subnet ranges 224.0.0.0/24 and optionally 230.0.1.0/24
    • You do not need to enable multicast communications across routers.

More information on Oracle RAC networking requirements can be found here.

 

 

 

Oracle Real Application Cluster using VMware NSX

 

Oracle workloads, both single instance and RAC, can run seamlessly and transparently on top of VMware NSX without any issues.

With Extended Oracle RAC, both storage and network virtualization need to be deployed to provide high availability, workload mobility, workload balancing and effective site maintenance between sites.

NSX supports multi-datacenter deployments to allow L2 adjacency in software; in simple terms, stretching the network so that VMs can use the same subnets in multiple sites.

The blog post here showcases the ability to stretch an Oracle RAC solution in an Extended Oracle RAC deployment across multiple data centers using VMware NSX for L2 adjacency.

This topic and the related demo were also featured at VMworld 2016.

VIRT7575 – Architecting NSX with Business Critical Applications for Security, Automation and Business Continuity

 

 

 

Oracle Real Application Cluster using Jumbo Frames on VMware NSX

 

The standard Maximum Transmission Unit (MTU) for Ethernet frames is 1500 bytes. Jumbo frames are frames with an MTU larger than 1500 bytes; in practice this usually means an MTU of 9000 bytes.

Jumbo frames can be implemented for private cluster interconnects, but they require very careful configuration and testing to realize their benefits.

In many cases, failures or inconsistencies can occur due to incorrect setup or bugs in the driver or switch software, which can result in sub-optimal performance and network errors.

In order to make jumbo frames work properly for a cluster interconnect network, the host's private network adapter must be configured with a persistent MTU size of 9000 bytes - 50 bytes of VXLAN overhead - 28 bytes of ICMP/ping header = 8922 bytes.
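
As a minimal sketch, assuming a Linux RAC node whose private interconnect interface is named eth1 and a peer node private IP of 192.168.10.2 (both hypothetical names), the 8922-byte MTU can be applied and validated end to end roughly as follows:

ip link set dev eth1 mtu 8922                                  # apply the MTU on the private interface
echo "MTU=8922" >> /etc/sysconfig/network-scripts/ifcfg-eth1   # make it persistent (RHEL-style config)
ping -M do -s 8894 -c 3 192.168.10.2                           # don't-fragment ping; 8894 + 28-byte ICMP/IP header = 8922 bytes

If the don't-fragment ping fails anywhere along the path, a switch, uplink or overlay segment is still running at a smaller MTU.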

 

 

Setting MTU to 9000 Bytes to enable Jumbo Frames on VMware SDDC

 

Jumbo frames let ESXi hosts send larger frames out onto the physical network. The network must support jumbo frames end-to-end, including physical network adapters, physical switches, and storage devices. Before enabling jumbo frames, check with your hardware vendor to ensure that your physical network adapter supports jumbo frames.

You can enable jumbo frames on a vSphere distributed switch or vSphere standard switch by changing the maximum transmission unit (MTU) to a value greater than 1500 bytes. 9000 bytes is the maximum frame size that you can configure.

More details on Jumbo Frames on VMware vSphere can be found here.

Refer to the blog ‘What’s the Big Deal with Jumbo Frames?’ about Jumbo Frames and VMware SDDC.

For an on-premises setup, use the vSphere Web Client to Edit Settings on the distributed switch and set the MTU size.

 

 

Distributed switch with MTU set to 9000 bytes
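
On the ESXi command line, the equivalent change for a vSphere standard switch and its VMkernel interface can be sketched as follows (vSwitch1 and vmk1 are hypothetical names; a distributed switch is edited through the Web Client as shown above):

esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000   # set the switch MTU
esxcli network ip interface set --interface-name=vmk1 --mtu=9000         # set the VMkernel port MTU
esxcli network vswitch standard list --vswitch-name=vSwitch1             # confirm the MTU value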

 

 

 

On VMware Cloud on AWS, since it is a managed service, customers do not have to set the MTU to 9000 bytes, as it is already set when the SDDC cluster is provisioned.

 

 

 

In our lab setup, the standard i3 ESXi servers had a 25Gb Elastic Network Adapter (PF) attached.

 

 

 

  

 

Oracle Metalink on changing RAC private network MTU size

 

Refer to Oracle Metalink document ‘Recommendation for the Real Application Cluster Interconnect and Jumbo Frames (Doc ID 341788.1)’ for more information on Jumbo frames for Oracle RAC interconnect.

The Metalink document ‘How to Modify Private Network Information in Oracle Clusterware (Doc ID 283684.1)’ describes how to change the private network MTU only.

For example, the private network MTU is changed from 1500 to 8922 [9000 - 50 bytes of VXLAN overhead - 28 bytes of ICMP/ping header = 8922 bytes], while the network interface name and subnet remain the same.

  1. Shut down the Oracle Clusterware stack on all nodes
  2. Make the required MTU change at the OS network layer and ensure the private network is available with the desired MTU size (in this case 8922 bytes); a ping with the desired MTU size must work on all cluster nodes (see the sketch below)
  3. Restart the Oracle Clusterware stack on all nodes
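
A minimal sketch of these three steps on a Linux cluster node, assuming Grid Infrastructure is installed under $GRID_HOME and the private interface is named eth1 (both hypothetical), run as root on every node:

$GRID_HOME/bin/crsctl stop crs                       # 1. stop the Clusterware stack
ip link set dev eth1 mtu 8922                        # 2. change the MTU at the OS layer (and persist it in the interface config)
ping -M do -s 8894 -c 3 <remote node private IP>     #    verify the jumbo path; the payload excludes the 28-byte ICMP/IP header
$GRID_HOME/bin/crsctl start crs                      # 3. restart the Clusterware stack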

 

 

 

Conclusion

 

VMware NSX Data Center is the network virtualization and security platform that enables the virtual cloud network, a software-defined approach to networking that extends across data centers, clouds, and application frameworks. With NSX Data Center, networking and security are brought closer to the application wherever it’s running, from virtual machines (VMs) to containers to bare metal.

In addition to the public network, Oracle RAC requires private network connectivity used exclusively for communication between the nodes and database instances running on those nodes. This network is commonly referred to as the interconnect.

In the case of Oracle RAC using VMware NSX and jumbo frames, jumbo frames can be implemented for the private cluster interconnect, but they require very careful configuration and testing to realize their benefits.

In order to make jumbo frames work properly for a cluster interconnect network, the host's private network adapter must be configured with a persistent MTU size of 9000 bytes - 50 bytes of VXLAN overhead - 28 bytes of ICMP/ping header = 8922 bytes.

All Oracle on vSphere white papers, including Oracle on VMware vSphere / VMware vSAN / VMware Cloud on AWS best practices, deployment guides and workload characterization guides, can be found at Oracle on VMware Collateral – One Stop Shop.

 


Oracle and vSphere Persistent Memory (PMEM) – Oracle Instance Recovery – An Investigation

Introduction to VMware Persistent Memory (PMEM)

 

Persistent Memory (PMEM) resides between DRAM and disk storage in the data storage hierarchy. This  technology enables byte-addressable updates and does not lose data if power is lost.

Instead of having nonvolatile storage at the bottom with the largest capacity but the slowest performance, nonvolatile storage is now very close to DRAM in terms of performance.

PMEM is a byte-addressable form of computer memory that has the following characteristics:

  • DRAM-like latency and bandwidth
  • Regular load/store CPU instructions
  • Paged/mapped by operating system just like DRAM
  • Data is persistent across reboots

More information about Persistent Memory (PMEM)  and how vSphere 6.7 can take advantage of PMEM technology to accelerate IO-intensive Oracle workloads can be found here.

The Accelerating Oracle Performance using vSphere Persistent Memory (PMEM) paper examines the performance of Oracle databases using the VMware vSphere 6.7 Persistent Memory feature in different modes for the use cases below:

  • Improved performance of Oracle Redo Log using vPMEM Disk-backed vmdks/vPMEM disks in DAX mode
  • Accelerating Performance using Oracle Smart Flash Cache
  • Potential reduction in Oracle Licensing

In the blog article Oracle and vSphere Persistent Memory (PMEM) – vPMEM v/s vPMEMDisk ,  we demonstrate the performance improvement in Redo log activity when redo log files are placed on vPMEM Disk-backed vmdks/vPMEM disks in DAX mode over redo logs on vPMEMDisk backed vmdks.

 

 

Accelerating Oracle Instance / Crash Recovery using PMEM – An Investigation

 

This blog addresses the instance or crash recovery aspect of Oracle database and investigates if Persistent Memory can help with speeding up the Oracle crash recovery process.

 

 

‘It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.’ –  Sherlock Holmes Quote (A Scandal in Bohemia).

 

 

Tryst with Redis (Re-architecting an Application to Leverage PMEM)

 

To illustrate the benefits of byte-addressable persistent memory, we took a popular open source application Redis and modified it to take full advantage of persistent memory.

Redis, by default, only commits key-value transactions to memory. Persistence across reboots (e.g., due to power loss) is provided by periodically writing transaction logs to storage. By default, the transaction log is written to storage every second. Therefore, Redis can lose a second’s worth of transactions in the default configuration.
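
For reference, the once-per-second log flush described above is controlled by Redis's appendfsync setting; a sketch of inspecting and changing it with redis-cli against a locally running instance:

redis-cli CONFIG GET appendfsync          # "everysec": the append-only log is flushed to storage once per second
redis-cli CONFIG SET appendfsync always   # flush on every write instead (safer, but slower)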

Initially, we experimented by storing Redis transaction logs in persistent memory. That provided noticeable but marginal (around 12%) improvement in performance. Then, we decided to modify Redis to take full advantage of byte-addressable persistent memory by placing the entire key-value database in persistent memory. We call the modified version PMem-aware Redis. We added very low-overhead journaling to make Redis transactions on persistent memory crash consistent. With these changes, we got around 45% improvement in transaction throughput.

In addition, PMem-aware Redis can recover from a crash almost instantaneously because the database is already in persistent memory and does not need to be read from disk. For comparison, unmodified Redis using NVMe SSD can take minutes to recover after a crash, even for modest (around 1 GB) database sizes. Finally, since persistent memory is local to a host, we experimented with replicating Redis transactions to a standby host. Even with replication, PMem-aware Redis performed around 28% better than unmodified Redis using a fast and host-local NVMe SSD without any replication.

 

 

More details on this can be found in the article ‘Persistent Memory Initiative’ and white paper ‘Persistent Memory Performance on vSphere 6.7 Performance Study’ .

 

 

Oracle VM setup

 

As mentioned above, this blog will investigate if Persistent Memory can help with speeding up the crash recovery process of an Oracle database.

The Oracle VM ‘Oracle122-RHEL-PMEM-udev’ has 6 vCPUs and 32GB of vRAM.

 

 

 

The 4 hard disks attached to the VM are as below –

  • Hard Disk 1 – 60G – OS
  • Hard Disk 2 – 100G – Oracle + Grid Infrastructure binaries
  • Hard Disk 3 – 2TB – Oracle Database datafiles
  • Hard Disk 4 – 32GB – Oracle Redo logs

 

Below is the operating system view of the vmdk’s –

[root@oracle122-rhel ~]# fdisk -lu

Disk /dev/sdb: 107.4 GB, 107374182400 bytes, 209715200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x7783b7a1

Device Boot Start End Blocks Id System
/dev/sdb1 2048 209715199 104856576 83 Linux

Disk /dev/sdc: 2199.0 GB, 2199023255552 bytes, 4294967296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x7d28e830

Device Boot Start End Blocks Id System
/dev/sdc1 2048 4294967294 2147482623+ 83 Linux

Disk /dev/sdd: 34.4 GB, 34359738368 bytes, 67108864 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x952fca2a

Device Boot Start End Blocks Id System
/dev/sdd1 2048 67108863 33553408 83 Linux

Disk /dev/sda: 64.4 GB, 64424509440 bytes, 125829120 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0004908d

Device Boot Start End Blocks Id System
/dev/sda1 * 2048 2099199 1048576 83 Linux
/dev/sda2 2099200 125829119 61864960 8e Linux LVM

….
[root@oracle122-rhel ~]#

 

 

 

Components and Granules in the SGA

 

All SGA components allocate and deallocate space in units of granules. Oracle Database tracks SGA memory use in internal numbers of granules for each SGA component.

The granule size is determined by the amount of SGA memory requested when the instance starts. Specifically, the granule size is based on the value of the SGA_MAX_SIZE initialization parameter.

The table in the Oracle documentation linked below shows the SGA granule size for each range of SGA memory.

 

https://docs.oracle.com/database/121/ADMIN/memory.htm#ADMIN11203
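
As a quick cross-check on a running instance, the granule size can also be queried from V$SGAINFO, for example (a sketch, run as a privileged user):

sqlplus -s / as sysdba <<EOF
select bytes/1024/1024 as granule_mb from v\$sgainfo where name = 'Granule Size';
EOF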

 

 

 

Oracle Memory Management

 

The memory structures that must be managed are the system global area (SGA) and the instance program global area (instance PGA).

The two main memory management methods are:

  • Automatic Memory Management (AMM)
    • Oracle manages the SGA memory and instance PGA memory completely automatically. You designate only the total memory size to be used by the instance, and Oracle Database dynamically exchanges memory between the SGA and the instance PGA as needed to meet processing demands. With this method, the database also dynamically tunes the sizes of the individual SGA components and the sizes of the individual PGAs.

 

  • Manual Memory Management – These methods are
    • Automatic shared memory management – for the SGA
    • Manual shared memory management – for the SGA
    • Automatic PGA memory management – for the instance PGA
    • Manual PGA memory management – for the instance PGA

 

To use the MEMORY_TARGET or MEMORY_MAX_TARGET feature, the  /dev/shm mount point should be equal in size or larger than the value of MEMORY_TARGET or MEMORY_MAX_TARGET, whichever is larger.

For Automatic Memory Management (AMM), the initialization parameters memory_target and memory_max_target are set to 16G.
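
A minimal sketch of how these AMM parameters might be set on an spfile-based instance (values match the text):

sqlplus -s / as sysdba <<EOF
alter system set memory_max_target=16G scope=spfile;
alter system set memory_target=16G scope=spfile;
EOF
# restart the instance afterwards for the new memory_max_target to take effect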

Contents of /etc/fstab showing the /dev/shm mount point –

oracle@oracle122-rhel:DBPROD:/home/oracle> cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Fri Jan 26 19:27:50 2018
#
# Accessible filesystems, by reference, are maintained under ‘/dev/disk’
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/rhel-root / xfs defaults 0 0
UUID=d200eecc-1db9-4c9f-8ae4-8b95f20d70c9 /boot xfs defaults 0 0
/dev/mapper/rhel-home /home xfs defaults 0 0
/dev/mapper/rhel-swap swap swap defaults 0 0
/dev/vg2_oracle/LogVol_u01 /u01 ext4 defaults 1 2
#/dev/pmem1 /redolog ext4 dax,defaults 1 2
#/dev/pmem2 /redolog_dax ext4 dax,defaults 1 2

shmfs /dev/shm tmpfs size=24g 0
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

 

 

Test Scenario and Steps

 

We have 2 Test scenarios –

  • Oracle Crash Recovery using AMM Memory Management without Persistent Memory
  • Oracle Crash Recovery using AMM Memory Management with Persistent Memory

 

The Test steps are to  –

  • Run a SLOB workload, an Oracle I/O workload generation toolkit, against the database for 5 minutes
  • Within 3 minutes into the load, shut down the database via a ‘shutdown abort’ command (see the sketch below), mimicking an abnormal database failure or crash
  • Restart the database, which results in crash recovery of the database thread; record the recovery time for each of the above test scenarios
  • Compare the recovery times of the two scenarios
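
For reference, the ‘shutdown abort’ in step 2 (wrapped in the stop_db_abort helper script used later) presumably amounts to something like the following minimal sketch:

sqlplus -s / as sysdba <<EOF
shutdown abort
EOF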

 

 

Oracle Crash Recovery using AMM Memory Management without Persistent Memory

 

Output of /proc/meminfo to show the VM memory total :

oracle@oracle122-rhel:DBPROD:/home/oracle> cat /proc/meminfo
MemTotal: 32781052 kB
MemFree: 19212424 kB
MemAvailable: 20514972 kB
Buffers: 108464 kB
Cached: 12110000 kB
SwapCached: 0 kB
Active: 11939808 kB
Inactive: 1126292 kB
Active(anon): 11539540 kB
Inactive(anon): 65204 kB
Active(file): 400268 kB
Inactive(file): 1061088 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 6291452 kB
SwapFree: 6291452 kB
Dirty: 100 kB
Writeback: 0 kB
AnonPages: 847652 kB
Mapped: 1327056 kB
Shmem: 10757112 kB
Slab: 123280 kB
SReclaimable: 65332 kB
SUnreclaim: 57948 kB
KernelStack: 8000 kB
PageTables: 94676 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 22681976 kB
Committed_AS: 12997260 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 33760788 kB
VmallocChunk: 34325131260 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 173952 kB
DirectMap2M: 5068800 kB
DirectMap1G: 30408704 kB
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Output of the df command to show the /dev/shm filesystem size –

oracle@oracle122-rhel:DBPROD:/home/oracle> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 36G 2.3G 34G 7% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 24G 11G 14G 43% /dev/shm
tmpfs 16G 9.0M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda1 1014M 223M 792M 22% /boot
/dev/mapper/rhel-home 18G 4.0G 14G 23% /home
/dev/mapper/vg2_oracle-LogVol_u01 99G 31G 64G 33% /u01
tmpfs 3.2G 0 3.2G 0% /run/user/501
tmpfs 3.2G 0 3.2G 0% /run/user/0
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

List the contents of the /dev/shm file system. As the granule-size table referenced above indicates, the memory target (SGA and PGA) for the database was set to 16GB and hence the SGA granule size is 32MB.

The ASM instance memory footprint is much smaller, and hence the granule size is 4MB for the ASM instance, as seen in the 4194304-byte files below.

oracle@oracle122-rhel:DBPROD:/home/oracle> ls -l /dev/shm
total 10747904
-rw------- 1 grid oinstall 4194304 Nov 23 08:39 ora_+ASM_32769_0
-rw------- 1 grid oinstall 4194304 Nov 22 18:52 ora_+ASM_32769_1
-rw------- 1 grid oinstall 4194304 Nov 22 18:52 ora_+ASM_32769_2
…..
-rw------- 1 grid oinstall 4194304 Nov 23 08:39 ora_+ASM_65536_114
-rw------- 1 grid oinstall 4194304 Nov 22 18:52 ora_+ASM_65536_115
….
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_157
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_158
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_159
-rw------- 1 grid oinstall 4194304 Nov 22 18:52 ora_+ASM_65536_16
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_160
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_161
-rw------- 1 grid oinstall 0 Nov 22 18:52 ora_+ASM_65536_162

-rw------- 1 oracle asmadmin 33554432 Nov 23 08:33 ora_DBPROD_294916_50
-rw------- 1 oracle asmadmin 0 Nov 23 08:32 ora_DBPROD_294916_500
-rw------- 1 oracle asmadmin 0 Nov 23 08:32 ora_DBPROD_294916_501
-rw------- 1 oracle asmadmin 0 Nov 23 08:32 ora_DBPROD_294916_502
-rw------- 1 oracle asmadmin 0 Nov 23 08:32 ora_DBPROD_294916_503
…..
-rw------- 1 oracle asmadmin 33554432 Nov 23 08:33 ora_DBPROD_294916_56
….
-rw------- 1 oracle asmadmin 33554432 Nov 23 08:33 ora_DBPROD_294916_59
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Start the SLOB workload against the Oracle database using the command ‘/u01/software/SLOB/SLOB/runit.sh -s 1 -t 100’.

 

oracle@oracle122-rhel:DBPROD:/home/oracle> /u01/software/SLOB/SLOB/runit.sh -s 1 -t 100
Before AWR
SQL*Plus: Release 12.2.0.1.0 Production on Fri Nov 23 08:49:06 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
SQL> Connected.
SQL>
PL/SQL procedure successfully completed.
SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production
SLOB started with 1 users
/u01/software/SLOB/SLOB/runit.sh: line 2: !/bin/bash: No such file or directory
NOTIFY : 2018.11.23-08:49:08 : For security purposes all file and directory creation and deletions
NOTIFY : 2018.11.23-08:49:08 : performed by /u01/software/SLOB/SLOB/runit.sh are logged in: /u01/software/SLOB/SLOB/.file_operations_audit_trail.out.
NOTIFY : 2018.11.23-08:49:08 : SLOB TEMPDIR is /tmp/.SLOB.2018.11.23.084908. SLOB will delete this directory at the end of this execution.
NOTIFY : 2018.11.23-08:49:08 : Sourcing in slob.conf
NOTIFY : 2018.11.23-08:49:08 : Performing initial slob.conf sanity check…
NOTIFY : 2018.11.23-08:49:08 :
NOTIFY : 2018.11.23-08:49:08 : All SLOB sessions will connect to DBPROD_PMEM_RHEL_PDB1 via SQL*Net.
NOTIFY : 2018.11.23-08:49:08 : Connecting to the instance to validate slob.conf->SCALE setting.

UPDATE_PCT: 100
SCAN_PCT: 0
RUN_TIME: 300
WORK_LOOP: 0
SCALE: 28G (3670016 blocks)
WORK_UNIT: 64
REDO_STRESS: HEAVY
HOT_SCHEMA_FREQUENCY: 0
HOTSPOT_MB: 8
HOTSPOT_OFFSET_MB: 16
HOTSPOT_FREQUENCY: 3
THINK_TM_FREQUENCY: 0
THINK_TM_MIN: .1
THINK_TM_MAX: .5
DATABASE_STATISTICS_TYPE: awr
SYSDBA_PASSWD: “vmware123”
DBA_PRIV_USER: “sys”
ADMIN_SQLNET_SERVICE: “DBPROD_PMEM_RHEL_PDB1”
SQLNET_SERVICE_BASE: “DBPROD_PMEM_RHEL_PDB1”
SQLNET_SERVICE_MAX: “”

EXTERNAL_SCRIPT: “”
THREADS_PER_SCHEMA: 100 (-t option)

Note: runit.sh will use the following connect strings as per slob.conf settings:
Admin Connect String: “sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”

NOTIFY : 2018.11.23-08:49:09 : Clearing temporary SLOB output files from previous SLOB testing.
NOTIFY : 2018.11.23-08:49:09 : Testing admin connectivity to the instance to validate slob.conf settings.
NOTIFY : 2018.11.23-08:49:09 : Testing connectivity. Command: “sqlplus -L sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”.
NOTIFY : 2018.11.23-08:49:09 : Next, testing 1 user (non-admin) connections…
NOTIFY : 2018.11.23-08:49:09 : Testing connectivity. Command: “sqlplus -L user1/user1@DBPROD_PMEM_RHEL_PDB1”.
NOTIFY : 2018.11.23-08:49:09 : Performing redo log switch.
NOTIFY : 2018.11.23-08:49:09 : Redo log switch complete. Setting up trigger mechanism.
NOTIFY : 2018.11.23-08:49:19 : Running iostat, vmstat and mpstat on current host–in background.
NOTIFY : 2018.11.23-08:49:19 : Connecting 100 (THREADS_PER_SCHEMA) session(s) to 1 schema(s) …
NOTIFY : 2018.11.23-08:49:22 :
NOTIFY : 2018.11.23-08:49:22 : Executing awr “before snap” procedure. Command: “sqlplus -S -L sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”.
NOTIFY : 2018.11.23-08:49:23 : Before awr snap ID is 1218
NOTIFY : 2018.11.23-08:49:23 :
NOTIFY : 2018.11.23-08:49:23 : Test has been triggered. Processes are executing.
NOTIFY : 2018.11.23-08:49:23 : List of monitored sqlplus PIDs written to /tmp/.SLOB.2018.11.23.084908/8583.f_wait_pids.out.
NOTIFY : 2018.11.23-08:49:34 : Waiting for 287 seconds before monitoring running processes (for exit).
^C
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Run ‘shutdown abort’ against the database to simulate a crash at 3 minutes into the load.

 

oracle@oracle122-rhel:DBPROD:/home/oracle> date
Fri Nov 23 08:52:21 PST 2018

oracle@oracle122-rhel:DBPROD:/home/oracle> ./stop_db_abort
SQL*Plus: Release 12.2.0.1.0 Production on Fri Nov 23 08:52:27 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
SQL> Connected.
SQL> ORACLE instance shut down.
SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Restart the Oracle database which prompts Oracle to perform crash recovery of the thread.

 

Observe the database alert log file to see the crash recovery progress.

 

oracle@oracle122-rhel:DBPROD:/u01/admin/DBPROD/diag/rdbms/dbprod/DBPROD/trace> tail -f alert_DBPROD.log
2018-11-23T08:52:27.264745-08:00
Shutting down instance (abort) (OS id: 994)
License high water mark = 105
2018-11-23T08:52:27.265146-08:00
USER (ospid: 994): terminating the instance
2018-11-23T08:52:28.276572-08:00
Instance terminated by USER, pid = 994
2018-11-23T08:52:29.094544-08:00
Instance shutdown complete (OS id: 994)

2018-11-23T08:56:01.276626-08:00
Completed redo scan
read 3122163 KB redo, 683839 data blocks need recovery
2018-11-23T08:56:06.017712-08:00

2018-11-23T08:56:08.955533-08:00
Started redo application at
Thread 1: logseq 10505, block 48777, offset 0
2018-11-23T08:56:08.960895-08:00
Recovery of Online Redo Log: Thread 1 Group 4 Seq 10505 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group04_redo01.log
Mem# 1: +REDO_DG/DBPROD/group04_redo02.log
2018-11-23T08:56:09.522910-08:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 10506 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group05_redo01.log
Mem# 1: +REDO_DG/DBPROD/group05_redo02.log
2018-11-23T08:56:10.128741-08:00
Recovery of Online Redo Log: Thread 1 Group 6 Seq 10507 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group06_redo01.log
Mem# 1: +REDO_DG/DBPROD/group06_redo02.log
2018-11-23T08:56:10.736250-08:00
Recovery of Online Redo Log: Thread 1 Group 7 Seq 10508 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group07_redo01.log
Mem# 1: +REDO_DG/DBPROD/group07_redo02.log
2018-11-23T08:56:11.354438-08:00
Recovery of Online Redo Log: Thread 1 Group 8 Seq 10509 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group08_redo01.log
Mem# 1: +REDO_DG/DBPROD/group08_redo02.log
2018-11-23T08:56:11.963102-08:00
Recovery of Online Redo Log: Thread 1 Group 9 Seq 10510 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group09_redo01.log
Mem# 1: +REDO_DG/DBPROD/group09_redo02.log
2018-11-23T08:56:12.575632-08:00
Recovery of Online Redo Log: Thread 1 Group 10 Seq 10511 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group10_redo01.log
Mem# 1: +REDO_DG/DBPROD/group10_redo02.log
2018-11-23T08:56:13.196423-08:00
Recovery of Online Redo Log: Thread 1 Group 11 Seq 10512 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group11_redo01.log
Mem# 1: +REDO_DG/DBPROD/group11_redo02.log
2018-11-23T08:56:13.855567-08:00
Recovery of Online Redo Log: Thread 1 Group 12 Seq 10513 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group12_redo01.log
Mem# 1: +REDO_DG/DBPROD/group12_redo02.log
2018-11-23T08:56:14.533012-08:00
Recovery of Online Redo Log: Thread 1 Group 13 Seq 10514 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group13_redo01.log
Mem# 1: +REDO_DG/DBPROD/group13_redo02.log
2018-11-23T08:56:15.210285-08:00
Recovery of Online Redo Log: Thread 1 Group 14 Seq 10515 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group14_redo01.log
Mem# 1: +REDO_DG/DBPROD/group14_redo02.log
2018-11-23T08:56:15.899655-08:00
Recovery of Online Redo Log: Thread 1 Group 15 Seq 10516 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group15_redo01.log
Mem# 1: +REDO_DG/DBPROD/group15_redo02.log
2018-11-23T08:56:16.605822-08:00
Recovery of Online Redo Log: Thread 1 Group 16 Seq 10517 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group16_redo01.log
Mem# 1: +REDO_DG/DBPROD/group16_redo02.log
2018-11-23T08:56:17.319616-08:00
Completed redo application of 377.92MB
2018-11-23T08:56:25.061718-08:00
Completed crash recovery at
Thread 1: RBA 10517.323909.16, nab 323909, scn 0x00000000087264b7
683839 data blocks read, 679844 data blocks written, 3122163 redo k-bytes read
Endian type of dictionary set to little

 

Start of recovery process – 2018-11-23T08:56:08.955533-08:00
Stop of recovery process – 2018-11-23T08:56:25.061718-08:00
Time taken for crash recovery ~ 17 seconds

 

 

 

Oracle Crash Recovery using AMM Memory Management with Persistent Memory

 

Stop the Oracle database and ASM instance. Add an NVDIMM device of size 32GB to the VM as shown below.

 

 

Running the Linux ‘fdisk’ command shows the NVDIMM device as below.

 

[root@oracle122-rhel ~]# fdisk -lu
….

Disk /dev/pmem0: 34.4 GB, 34359738368 bytes, 67108864 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


[root@oracle122-rhel ~]#

 

Mount the ‘/dev/shm’ file system on the PMEM device as shown below

 

[root@oracle122-rhel ~]# umount /dev/shm ; mount -t tmpfs -o size=24g /dev/pmem0 /dev/shm ; df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 36G 2.3G 34G 7% /
devtmpfs 16G 0 16G 0% /dev
tmpfs 16G 9.1M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/sda1 1014M 223M 792M 22% /boot
/dev/mapper/rhel-home 18G 4.0G 14G 23% /home
/dev/mapper/vg2_oracle-LogVol_u01 99G 30G 65G 32% /u01
tmpfs 3.2G 0 3.2G 0% /run/user/501
tmpfs 3.2G 0 3.2G 0% /run/user/0
/dev/pmem0 24G 0 24G 0% /dev/shm
[root@oracle122-rhel ~]#

 

Restart the Oracle ASM and Database instance.

 

Rerun the same SLOB workload against the oracle database.

 

oracle@oracle122-rhel:DBPROD:/home/oracle> /u01/software/SLOB/SLOB/runit.sh -s 1 -t 100
Before AWR
SQL*Plus: Release 12.2.0.1.0 Production on Fri Nov 23 09:16:56 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
SQL> Connected.
SQL>
PL/SQL procedure successfully completed.

SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production

SLOB started with 1 users

/u01/software/SLOB/SLOB/runit.sh: line 2: !/bin/bash: No such file or directory
NOTIFY : 2018.11.23-09:16:58 : For security purposes all file and directory creation and deletions
NOTIFY : 2018.11.23-09:16:58 : performed by /u01/software/SLOB/SLOB/runit.sh are logged in: /u01/software/SLOB/SLOB/.file_operations_audit_trail.out.
NOTIFY : 2018.11.23-09:16:58 : SLOB TEMPDIR is /tmp/.SLOB.2018.11.23.091658. SLOB will delete this directory at the end of this execution.
NOTIFY : 2018.11.23-09:16:58 : Sourcing in slob.conf
NOTIFY : 2018.11.23-09:16:58 : Performing initial slob.conf sanity check…
NOTIFY : 2018.11.23-09:16:58 :
NOTIFY : 2018.11.23-09:16:58 : All SLOB sessions will connect to DBPROD_PMEM_RHEL_PDB1 via SQL*Net.
NOTIFY : 2018.11.23-09:16:58 : Connecting to the instance to validate slob.conf->SCALE setting.

UPDATE_PCT: 100
SCAN_PCT: 0
RUN_TIME: 300
WORK_LOOP: 0
SCALE: 28G (3670016 blocks)
WORK_UNIT: 64
REDO_STRESS: HEAVY
HOT_SCHEMA_FREQUENCY: 0
HOTSPOT_MB: 8
HOTSPOT_OFFSET_MB: 16
HOTSPOT_FREQUENCY: 3
THINK_TM_FREQUENCY: 0
THINK_TM_MIN: .1
THINK_TM_MAX: .5
DATABASE_STATISTICS_TYPE: awr
SYSDBA_PASSWD: “vmware123”
DBA_PRIV_USER: “sys”
ADMIN_SQLNET_SERVICE: “DBPROD_PMEM_RHEL_PDB1”
SQLNET_SERVICE_BASE: “DBPROD_PMEM_RHEL_PDB1”
SQLNET_SERVICE_MAX: “”

EXTERNAL_SCRIPT: “”
THREADS_PER_SCHEMA: 100 (-t option)

Note: runit.sh will use the following connect strings as per slob.conf settings:
Admin Connect String: “sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”

NOTIFY : 2018.11.23-09:16:59 :
NOTIFY : 2018.11.23-09:16:59 :
WARNING : 2018.11.23-09:16:59 : *****************************************************************************
WARNING : 2018.11.23-09:16:59 : SLOB has found possible zombie processes from a prior SLOB test.
WARNING : 2018.11.23-09:16:59 : It is possible that a prior SLOB test aborted.
WARNING : 2018.11.23-09:16:59 : Please investigate the following processes:
WARNING : 2018.11.23-09:16:59 : *****************************************************************************
UID PID PPID C STIME TTY STAT TIME CMD
oracle 30594 1 0 08:49 pts/0 S 0:00 iostat -xm 3
oracle 30595 1 0 08:49 pts/0 S 0:00 vmstat 3
WARNING : 2018.11.23-09:16:59 : *****************************************************************************
NOTIFY : 2018.11.23-09:16:59 : Checking for unlinked output files for processes: 30594 30595
NOTIFY : 2018.11.23-09:16:59 : Unlinked files for process pid 30594 (ls -l /proc/30594/fd):
NOTIFY : 2018.11.23-09:16:59 : Unlinked files for process pid 30595 (ls -l /proc/30595/fd):
WARNING : 2018.11.23-09:16:59 : *****************************************************************************
NOTIFY : 2018.11.23-09:16:59 : Clearing temporary SLOB output files from previous SLOB testing.
NOTIFY : 2018.11.23-09:16:59 : Testing admin connectivity to the instance to validate slob.conf settings.
NOTIFY : 2018.11.23-09:16:59 : Testing connectivity. Command: “sqlplus -L sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”.
NOTIFY : 2018.11.23-09:16:59 : Next, testing 1 user (non-admin) connections…
NOTIFY : 2018.11.23-09:16:59 : Testing connectivity. Command: “sqlplus -L user1/user1@DBPROD_PMEM_RHEL_PDB1”.
NOTIFY : 2018.11.23-09:16:59 : Performing redo log switch.
NOTIFY : 2018.11.23-09:16:59 : Redo log switch complete. Setting up trigger mechanism.
NOTIFY : 2018.11.23-09:17:09 : Running iostat, vmstat and mpstat on current host–in background.
NOTIFY : 2018.11.23-09:17:09 : Connecting 100 (THREADS_PER_SCHEMA) session(s) to 1 schema(s) …
NOTIFY : 2018.11.23-09:17:12 :
NOTIFY : 2018.11.23-09:17:12 : Executing awr “before snap” procedure. Command: “sqlplus -S -L sys/vmware123@DBPROD_PMEM_RHEL_PDB1 as sysdba”.
NOTIFY : 2018.11.23-09:17:13 : Before awr snap ID is 1219
NOTIFY : 2018.11.23-09:17:13 :
NOTIFY : 2018.11.23-09:17:13 : Test has been triggered. Processes are executing.
NOTIFY : 2018.11.23-09:17:13 : List of monitored sqlplus PIDs written to /tmp/.SLOB.2018.11.23.091658/17594.f_wait_pids.out.
NOTIFY : 2018.11.23-09:17:23 : Waiting for 287 seconds before monitoring running processes (for exit).
^C
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Run ‘shutdown abort’ against the database to simulate a crash at 3 minutes into the load.

 

oracle@oracle122-rhel:DBPROD:/home/oracle> date
Fri Nov 23 09:20:23 PST 2018

oracle@oracle122-rhel:DBPROD:/home/oracle> ./stop_db_abort
SQL*Plus: Release 12.2.0.1.0 Production on Fri Nov 23 09:20:28 2018
Copyright (c) 1982, 2016, Oracle. All rights reserved.
SQL> Connected.
SQL> ORACLE instance shut down.
SQL> Disconnected from Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 – 64bit Production
oracle@oracle122-rhel:DBPROD:/home/oracle>

 

Restart the Oracle database which prompts Oracle to perform crash recovery of the thread.

 

Observe the database alert log file to see the crash recovery progress.

 

oracle@oracle122-rhel:DBPROD:/u01/admin/DBPROD/diag/rdbms/dbprod/DBPROD/trace> tail -f alert_DBPROD.log
….
2018-11-23T09:20:28.524404-08:00
Shutting down instance (abort) (OS id: 18455)
License high water mark = 105
2018-11-23T09:20:28.524825-08:00
USER (ospid: 18455): terminating the instance
2018-11-23T09:20:29.724996-08:00
Instance terminated by USER, pid = 18455
2018-11-23T09:20:30.532675-08:00
Instance shutdown complete (OS id: 18455)
…..
2018-11-23T09:22:47.704602-08:00
Completed redo scan
read 3233975 KB redo, 725845 data blocks need recovery
2018-11-23T09:22:51.953683-08:00

2018-11-23T09:22:56.044295-08:00
Started redo application at
Thread 1: logseq 10518, block 197286, offset 0
2018-11-23T09:22:56.050171-08:00
Recovery of Online Redo Log: Thread 1 Group 1 Seq 10518 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group01_redo01.log
Mem# 1: +REDO_DG/DBPROD/group01_redo02.log
2018-11-23T09:22:56.430511-08:00
Recovery of Online Redo Log: Thread 1 Group 2 Seq 10519 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group02_redo01.log
Mem# 1: +REDO_DG/DBPROD/group02_redo02.log
2018-11-23T09:22:57.038002-08:00
Recovery of Online Redo Log: Thread 1 Group 3 Seq 10520 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group03_redo01.log
Mem# 1: +REDO_DG/DBPROD/group03_redo02.log
2018-11-23T09:22:57.661509-08:00
Recovery of Online Redo Log: Thread 1 Group 4 Seq 10521 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group04_redo01.log
Mem# 1: +REDO_DG/DBPROD/group04_redo02.log
2018-11-23T09:22:58.285863-08:00
Recovery of Online Redo Log: Thread 1 Group 5 Seq 10522 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group05_redo01.log
Mem# 1: +REDO_DG/DBPROD/group05_redo02.log
2018-11-23T09:22:58.899482-08:00
Recovery of Online Redo Log: Thread 1 Group 6 Seq 10523 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group06_redo01.log
Mem# 1: +REDO_DG/DBPROD/group06_redo02.log
2018-11-23T09:22:59.525095-08:00
Recovery of Online Redo Log: Thread 1 Group 7 Seq 10524 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group07_redo01.log
Mem# 1: +REDO_DG/DBPROD/group07_redo02.log
2018-11-23T09:23:00.153196-08:00
Recovery of Online Redo Log: Thread 1 Group 8 Seq 10525 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group08_redo01.log
Mem# 1: +REDO_DG/DBPROD/group08_redo02.log
2018-11-23T09:23:00.802933-08:00
Recovery of Online Redo Log: Thread 1 Group 9 Seq 10526 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group09_redo01.log
Mem# 1: +REDO_DG/DBPROD/group09_redo02.log
2018-11-23T09:23:01.497673-08:00
Recovery of Online Redo Log: Thread 1 Group 10 Seq 10527 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group10_redo01.log
Mem# 1: +REDO_DG/DBPROD/group10_redo02.log
2018-11-23T09:23:02.152860-08:00
Recovery of Online Redo Log: Thread 1 Group 11 Seq 10528 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group11_redo01.log
Mem# 1: +REDO_DG/DBPROD/group11_redo02.log
2018-11-23T09:23:02.861497-08:00
Recovery of Online Redo Log: Thread 1 Group 12 Seq 10529 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group12_redo01.log
Mem# 1: +REDO_DG/DBPROD/group12_redo02.log
2018-11-23T09:23:03.583704-08:00
Recovery of Online Redo Log: Thread 1 Group 13 Seq 10530 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group13_redo01.log
Mem# 1: +REDO_DG/DBPROD/group13_redo02.log
2018-11-23T09:23:04.323907-08:00
Recovery of Online Redo Log: Thread 1 Group 14 Seq 10531 Reading mem 0
Mem# 0: +REDO_DG/DBPROD/group14_redo01.log
Mem# 1: +REDO_DG/DBPROD/group14_redo02.log
2018-11-23T09:23:05.024650-08:00
Completed redo application of 375.40MB
2018-11-23T09:23:13.419001-08:00
Completed crash recovery at
Thread 1: RBA 10531.255760.16, nab 255760, scn 0x000000000878ad36
725845 data blocks read, 721385 data blocks written, 3233975 redo k-bytes read
Endian type of dictionary set to little
….

 

Start of recovery process – 2018-11-23T09:22:56.044295-08:00
Stop of recovery process – 2018-11-23T09:23:13.419001-08:00
Time taken for crash recovery ~ 17 secs

 

 

Test Results

 

Oracle Crash Recovery using AMM Memory Management without Persistent Memory

Start of recovery process – 2018-11-23T08:56:08.955533-08:00
Stop of recovery process – 2018-11-23T08:56:25.061718-08:00
Time taken for crash recovery ~ 17 seconds

 

Oracle Crash Recovery using AMM Memory Management with Persistent Memory

Start of recovery process – 2018-11-23T09:22:56.044295-08:00
Stop of recovery process – 2018-11-23T09:23:13.419001-08:00
Time taken for crash recovery ~ 17 seconds

 

The above two results are essentially the same, and we wondered why we were not able to accelerate the Oracle crash recovery process in the same way we did with the Redis in-memory database. Keep in mind that with PMem-aware Redis, we added very low-overhead journaling to make Redis transactions on persistent memory crash consistent.

That is not the case with the Oracle software out of the box; Oracle still uses its proprietary crash recovery mechanism, which explains the result.

Oracle recreates the SGA granules in /dev/shm on every instance startup, and hence even if the old SGA granules are present in PMEM, Oracle disregards them for crash recovery purposes.

The best way to confirm this is to trace the Oracle startup process, which shows that the SGA granules under /dev/shm are created with the O_RDWR, O_CREAT and O_SYNC flags.

The Linux man page for open(2) describes these flags as follows.

A call to open() creates a new open file description, an entry in the system-wide table of open files. The open file description records the file offset and the file status flags (see below). A file descriptor is a reference to an open file description; this reference is unaffected if pathname is subsequently removed or modified to refer to a different file. For further details on open file descriptions, see NOTES.

The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.

O_CREAT – If pathname does not exist, create it as a regular file. The owner (user ID) of the new file is set to the effective user ID of the process.

O_SYNC –    Write operations on the file will complete according to the requirements of synchronized I/O file integrity completion (by contrast with the synchronized I/O data integrity completion provided by O_DSYNC.)

 

http://man7.org/linux/man-pages/man2/open.2.html

 

The complete output of the strace command for Instance startup is below.

strace -o /tmp/foo -f sh ./start_db
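
To focus on just the SGA granule file creations in that trace, the output file can be filtered, for example:

grep 'open("/dev/shm/ora_' /tmp/foo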

 

Output :

24932 open(“/dev/shm”, O_RDONLY|O_NOCTTY) = 3
24932 fstat(3, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=5480, …}) = 0
24932 close(3) = 0
24932 open(“/etc/mtab”, O_RDONLY|O_CLOEXEC) = 3
24932 fstat(3, {st_mode=S_IFREG|0444, st_size=0, …}) = 0
24932 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fba5c338000
24932 read(3, “rootfs / rootfs rw 0 0\nsysfs /sy”…, 1024) = 1024
24932 read(3, “eezer cgroup rw,nosuid,nodev,noe”…, 1024) = 1024
24932 read(3, “ogVol_u01 /u01 ext4 rw,relatime,”…, 1024) = 401
24932 read(3, “”, 1024) = 0
24932 close(3) = 0
24932 munmap(0x7fba5c338000, 4096) = 0
24932 lstat(“/dev”, {st_mode=S_IFDIR|0755, st_size=3540, …}) = 0
24932 lstat(“/dev/shm”, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=5480, …}) = 0
24932 stat(“/dev/shm”, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=5480, …}) = 0
24932 uname({sysname=”Linux”, nodename=”oracle122-rhel.vslab.local”, …}) = 0
24932 statfs(“/dev/shm”, {f_type=TMPFS_MAGIC, f_bsize=4096, f_blocks=6291456, f_bfree=6127616, f_bavail=6127616, f_files=4097625, f_ffree=4097352, f_fsid={0, 0}, f_namelen=255, f_frsize=409
6, f_flags=ST_VALID|ST_RELATIME}) = 0
24932 stat(“/dev/shm”, {st_mode=S_IFDIR|S_ISVTX|0777, st_size=5480, …}) = 0
24932 open(“/usr/lib64/gconv/gconv-modules.cache”, O_RDONLY) = 3
24932 fstat(3, {st_mode=S_IFREG|0644, st_size=26254, …}) = 0
24932 mmap(NULL, 26254, PROT_READ, MAP_SHARED, 3, 0) = 0x7fba5c332000
24932 close(3) = 0
24932 fstat(1, {st_mode=S_IFIFO|0600, st_size=0, …}) = 0
24932 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fba5c331000
24932 write(1, “Filesystem 1K-blocks Used “…, 114) = 114
24932 close(1 <unfinished …>
24917 <… read resumed> “Filesystem 1K-blocks Used “…, 4096) = 114
24932 <… close resumed> ) = 0
24932 munmap(0x7fba5c331000, 4096) = 0
24917 close(13 <unfinished …>
24932 close(2) = 0
24917 <… close resumed> ) = 0
24932 exit_group(0) = ?
24917 wait4(24932, <unfinished …>
24932 +++ exited with 0 +++
24917 <… wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 24932
24917 — SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24932, si_uid=54321, si_status=0, si_utime=0, si_stime=0} —
24917 munmap(0x7fc6387df000, 4096) = 0
24917 get_mempolicy(NULL, NULL, 0, NULL, 0) = 0
24917 get_mempolicy(NULL, NULL, 0, NULL, 0) = 0
24917 shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0600) = 983043
24917 shmat(983043, NULL, 0) = 0x7fc6387df000
24917 open(“/dev/shm/ora_DBPROD_983043_0”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 shmdt(0x7fc6387df000) = 0
24917 getrlimit(RLIMIT_STACK, {rlim_cur=32768*1024, rlim_max=32768*1024}) = 0
24917 brk(NULL) = 0x1603e000
24917 open(“/proc/self/maps”, O_RDONLY) = 13
24917 read(13, “00400000-15293000 r-xp 00000000 “…, 299) = 299
24917 read(13, ” /u01/app/o”…, 299) = 299
24917 read(13, ” /u01/app/oracle”…, 299) = 299
24917 read(13, “t/12.2.0/dbhome_1/lib/libnque12.”…, 299) = 299
24917 read(13, “377e000-7fc63377f000 r–p 0000b0″…, 299) = 299
24917 read(13, ” 33556075 /usr”…, 299) = 299
24917 read(13, “5 /usr/lib64/l”…, 299) = 299
24917 read(13, “p 00014000 fd:00 33554500 “…, 299) = 299
24917 read(13, “3d6b000-7fc633f6a000 —p 001c30″…, 299) = 299
24917 read(13, “7fc633f75000 rw-p 00000000 00:00″…, 299) = 299
24917 read(13, ” /usr/lib64/libreso”…, 299) = 299
24917 read(13, “1a4000-7fc6343a4000 —p 0001600″…, 299) = 299
24917 read(13, “6000-7fc6343a8000 rw-p 00000000 “…, 299) = 299
24917 read(13, “18 /usr/lib64/”…, 299) = 299
24917 read(13, “\n7fc6346c5000-7fc6348c4000 —p “…, 299) = 299
24917 read(13, “c6000-7fc6348c8000 r-xp 00000000″…, 299) = 299
24917 read(13, “-7fc634aca000 rw-p 00003000 fd:0″…, 299) = 299
24917 read(13, “le/product/12.2.0/dbhome_1/lib/l”…, 299) = 299
24917 read(13, “00001000 fd:00 33557989 “…, 299) = 299
24917 read(13, ” 00000000 fd:02 2885530 “…, 299) = 299
24917 read(13, ” /u01/app/oracle/product/”…, 299) = 299
24917 read(13, “0/dbhome_1/lib/libocrb12.so\n7fc6″…, 299) = 299
24917 read(13, “-7fc63578b000 —p 00109000 fd:0″…, 299) = 299
24917 read(13, ” /u01/app/ora”…, 299) = 299
24917 read(13, “duct/12.2.0/dbhome_1/lib/libskgx”…, 299) = 299
24917 read(13, “so\n7fc636c31000-7fc636c78000 rw-“…, 299) = 299
24917 read(13, “b/libdbcfg12.so\n7fc636ca1000-7fc”…, 299) = 299
24917 read(13, “d000 r-xp 00000000 fd:02 2889021″…, 299) = 299
24917 read(13, ” /u01/app/oracle/p”…, 299) = 299
24917 read(13, “2.2.0/dbhome_1/lib/libipc1.so\n7f”…, 299) = 299
24917 read(13, “duct/12.2.0/dbhome_1/lib/libmql1″…, 299) = 299
24917 read(13, “96000-7fc637798000 rw-p 00000000″…, 299) = 299
24917 read(13, ” /usr/lib64/librt-2.1″…, 299) = 299
24917 read(13, “041000 fd:02 2884686 “…, 299) = 299
24917 read(13, ” /u01/app/oracle/product/12.2.”…, 299) = 299
24917 read(13, “ome_1/lib/libskgxp12.so\n7fc637ef”…, 299) = 299
24917 read(13, “c6381bf000 rw-p 000b5000 fd:02 2″…, 299) = 299
24917 read(13, ” /u01/app/oracle/”…, 299) = 299
24917 read(13, “0/dbhome_1/lib/libodmd12.so\n7fc6″…, 299) = 299
24917 read(13, “0-7fc6385e4000 r-xp 00000000 fd:”…, 299) = 299
24917 read(13, “000 rw-p 00022000 fd:00 33555224″…, 299) = 299
24917 read(13, “-ffffffffff601000 r-xp 00000000 “…, 299) = 68
24917 close(13) = 0
24917 shmat(983043, NULL, SHM_RDONLY) = 0x7fc6387df000
24917 shmdt(0x7fc6387df000) = 0
24917 shmat(983043, NULL, 0) = 0x7fc6387df000
24917 shmdt(0x7fc6387df000) = 0
24917 mmap(0x60000000, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x60000000
24917 munmap(0x60000000, 33554432) = 0
24917 open(“/dev/shm/ora_DBPROD_983043_0”, O_RDWR|O_SYNC) = 13
24917 mmap(0x60000000, 33554432, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 13, 0) = 0x60000000
24917 close(13) = 0
24917 lseek(8, 0, SEEK_CUR) = 851
24917 write(8, “\n*** 2018-10-11T16:06:11.497151-“…, 38) = 38
24917 write(11, “00+3}0Pc\n”, 9) = 9
24917 write(8, ” Shared memory segment allocated”…, 105) = 105
24917 write(8, “\n”, 1) = 1
24917 shmget(IPC_PRIVATE, 4096, IPC_CREAT|IPC_EXCL|0600) = 1015812
24917 shmat(1015812, NULL, 0) = 0x7fc6387df000
24917 open(“/dev/shm/ora_DBPROD_1015812_0”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_1”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_2”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_3”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_4”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_5”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_6”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
24917 close(13) = 0
24917 open(“/dev/shm/ora_DBPROD_1015812_7”, O_RDWR|O_CREAT|O_SYNC, 0600) = 13
……….

 

 

Conclusion

 

Persistent Memory (PMEM) resides between DRAM and disk storage in the data storage hierarchy. This technology enables byte-addressable updates and does not lose data if power is lost.

The Accelerating Oracle Performance using vSphere Persistent Memory (PMEM) paper examines the performance of Oracle databases using the VMware vSphere 6.7 Persistent Memory feature in different modes for the use cases below:

  • Improved performance of Oracle Redo Log using vPMEM Disk-backed vmdks/vPMEM disks in DAX mode
  • Accelerating Performance using Oracle Smart Flash Cache
  • Potential reduction in Oracle Licensing

In the blog article Oracle and vSphere Persistent Memory (PMEM) – vPMEM v/s vPMEMDisk , we demonstrate the performance improvement in Redo log activity when redo log files are placed on vPMEM Disk-backed vmdks/vPMEM disks in DAX mode over redo logs on vPMEMDisk backed vmdks.

With PMem-aware Redis, by adding very low-overhead journaling to make Redis transactions on persistent memory crash consistent, we were able to reduce the crash recovery time to almost nothing.

We were not able to accelerate the Oracle crash recovery process in the same way, as Oracle still uses its proprietary crash recovery mechanism. Oracle recreates the SGA granules in /dev/shm on every instance startup, and hence even if the old SGA granules are present in PMEM, Oracle disregards them for crash recovery purposes.

Unless the Oracle software is cognizant of the PMEM device and takes advantage of its memory persistence, Oracle crash recovery will continue to function as-is, without any PMEM advantages.

All Oracle on vSphere white papers, including Oracle on VMware vSphere / VMware vSAN / VMware Cloud on AWS best practices, deployment guides and workload characterization guides, can be found at Oracle on VMware Collateral – One Stop Shop.

 

 


Accelerating Oracle workloads with vSphere 6.7 Guest 1GB Huge Pages – An Investigation

Introduction to Linux Huge Pages

 

  

 

Much has been written and spoken about the Linux Huge Pages feature.

The Red Hat documentation explains the performance benefits of using Huge Pages.

Essentially, memory is managed in blocks known as pages. CPUs have a built-in memory management unit (MMU) that contains a list of these pages, with each page referenced through a page table entry.

To manage large amounts of memory, we need to either:

  • increase the number of page table entries in the MMU OR
  • increase the page size.

The first option is very expensive and results in slow performance: the hardware MMU can only support a limited number of page table entries, and beyond that the system falls back to slower, software-based memory management, which causes the entire system to run more slowly.

Also reading address mappings from the page table is time-consuming and resource-expensive, so CPUs are built with a cache for recently-used addresses: the Translation Lookaside Buffer (TLB). However, the default TLB can only cache a certain number of address mappings. If a requested address mapping is not in the TLB (that is, the TLB is missed), the system still needs to read the page table to determine the physical to virtual address mapping.

Because of the relationship between application memory requirements and the size of pages used to cache address mappings, applications with large memory requirements are more likely to suffer performance degradation from TLB misses than applications with minimal memory requirements. It is therefore important to avoid TLB misses wherever possible.

The second method is the Linux 2.6 onwards implementation of what is called Huge Pages. Enabling HugePages makes it possible to support memory pages greater than the default (usually 4 KB).  The Huge page support is built on top of multiple page size support that is provided by most modern architectures.  For example, x86 CPUs normally support 4K and 2M (1G if architecturally supported) page sizes, ia64 architecture supports multiple page sizes 4K, 8K, 64K, 256K, 1M, 4M, 16M,256M and ppc64 supports 4K and 16M.
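
As a minimal sketch, assuming a Linux guest with the default 2 MB huge page size, reserving and verifying huge pages looks roughly like this (the page count of 8192, i.e. 16 GB, is a hypothetical value sized to the SGA):

sysctl -w vm.nr_hugepages=8192                      # reserve 8192 x 2MB pages = 16GB
echo "vm.nr_hugepages=8192" >> /etc/sysctl.conf     # persist the reservation across reboots
grep Huge /proc/meminfo                             # verify HugePages_Total, HugePages_Free and Hugepagesize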

More information on this can be found at ‘RHEL 7 Memory’ , ‘HugeTLBPage’ and ‘Page table’.


Oracle RAC on Stretched Clusters for VMware Cloud on AWS – Anti-Affinity within AZ & HA across AZs

Introduction

 

As mentioned in the earlier post , VMware Cloud on AWS is an on-demand service that enables customers to run applications across vSphere-based cloud environments with access to a broad range of AWS services.

Powered by VMware Cloud Foundation, this service integrates vSphere, vSAN and NSX along with VMware vCenter management, and is optimized to run on dedicated, elastic, bare-metal AWS infrastructure. ESXi hosts in VMware Cloud on AWS reside in an AWS availability Zone (AZ) and are protected by vSphere HA.

The paper Migrating Oracle Workloads to VMware Cloud on AWS describes the deployment, migration options along with best practices when migrating Oracle Standalone and Oracle RAC on VMware on-premises (vSphere with traditional Storage or VMware HCI vSAN ) to Stretched Clusters for VMware Cloud on AWS using the approach below

  • Validate functionality of current on-premise RAC setup
  • Migrate DR RAC ‘prddg’ from on-premise Site B to Stretched Cluster for VMware Cloud on AWS
  • Take advantage of the Stretched Cluster for VMware Cloud on AWS using the multi-AZ functionality by
    • Adding new nodes to the migrated DR RAC ‘prddg’
    • Create new Oracle RAC ‘vmcrac’

This post focuses on how to effectively provide site-level HA along with infrastructure-level HA for Oracle RAC on Stretched Clusters for VMware Cloud on AWS using vSphere Tags and Attributes.

 


Oracle Database on all-flash vSAN 6.7 Reference Architecture

Customers deploying Oracle Database have requirements such as stringent SLAs, consistent performance, and high availability. It can be a major challenge for organizations to manage data storage in these environments due to these demanding business requirements. Common issues in using traditional storage solutions for business-critical applications include inability to easily scale-up and scale-out, storage inefficiency, complex management, high deployment, and operating costs.

VMware® vSAN™ has been widely adopted as a Hyperconverged Infrastructure (HCI) solution, providing scalable, resilient, and high-performance storage using cost-effective hardware, specifically direct-attached disks in VMware ESXi™ hosts. vSAN uses storage policy-based management, which simplifies and automates the complex management workflows that exist in traditional enterprise storage systems with respect to configuration and clustering.

To show the continued improvement in VMware vSAN software, we have developed this reference architecture document to demonstrate a consistent application experience through improved Oracle workload performance, scalability, and resynchronization performance.

 

 

This solution addresses the common business challenges that organizations face today in an online transaction processing (OLTP) environment that requires predictable performance. The solution helps customers design and implement optimal configurations specifically for Oracle Database on all-flash vSAN 6.7.

This Reference Architecture can be found here.


Oracle and vSphere Persistent Memory (PMEM) – vPMEM v/s vPMEMDisk

In the previous blog post Accelerating Oracle Performance using vSphere Persistent Memory (PMEM), we demonstrated how the performance of Oracle databases can be improved using the VMware vSphere 6.7 Persistent Memory feature in different modes for the use cases below:

  • Improved performance of Oracle Redo Log using vPMEMDisk-backed vmdks/vPMEM disks in DAX mode
  • Accelerating Performance using Oracle Smart Flash Cache
  • Potential reduction in Oracle Licensing

In this blog, we demonstrate the performance improvement of vPMEM over vPMEMDisk.

The additional use case below shows the improvement in redo log performance when redo log files are placed on vPMEM disks in DAX mode rather than on vPMEMDisk-backed vmdks.

 

Continue reading


Accelerating Oracle Performance using vSphere Persistent Memory (PMEM)

Customers have successfully run their business-critical Oracle workloads with high performance demands on VMware vSphere for many years.

Deploying IO-intensive Oracle workloads requires fast storage performance with low latency and resiliency from database failures. Latency, which is a measurement of response time, directly impacts a technology’s ability to deliver faster performance for business-critical applications.

There has been a disruptive paradigm shift in data storage called Persistent Memory (PMEM) that resides between DRAM and disk storage in the data storage hierarchy.

More information about Persistent Memory (PMEM)  and how vSphere 6.7 can take advantage of PMEM technology to accelerate IO-intensive Oracle workloads can be found here.

 

Accelerating Oracle Performance using vSphere Persistent Memory (PMEM) – Reference Architecture

 

The Accelerating Oracle Performance using vSphere Persistent Memory (PMEM) paper examines the performance of Oracle databases using the VMware vSphere 6.7 Persistent Memory feature in different modes for enhanced redo log performance, accelerated Flash Cache performance, and a possible reduction in Oracle licenses.

 

 

Additional use case: vPMEMDisk versus vPMEM (memory and raw mode)

 

A VM ‘Oracle122-RHEL-PMEM-udev’ was created as a copy of the ‘Oracle122-RHEL-PMEM’ VM used in the paper.

 

VM Specifications

  • 12 vCPUs and 64GB memory
  • Red Hat 7.4 operating system
  • Oracle Database 12.2.0.1.0 with Grid Infrastructure and RDBMS binaries installed; Oracle SGA set to 32GB and PGA set to 12GB
  • A single instance database ‘DBPROD’ was created
  • All database-related vmdks were set to Eager Zero thick in Independent Persistent mode to ensure maximum performance with no snapshot capability
  • All database-related vmdks were partitioned using Linux utilities with proper alignment offset and labelled with Oracle ASMLib or Linux udev for device persistence.
  • Oracle ASM ‘DATA_DG’ and ‘REDO_DG’ disk groups were created on all-flash SAN-attached storage with external redundancy and configured with the default allocation unit (AU) size of 1M.
  • ASM ‘DATA_DG’ and ‘REDO_DG’ disks were presented on different PVSCSI controllers for performance and queue depth purposes.
  • All best practices for Oracle on VMware SDDC were followed as per the ‘Oracle Databases on VMware—Best Practices Guide’, which can be found here

 

Note

  • OEL 7.4 was not compatible with vPMEM mode at the time of writing this paper
  • udev rules were used instead of ASMLib when using PMEM, as we ran into disk partitioning issues with vPMEM devices (an example rule is sketched below)
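For illustration only, the sketch below prints one hypothetical udev rule of the kind that could be used for PMEM device persistence. The device name, owner, group, and mode are assumptions and must match the actual guest configuration and Oracle installation.

```python
# Illustrative only: generate a udev rule granting the Oracle user access to a
# PMEM-backed block device. Device name, owner, group and mode are assumptions.
RULE = 'KERNEL=="pmem0", OWNER="oracle", GROUP="oinstall", MODE="0660"\n'

if __name__ == "__main__":
    # Writing to /etc/udev/rules.d normally requires root; print instead.
    print("Example contents for /etc/udev/rules.d/99-oracle-pmem.rules:")
    print(RULE, end="")
```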

 

 

 

VM Disk Layout

  • Hard Disk 1 – Operating System
  • Hard Disk 2 – Oracle Binaries
  • Hard Disk 3 – DATA_DG
  • Hard Disk 4 – REDO_DG

 

Workload Generator

This solution primarily uses the SLOB workload generator to generate a heavy batch-processing workload on the Oracle database. During workload generation, Oracle AWR and Linux SAR reports were used to compare performance and validate the test cases. The Oracle database was restarted after every test case to ensure that no blocks or SQL statements remained cached in the SGA.

 

SLOB configuration

  • Database VM with a 2,048GB SLOB schema
  • Workload is purely 100 percent writes to mimic a heavy IO database batch processing workload (SLOB parameter UPDATE_PCT was set to 100)
  • Number of users set to 1 with 0 think time to hit the database with maximum requests concurrently and generate an extremely intensive batch workload
  • SLOB parameter SCALE for the workload was set to 1024GB with Oracle SGA set to 32GB
  • SLOB parameter REDO_STRESS for the workload was set to HEAVY
  • SLOB parameter RUN_TIME was set to 30 minutes (these settings are rendered as a slob.conf sketch below)
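For readers unfamiliar with SLOB, these parameters live in SLOB's slob.conf file. The sketch below simply renders the values listed above in that key=value form; parameter names follow the SLOB documentation, RUN_TIME is assumed to be expressed in seconds, and parameters not listed above are omitted rather than guessed.

```python
# Illustrative only: render the SLOB settings described above as slob.conf
# style key=value lines. Values are the ones used in this test.
slob_settings = {
    "UPDATE_PCT": "100",        # 100% writes - heavy batch-style workload
    "SCALE": "1024G",           # active data set size, versus a 32 GB SGA
    "REDO_STRESS": "HEAVY",     # maximize redo generation
    "RUN_TIME": str(30 * 60),   # 30 minutes, assumed to be in seconds
}

for key, value in slob_settings.items():
    print(f"{key}={value}")
```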

 

Test Cases

Run SLOB against the database with redo log files on:

  • REDO_DG ASM disk group backed by All Flash SAN Storage (Baseline)
  • REDO_DG ASM disk group backed by vPMEMDisk
  • REDO_DG ASM disk group backed by vPMEM in raw mode
  • /redolog ext4 File system backed by vPMEM in raw mode with dax option
  • /redolog_dax ext4 File system backed by vPMEM in memory mode with dax option (a quick check for the dax mount option is sketched below)
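As a quick sanity check that the dax option actually took effect on the vPMEM-backed file systems, a sketch like the one below can inspect /proc/mounts. The mount points match the test cases above, but the exact option string (dax versus dax=always) can vary by kernel, so treat this as illustrative.

```python
# Illustrative check: was a mount point (e.g. /redolog_dax) actually mounted
# with the dax option? Inspect /proc/mounts and look at the options field.
def has_dax_option(mount_point, mounts_path="/proc/mounts"):
    with open(mounts_path) as f:
        for line in f:
            fields = line.split()
            # /proc/mounts format: device mountpoint fstype options dump pass
            if len(fields) >= 4 and fields[1] == mount_point:
                opts = fields[3].split(",")
                return any(o == "dax" or o.startswith("dax=") for o in opts)
    return False

if __name__ == "__main__":
    for mnt in ("/redolog", "/redolog_dax"):  # mount points used in this test
        print(f"{mnt}: dax enabled = {has_dax_option(mnt)}")
```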

 

Additional Database Setup

  • 16 redo log groups, 256 MB each, with 2 members per group, created in the REDO_DG disk group (a creation sketch follows this list)
  • The initialization parameter ‘db_writer_processes’ was set to ‘3’ because the initial run of the workload, being very batch intensive, was waiting on the checkpoint process to complete, and the intention of the test was to demonstrate the reduced wait time on the ‘log file switch’ event.
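As an illustration of how such a redo log layout might be scripted, the hedged sketch below uses the cx_Oracle driver to add 16 groups of 256 MB with two members each in the +REDO_DG disk group. The host, credentials, and group numbering are placeholders, and the statements should be adapted to the actual database.

```python
# Hypothetical sketch only: add 16 redo log groups, 256 MB each, with two
# members per group in the +REDO_DG ASM disk group. Connection details and
# group numbers are placeholders for illustration.
import cx_Oracle

dsn = cx_Oracle.makedsn("dbhost", 1521, service_name="DBPROD")  # assumed host/service
conn = cx_Oracle.connect("sys", "change_me", dsn, mode=cx_Oracle.SYSDBA)

cur = conn.cursor()
for group in range(11, 27):  # 16 groups; the numbering is an assumption
    cur.execute(
        f"ALTER DATABASE ADD LOGFILE GROUP {group} "
        f"('+REDO_DG', '+REDO_DG') SIZE 256M"
    )
cur.close()
conn.close()
```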

 

Results

AWR reports were collected for all runs, then analyzed and compared across all five use cases.
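To give a feel for how the AWR comparison could be automated, the naive sketch below pulls lines mentioning the ‘log file switch completion’ wait event out of a set of AWR text reports. The file names are hypothetical placeholders, and the actual analysis here was done by reading the AWR reports directly.

```python
# Naive illustration: extract lines mentioning the 'log file switch completion'
# wait event from AWR text reports so the runs can be compared side by side.
# File names are hypothetical placeholders for the five test-case reports.
import glob

EVENT = "log file switch completion"

for report in sorted(glob.glob("awr_testcase*.txt")):
    print(f"== {report} ==")
    with open(report, errors="ignore") as f:
        for line in f:
            if EVENT in line:
                print(line.rstrip())
```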

 

 

Analysis

  • Reduction in ‘log file switch completion’ wait times
  • Increase in the amount of work done by the workload (SQL executes/sec and transactions/sec)
  • Impact of log file switches reduced

 

Conclusion

The above test cases using vPMEMDisk and vPMEM modes indicate a reduction in wait times for critical database events, e.g. ‘log file switch completion’, and at the same time an increase in the amount of work done by the workload.

Deploying IO-intensive Oracle workloads requires fast storage performance with low latency and resiliency from database failures. Latency, which is a measurement of response time, directly impacts a technology’s ability to deliver faster performance for business-critical applications.

Persistent Memory (PMEM) technology enables byte-addressable updates and prevents data loss during power interruptions. Instead of having nonvolatile storage at the bottom with the largest capacity but the slowest performance, nonvolatile storage is now very close to DRAM in terms of performance.

The Persistent Memory Performance in vSphere 6.7 paper can be found here.
