NFS Options for Performance

Most DBAs do not understand NFS mount options. As RAC, Infrastructure, and Virtualization DBAs, we need to know what the NFS options mean and follow industry trends closely as options are adopted into best practices. We need to provide the correct NFS options to our system administrators and be able to explain what they mean. The following is taken directly from Oracle MOS Support: How to Increase the NFS Performance with NFS Options [ID 397194.1].

rw (read/write)
or
ro (read-only)

(default: rw)

Use rw for data that users need to modify. In order for you to mount a directory read/write, the NFS server must export it read/write.

Use ro for data you do not want users to change. A directory that is automounted from several servers should be read-only, to keep versions identical on all servers.

suid
or
nosuid

(default: suid)

Specify suid if you want to allow mounted programs that have setuid permission to run with the permissions of their owners, regardless of who starts them. If a program with setuid permission is owned by root, it will run with root permissions, regardless of who starts it.

Specify nosuid to protect your system against setuid programs that may run as root and damage your system.

hard
or
soft

(default: hard)

Specify hard if users will be writing to the mounted directory or running programs located in it. When NFS tries to access a hard-mounted directory, it keeps trying until it succeeds or someone interrupts its attempts. If the server goes down, any processes using the mounted directory hang until the server comes back up and then continue processing without errors. Interruptible hard mounts may be interrupted with CTRL-C or kill (see the intr option, later).

Specify soft if the server is unreliable and you want to prevent systems from hanging when the server is down. When NFS tries to access a soft-mounted directory, it gives up and returns an error message after trying retrans times (see the retrans option, later). Any processes using the mounted directory will return errors if the server goes down.

intr
or
nointr

(default: intr)

Specify intr if users are not likely to damage critical data by manually interrupting an NFS request. If a hard mount is interruptible, a user may press [CTRL]-C or issue the kill command to interrupt an NFS mount that is hanging indefinitely because a server is down.

Specify nointr if users might damage critical data by manually interrupting an NFS request, and you would rather have the system hang while the server is down than risk losing data between the client and the server.

fg (foreground)
or
bg (background)

(default: fg)

Specify fg for directories that are necessary for the client machine to boot or operate correctly. If a foreground mount fails, it is retried again in the foreground until it succeeds or is interrupted. All automounted directories are mounted in the foreground; you cannot specify the bg option with automounted directories.

Specify bg for mounting directories that are not necessary for the client to boot or operate correctly. Background mounts that fail are retried in the background, allowing the mount process to consider the mount complete and go on to the next one. If you have two machines configured to mount directories from each other, configure the mounts on one of the machines as background mounts. That way, if both systems try to boot at once, they will not become deadlocked, each waiting to mount directories from the other. The bg option cannot be used with automounted directories.

devs
or nodevs

(default: devs)

Specify devs if you are mounting device files from a server whose device files will work correctly on the client. The devs option allows you to use NFS-mounted device files to read and write to devices from the NFS client. It is useful for maintaining a standard, centralized set of device files, if all your systems are configured similarly.

Specify nodevs if device files mounted from a server will not work correctly for reading and writing to devices on the NFS client. The nodevs option generates an error if a process on the NFS client tries to read or write to an NFS-mounted device file.

timeo=n

(default=7)

The timeout, in tenths of a second, for NFS requests (read and write requests to mounted directories). If an NFS request times out, this timeout value is doubled, and the request is retransmitted. After the NFS request has been retransmitted the number of times specified by the retrans option (see below), a soft mount returns an error, and a hard mount retries the request. The maximum timeo value is 30 (3 seconds).

Try doubling the timeo value if you see several server not responding messages within a few minutes. This can happen because you are mounting directories across a gateway, because your server is slow, or because your network is busy with heavy traffic.

retrans=n

(default=4)

The number of times an NFS request (a read or write request to a mounted directory) is retransmitted after it times out. If the request does not succeed after n retransmissions, a soft mount returns an error, and a hard mount retries the request.

Increase the retrans value for a directory that is soft-mounted from a server that has frequent, short periods of down time. This gives the server sufficient time to recover, so the soft mount does not return an error.

retry=n

(default=1)

The number of times the NFS client attempts to mount a directory after the first attempt fails. If you specify intr, you can interrupt the mount before n retries. However, if you specify nointr, you must wait until n retries have been made, until the mount succeeds, or until you reboot the system.

If mounts are failing because your server is very busy, increasing the retry value may fix the problem.

rsize=n

(default=8192)

The number of bytes the NFS client requests from the NFS server in a single read request.

If packets are being dropped between the client and the server, decrease rsize to 4096 or 2048. To find out whether packets are being dropped, issue the nfsstat -rc command at the HP-UX prompt. If the timeout and retrans values returned by this command are high, but the badxid number is close to zero, then packets are being dropped somewhere in the network.

wsize=n

(default=8192)

The number of bytes the NFS client sends to the NFS server in a single write request.

If packets are being dropped between the client and the server, decrease wsize to 4096 or 2048. To find out whether packets are being dropped, issue the nfsstat -rc command at the HP-UX prompt. If the timeout and retrans values returned by this command are high, but the badxid number is close to zero, then packets are being dropped somewhere in the network.

O (Overlay mount)

default: not specified

Allows the file system to be mounted over an existing mount point, making the underlying file system inaccessible. If you attempt to mount a file system over an existing mount point without the -O option, the mount will fail with the error device busy.

Caution: Using the -O mount option can put your system in a confusing state. The -O option allows you to hide local data under an NFS mount point without receiving any warning. Local data hidden beneath an NFS mount point will not be backed up during regular system backups.

On HP-UX, the -O option is valid only for NFS-mounted file systems. For this reason, if you specify the -O option, you must also specify the -F nfs option to the mount command or the nfs file system type in the /etc/fstab file.

remount

default: not specified

If the file system is mounted read-only, this option remounts it read/write. This allows you to change the access permissions from read-only to read/write without forcing everyone to leave the mounted directory or killing all processes using it.

noac

(default: not specified)

If specified, this option prevents the NFS client from caching attributes for the mounted directory.

Specify noac for a directory that will be used frequently by many NFS clients. The noac option ensures that the file and directory attributes on the server are up to date, because no changes are cached on the clients. However, if many NFS clients using the same NFS server all disable attribute caching, the server may become overloaded with attribute requests and updates. You can also use the actimeo option to set all the caching timeouts to a small number of seconds, like 1 or 3.

If you specify noac, do not specify the other caching options.

nocto

(default: not specified)

If specified, this option suppresses the retrieval of fresh attributes when a file is opened.

Specify nocto for a file or directory that never changes, to decrease the load on your network.

acdirmax=n

(default=60)

The maximum number of seconds a directory’s attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server.

For a directory that rarely changes or that is owned and modified by only one user, like a user’s home directory, you can decrease the load on your network by setting acdirmax=120 or higher.

acdirmin=n

(default=30)

The minimum number of seconds a directory’s attributes are cached on the NFS client. If the directory is modified before this timeout expires, the timeout period is extended by acdirmin seconds.

For a directory that rarely changes or that is owned and modified by only one user, like a user’s home directory, you can decrease the load on your network by setting acdirmin=60 or higher.

acregmax=n

(default=60)

The maximum number of seconds a file’s attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server.

For a file that rarely changes or that is owned and modified by only one user, like a file in a user’s home directory, you can decrease the load on your network by setting acregmax=120 or higher.

actimeo=n

(no default)

Setting actimeo to n seconds is equivalent to setting acdirmax, acdirmin, acregmax, and acregmin to n seconds.

Set actimeo=1 or actimeo=3 for a directory that is used and modified frequently by many NFS clients. This ensures that the file and directory attributes are kept reasonably up to date, even if they are changed frequently from various client locations.

Set actimeo=120 or higher for a directory that rarely or never changes.

If you set the actimeo value, do not set the acdirmax, acdirmin, acregmax, or acregmin values.

vers=n

(default=3)

The version of the NFS protocol to use. By default, the local NFS client will attempt to mount the file system using NFS version 3. If the NFS server does not support version 3, the file system will be mounted using version 2.

If you know that the NFS server does not support version 3, specify vers=2, and you will save time during the mount, because the client will not attempt to use version 3 before using version 2.

grpid

default: not specified

Forces a newly created file in the mounted file system to inherit the group ID of the parent directory.

By default, a newly created file inherits the effective group ID of the calling process, unless the GID bit is set on the parent directory. If the GID bit is set, the new file inherits the group ID of the parent directory.
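To see how these options fit together, here is a hedged example of mounting an NFS file system for Oracle data files. The server name (nas01), export path, and mount point are made up, and the transfer sizes and timeouts are illustrative only; take the exact values for your platform and file type from the appropriate MOS note rather than from this sketch.

# hypothetical /etc/fstab entry; option values are illustrative, not a recommendation
nas01:/export/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,timeo=600,vers=3,actimeo=0  0 0

# equivalent ad hoc mount (use -F nfs on HP-UX, -t nfs on Linux)
mount -F nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,timeo=600,vers=3,actimeo=0 nas01:/export/oradata /u02/oradata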

Oracle VM 3.2 Is in Public Beta

The new features and enhancements in Oracle VM Release 3.2.1 Beta include:

  • Support for Oracle VM Server for SPARC: Oracle VM Manager can now be used to discover SPARC T-Series servers running Oracle VM Server for SPARC, and perform virtual machine life cycle management.
  • Oracle VM Command Line Interface (CLI): The new Oracle VM Command Line Interface can be used to perform the same functions as the Oracle VM Manager Web Interface, such as managing all your server pools, servers, and guests. The CLI commands can be scripted and run in conjunction with the Web Interface, bringing more flexibility to help you deploy and manage an Oracle VM environment (a brief connection sketch follows this list). See the Oracle VM Command Line Interface User’s Guide for information on using the CLI.
  • Usability Improvements: There are a number of enhancements to improve the user experience in Oracle VM Manager, such as configurable accessibility options, monitoring the overall health and status of your server virtualization environment with the Health tab, multi-select of objects, searching for objects, the ability to present a repository to server pools in addition to individual servers, rediscovering all Oracle VM Servers, setting preferences for recurring jobs, and setting the UI timeout.
  • Updated Dom0 Kernel in Oracle VM Server for x86: The Dom0 kernel in Oracle VM Server for x86 has been updated so that it is now the same Oracle Unbreakable Enterprise Kernel 2 (UEK2) as used in Oracle Linux, for complete binary compatibility with drivers supported in Oracle Linux. 
  • MySQL Database Support: MySQL Database is used as the bundled database for the Oracle VM Manager management repository for simple installations. Support for an existing Oracle Database is still included within the installer so that you can perform a custom installation to take advantage of your existing infrastructure.
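As a rough sketch of how the CLI can be reached for scripting, the connection below is recalled from the Oracle VM Command Line Interface User’s Guide and is not verified here, so treat the port, user, and command names as assumptions to confirm against the guide:

# connect to the CLI service running on the Oracle VM Manager host (assumed default port 10000)
ssh -l admin -p 10000 ovm-manager.example.com

# once connected, commands mirror the Web Interface objects, for example:
# OVM> list ServerPool
# OVM> list Server
# OVM> list Vm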
For immediate downloads, please click on the URL below:

http://www.oracle.com/technetwork/server-storage/vm/downloads/ovm-early-access-1743261.html

The Only Photo Allowed

At the Oracle ACE Director’s Annual Briefing, this is the only photo of the presenter’s slides that we are allowed to take.

 

The only allowed photo at the Oracle ACE Director Briefing

Oracle on VMware with Thomas Kurian

ACE Directors at the 2012 Annual Briefing at Oracle Headquarters asked challenging questions of Thomas Kurian, Executive Vice President at Oracle Corporation.
 
Kurian acknowledged the perception that Oracle does not support its products on VMware. He assured all the ACE Directors that it is supported but not certified, and that official letters of support have been given to hundreds of companies. Continuing the discussion about certification, he noted that, just like any other OS or hardware platform, Oracle does not certify OVM either.
 
One of the ACE Directors told Kurian that a lot of sales reps say that Oracle on VMware is NOT supported. Kurian responded that sales people say a lot of things, and he cannot get all of the sales reps in a room and make them stop saying those things. In their defense, he has heard VMware sales people say that Oracle is certified on VMware. Then he asked the ACE Directors: why would Oracle sales reps like VMware? VMware is taking money out of their pockets; VMware pushes customers to consolidate their environments, and the reps lose money when they do.
 
Kurian also addressed Oracle’s sub-capacity licensing. He stressed that only hard partitions are accepted because they are 100% auditable. Even Solaris Zones are not considered hard partitions, so sub-capacity licensing on Solaris is not accepted.

Charles Kim’s Upcoming Presentations at OOW 2012

If you are going to be at Oracle OpenWorld this year, please stop by and say Hi ..

Here are all the sessions where I will be a presenter or a panelist:

UGF4410
The Perfect Marriage: Oracle Exadata with Sun ZFS Storage Appliance
Sunday, 30-SEP-2012, 9:00 AM
Moscone West – 2018

UGF7700
Oracle on Oracle VM: Expert Panel
Sunday, 30-SEP-2012, 12:30 PM – 2:00 PM
Moscone West – 2012

UGF6511
Database Performance Tuning: Get the Best out of Oracle Enterprise Manager 12c Cloud Control
Sunday, 30-SEP-2012, 2:15 PM – 3:15 PM
Moscone West – 2011

CON8435
Expert Customer Panel: Exadata Data Protection Best Practices
Monday, 01-OCT-2012, 12:15 PM
Moscone South – 252

 

Charles Kim Presentation at OOW 2012

ASM Diskgroup

ASM Disk Group

 

ASM Disk Group Configuration

Everyone should be leveraging ASMLIB instead of raw block devices to create ASM disk groups.

Proper ASM configuration, standardization, and adherence to best practices are just as important in a virtualized environment as in a bare-metal environment.

First, create the ASMLIB disks with oracleasm:

  • sudo to root
  • cd /etc/init.d 
  • ./oracleasm createdisk DATA101_DISK000 /dev/oracle/DATA101_disk000p1
    • Repeat for each disk
  • On other RAC nodes
    • ./oracleasm scandisks
    • ./oracleasm listdisks


List of available disks on April 29, 2012
cd /dev/oracle
lrwxrwxrwx 1 root root 8 Apr 28 16:22 DATA501_disk009p1 -> ../dm-85

lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk003p1 -> ../dm-105
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA101_disk003p1 -> ../dm-100
lrwxrwxrwx 1 root root 8 Apr 28 16:22 DATA101_disk001p1 -> ../dm-99
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA101_disk002p1 -> ../dm-102
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk005p1 -> ../dm-110
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA101_disk004p1 -> ../dm-101
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA101_disk000p1 -> ../dm-104
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk006p1 -> ../dm-111
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk008p1 -> ../dm-107
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk004p1 -> ../dm-112
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk000p1 -> ../dm-108
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk002p1 -> ../dm-109
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk001p1 -> ../dm-103
lrwxrwxrwx 1 root root 9 Apr 28 16:22 DATA501_disk007p1 -> ../dm-106

Naming Convention Legend for Disk Groups

  • Disk group names will be DATA101 or PF101 for RAID 10 disk groups
  • Disk group names will be DATA501 or PF501 for RAID 5 disk groups
Naming Convention Legend for Disks
  • pd = production data
  • pf = production fast recovery area (FRA)
  • dd = development data
  • df = development FRA
  • 101 = RAID 10, first disk group
  • 501 = RAID 5, first disk group
  • _diskxxx can range from disk000 to disk999


Modify /etc/sysconfig/oracleasm (on each node)

As root:  Make changes to the following lines:
# ORACLEASM_SCANORDER: Matching patterns to order disk scanning

ORACLEASM_SCANORDER="dm-"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE="sd"
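After editing the file on each node, re-run a scan so the new ordering takes effect. A minimal sketch, assuming the same oracleasm init script used elsewhere in this post:

# as root, on each node
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks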
Important Notes:

  • Only use the partitioned disk when creating ASMLIB disks
  • The partitioned disk will have p1, p2, etc. at the end of the device name
  • After you scan the disk, you should see an entry in /proc/partitions
  • Do NOT use /dev/oracle devices
  • Instead, use /dev/mapper devices (a quick verification is sketched after these notes)
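A quick way to double-check that a label landed on the intended device is sketched below; the disk label is one of the examples from this post, and output formats vary by ASMLIB version:

# as root: confirm the ASMLIB label is valid and bound to a disk
/etc/init.d/oracleasm querydisk DATA101_DISK000

# confirm the partition itself is visible to the kernel
grep dm- /proc/partitions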

RAID 10
——-
[root@dllprdorl01 tmp]# cat ora_asm.txt

/etc/init.d/oracleasm createdisk DATA101_DISK000 /dev/mapper/DATA101_disk000p1
/etc/init.d/oracleasm createdisk DATA101_DISK001 /dev/mapper/DATA101_disk001p1
/etc/init.d/oracleasm createdisk DATA101_DISK002 /dev/mapper/DATA101_disk002p1
/etc/init.d/oracleasm createdisk DATA101_DISK003 /dev/mapper/DATA101_disk003p1
/etc/init.d/oracleasm createdisk DATA101_DISK004 /dev/mapper/DATA101_disk004p1

RAID 5
——
/etc/init.d/oracleasm createdisk DATA501_DISK000 /dev/mapper/DATA501_disk000p1
/etc/init.d/oracleasm createdisk DATA501_DISK001 /dev/mapper/DATA501_disk001p1
/etc/init.d/oracleasm createdisk DATA501_DISK002 /dev/mapper/DATA501_disk002p1
/etc/init.d/oracleasm createdisk DATA501_DISK003 /dev/mapper/DATA501_disk003p1
/etc/init.d/oracleasm createdisk DATA501_DISK004 /dev/mapper/DATA501_disk004p1
/etc/init.d/oracleasm createdisk DATA501_DISK005 /dev/mapper/DATA501_disk005p1
/etc/init.d/oracleasm createdisk DATA501_DISK006 /dev/mapper/DATA501_disk006p1
/etc/init.d/oracleasm createdisk DATA501_DISK007 /dev/mapper/DATA501_disk007p1
/etc/init.d/oracleasm createdisk DATA501_DISK008 /dev/mapper/DATA501_disk008p1
/etc/init.d/oracleasm createdisk DATA501_DISK009 /dev/mapper/DATA501_disk009p1



ASM Disk Group Information

  • First, we will set our Allocation Unit (AU) size to 4 MB
  • Second, we will use 'ORCL:*' disks instead of block devices when creating our new disk groups
SQL> alter system set asm_diskstring='/dev/oracle','ORCL:PD*';

 
System altered.
 
Add the following to the init+ASM1.ora on each node for automatic mounting of the disk groups:

asm_diskgroups='DATA03','DATA60','FRA03','FRA60','DATA101','DATA501'

#asm_diskstring='/dev/oracle'
asm_diskstring='/dev/oracle','ORCL:PD*'
 
For the time being, manually mount the diskgroups on each node:
SQL> alter system set asm_diskstring='/dev/oracle','ORCL:PD*';

System altered.

SQL> alter diskgroup DATA101 mount;
Diskgroup altered.

SQL> alter diskgroup DATA501 mount;
Diskgroup altered.
 
 
Creating ASM Disk Groups

RAID 10 DATA Disk Group

+ASM1 > cat cr_DATA101.sql

create diskgroup DATA101 external redundancy disk 'ORCL:DATA101_DISK000',
'ORCL:DATA101_DISK001',
'ORCL:DATA101_DISK002',
'ORCL:DATA101_DISK003',
'ORCL:DATA101_DISK004'
ATTRIBUTE 'au_size' = '4M',
'compatible.rdbms' = '11.1',
'compatible.asm' = '11.1';

RAID 5 DATA Disk Group
+ASM1 > cat cr_DATA501.sql

create diskgroup DATA501 external redundancy disk 'ORCL:DATA501_DISK000',
'ORCL:DATA501_DISK001',
'ORCL:DATA501_DISK002',
'ORCL:DATA501_DISK003',
'ORCL:DATA501_DISK004',
'ORCL:DATA501_DISK005',
'ORCL:DATA501_DISK006',
'ORCL:DATA501_DISK007',
'ORCL:DATA501_DISK008',
'ORCL:DATA501_DISK009'
ATTRIBUTE 'au_size' = '4M',
'compatible.rdbms' = '11.1',
'compatible.asm' = '11.1';
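Once the disk groups are created (and mounted on the remaining nodes), a quick sanity check is sketched below. It assumes the ASM environment (ORACLE_SID=+ASM1, ORACLE_HOME) is already set in the shell and queries the standard v$asm_diskgroup view:

# run as the grid/ASM owner, with the +ASM1 environment set
sqlplus -s / as sysasm <<'EOF'
select name, state, type, allocation_unit_size/1024/1024 as au_mb, total_mb
from v$asm_diskgroup;
EOF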

Physical to Virtual (P2V) Migration with Oracle VM

One easy way to migrate from the physical bare metal world to the virtual world is to use the physical to virtual conversion utility (p2v utility).  The P2V conversion utility is included as part of the Oracle VM Server CD.  

At a high level, the P2V conversion utility creates the vm.cfg configuration file, converts the disks on the physical hardware to virtual disk images, and replicates the virtual disk images to the server pool’s repository.

Booting from the Oracle VM Server CD, at the boot prompt simply type “linux p2v”.

You can leverage the p2v conversion utility as part of a kickstart image.

The Evolution of the DBA

Kind of DBA and timeline:

  • CLI DBA: early 90’s
  • GUI DBA: late 90’s and the dot-com era
  • Google DBA: the dot-com era and the 2000’s
  • iDBA: the dot-com era; IOUG iDBA Master Curriculum
  • RAC DBA: 2000+, after 9.2 (but a major spike with 10.2)
  • DMA (Database Machine Administrator): 2010+
  • vDBA / vRAC DBA: 2010+, the evolving role of a DBA in the virtual world
  • Cloud DBA: 2011+

OCFS2 for Oracle VM

OCFS2 is a cluster file system for Linux, which allows multiple nodes (Oracle VM Servers) to access the same disk at the same time. OCFS2, which provides both performance and HA, is used in many applications that are cluster-aware or that have a need for shared file system facilities. With Oracle VM, OCFS2 ensures that Oracle VM Servers belonging to the same server pool access and modify resources in the shared repositories in a controlled manner.

The OCFS2 software includes the core file system, which offers the standard file system interfaces and behavioral semantics, as well as a component that supports the shared disk cluster feature. The shared disk component resides mostly in the kernel and is referred to as the O2CB cluster stack. It includes:

  • A disk heartbeat to detect live servers

  • A network heartbeat for communication between the nodes

  • A Distributed Lock Manager (DLM) which allows shared disk resources to be locked and released by the servers in the cluster

OCFS2 also offers several tools to examine and troubleshoot the OCFS2 components. For detailed information on OCFS2, see the OCFS2 documentation at:

http://oss.oracle.com/projects/ocfs2/documentation/

When you create a server pool, you specify:

  • Server pool name and description

  • A virtual IP address

  • Whether or not to activate the cluster

  • A server pool file system for the global heartbeat and other cluster information

During server pool creation, the server pool file system specified for the new server pool is accessed and formatted as an OCFS2 file system. This formatting creates several management areas on the file system, including a region for the global disk heartbeat. Oracle VM formats the server pool file system as an OCFS2 file system whether the file system is accessed by the Oracle VM Servers as an NFS share, an FC LUN, or an iSCSI LUN.
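From an Oracle VM Server, the state of the O2CB stack and the server pool file system can be inspected with the standard OCFS2 tools. A minimal sketch; the commands come from the ocfs2-tools package, and output details vary by release:

# as root on an Oracle VM Server
service o2cb status      # shows whether the O2CB cluster stack and heartbeat are online
mounted.ocfs2 -f         # lists OCFS2 file systems and the nodes that have them mounted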

Sample vm.cfg file

A simple example of a configuration file to create a guest follows:

disk = [ 'file:/mnt/el4u5_64_hvm//system.img,hda,w' ]
memory=4096
vcpus=2
name="el4u5_64_hvm"

vif = [ ' ' ]  # by default no network interfaces are configured; a default hvm install will have vif = [ 'type=ioemu,bridge=xenbr0' ]
builder = "hvm"
device_model = "/usr/lib/xen/bin/qemu-dm"

vnc=1
vncunused=1

apic=1
acpi=1
pae=1
serial = "pty"   # enable serial console

on_reboot = 'restart'
on_crash = 'restart'

For a RAC implementation, we have to add an exclamation mark to the write mode (w!) at the end of each shared disk entry, as shown below:
# xen config file example for RAC Guest Domain
name = "vmrac1"
memory = "8192"
disk = [
'phy:/dev/mapper/mpath3p1,xvda,w!',
'phy:/dev/mapper/mpath4p1,xvdb,w!',
'phy:/dev/mapper/mpath5p1,xvdc,w!',
]
vif = [
'mac=00:16:3E:00:00:08, bridge=xenbr0',
'mac=00:16:3E:10:A5:96, bridge=xenbr1',
]
vfb = ["type=vnc,vncunused=1"]
uuid = "3d6f1de4-626c-e02a-42a1-458c9c17e728"
bootloader = "/usr/bin/pygrub"
vcpus = 8
on_reboot   = 'restart'
on_crash    = 'restart'
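To start a guest from a configuration file like this directly on the Oracle VM Server, the standard Xen tooling can be used. A minimal sketch; the path shown is hypothetical, and on managed Oracle VM deployments guests are normally started through Oracle VM Manager instead:

# on the Oracle VM Server (dom0), as root
xm create /OVS/running_pool/vmrac1/vm.cfg
xm list              # confirm the domain is running
xm console vmrac1    # attach to the guest console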