Category Archives: OVM
Hard Partitioning in Oracle VM
RAC Add Node
I am working at a virtualization-first company on one of the largest data warehouse projects in the world for the financial industry, rapidly cloning RAC clusters and databases in a matter of hours. We can provision three- or four-node RAC clusters in less than a couple of hours and then, as needed, add nodes to the cluster with ease.
1. On the target node, create an empty /u01/app/oraInventory directory owned by oracle:dba
2. On the target node, make sure the /u01/app/11.2.0 directory is created and also empty. Make sure that it is owned by oracle:dba
3. As oracle, run the following addnode.ksh on an existing RAC node with the parameters replaced for your environment
4. You will be instructed to run root.sh on the target node as root
The following are the caveats:
1. cluvfy is buggy for 11.2.0.3. It had bugs in 11.2.0.2 also. I remember we had to set IGNORE_PREADDNODE_CHECKS to Y back in 11.2.0.2, and it looks like we still have to.
2. /u01/app/oraInventory has to be pre-created and owned by oracle:dba and be completely empty
3. /u01/app/11.2.0 has to be owned by oracle:dba. The /u01/app/11.2.0/grid directory cannot exist … even as an empty directory. The addnode.sh script attempts to do a “mkdir grid” as oracle, and if it fails, it dies.
4. It died on me twice at "crsctl start listener -n target_node" … you have to create the default listener first
5. I did not use the -noCopy method … This working example will copy the Grid Home from the source RAC node to the new RAC node. I don't think it is a big deal … it added about 5 minutes to the overall time (see below).
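Putting the steps and caveats together, the procedure can be sketched as a dry run like the one below. The hostnames, VIP name, and paths are hypothetical assumptions for illustration; the script only echoes the commands so you can review them before running anything for real.

```shell
#!/bin/sh
# Dry-run sketch of the add-node procedure above (hypothetical names).
NEW_NODE=rac03            # assumption: hostname of the node being added
NEW_NODE_VIP=rac03-vip    # assumption: its virtual hostname
GRID_HOME=/u01/app/11.2.0/grid

# Steps 1-2: pre-create the empty, oracle:dba-owned directories on the target
echo "ssh root@${NEW_NODE} 'mkdir -p /u01/app/oraInventory /u01/app/11.2.0; chown oracle:dba /u01/app/oraInventory /u01/app/11.2.0'"

# Caveat 1: skip the buggy cluvfy pre-add-node checks
echo "export IGNORE_PREADDNODE_CHECKS=Y"

# Step 3: run addNode.sh from an existing node as oracle
echo "${GRID_HOME}/oui/bin/addNode.sh -silent \"CLUSTER_NEW_NODES={${NEW_NODE}}\" \"CLUSTER_NEW_VIRTUAL_HOSTNAMES={${NEW_NODE_VIP}}\""

# Step 4: finish by running root.sh on the target node as root
echo "ssh root@${NEW_NODE} ${GRID_HOME}/root.sh"
```

Remove the echo wrappers once you have confirmed the values against your own cluster.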
[root@rhel59drb db_backup]# tail -f /u01/app/11.2.0/grid/install/root_rhel59drb_2013-05-04_21-18-28.log
Configure Oracle Grid Infrastructure for a Cluster … succeeded
+ASM1 > crs_stat -t -v
Oracle VM vs VMware Technical Discussion
This white paper is aimed at Oracle DBAs and the technical infrastructure teams that need to support enterprise virtual platforms. The topics discussed may go above and beyond a DBA's normal conversational lingo, but the DBA will learn the key buzzwords that relate to both vendors. Oracle DBAs will learn how virtual infrastructures need to be configured and tuned to properly handle Oracle workloads.
DBAs and architects will learn technical internals of VMware and OVM infrastructures. DBAs will be able to effectively communicate with their infrastructure leads and understand what they are getting and what they want out of virtualization. Our goal is to disseminate key best practices for virtualization. DBAs should be able to go back to their virtualization administrators (vAdmins) and confirm whether these best practices are being followed. Important performance metrics for virtualization will also be revealed. Your measure of success in virtualizing Oracle Databases on VMware or Oracle VM should depend on pre-established performance metrics.
VIRTUALIZING ORACLE DATABASES AND APPLICATIONS
Hypervisors today are getting faster, with less and less overhead. When you look at benchmarks being published today on running Tier One databases and applications, the overhead can be as low as 6% or less. This low level of overhead means you can run 80-90% of all database servers in a virtual machine. Critical systems that require tens of thousands of IOPS, high I/O throughput, and heavy CPU resources may incur more than 6% overhead. If your application suffers from performance issues today, your best bet is to stay on physical servers until your performance issues are isolated and resolved. The rest of the typical business critical applications and database servers can run successfully in a virtual environment.
RAPIDLY PROVISION ORACLE ON VIRTUALIZED INFRASTRUCTURE
Imagine a world where your system administrators can provision a fully functional Linux server that is patched with all the up-to-date kernel parameters, updated device drivers, and updated configurations in one hour. The time from when you make the request to when you get access to a server with a fully configured Red Hat 5/6 or Oracle Linux 5/6 environment is within one hour. On top of all this, the build is perfect every time. This should be a reality for most companies today. Imagine providing a fully patched Oracle 11.2.0.3 database with PSU 5 (January 2013 PSU) applied to your customers in one hour. There is no reason why this cannot be accomplished today with the infrastructure that is provided by VMware and Oracle. Imagine provisioning RAC clusters in a matter of hours. Imagine being able to provide a fully patched 2-node/3-node/4-node Grid Infrastructure with ASM and a fully patched database within one day. This presentation does not go into the secret sauce of being able to do this but will lead you in the right direction. Oracle and VMware provide the means to provision even the most complicated RAC infrastructure in one day. We no longer spend weeks and even months to set up our RAC environments.
CREATE GOLDEN IMAGE TEMPLATES
The concept of creating a golden image applies at all levels of the stack. In the end, we need to create a golden image virtual machine template. Before we can create a golden image VM template, we need to create a golden image OS. This does not happen overnight but can easily be established. There has to be a lot of collaboration between the system administrators on standards and policies. Furthermore, someone has to be the "owner" of the templates to make sure the standard build is applied to the golden image template. As the organization matures, we can build automation to simplify the build process and the parts of the builds that require manual intervention. The level of automation will dictate how long it takes to provision the Linux VM. Obviously, the more you automate, the less time it will take. As DBAs, we will want to focus on creating golden images of the database eco-system. We need to create a golden image Grid Infrastructure stack. We also need to create a golden image database software stack. Finally, we need to create a golden image database to deploy to all the environments. We can automate all of the above components to simplify and reduce the amount of time to provision Oracle databases.
BUILDING AN ENTERPRISE VIRTUAL PLATFORM
Setting up a VirtualBox, VMware Fusion, or VMware Workstation VM is pretty simple. However, there is a big difference in the skill set required to set up a bare metal hypervisor for running a POC and/or benchmarks. It is yet another skill level to design, configure, and implement an enterprise virtual platform for running Tier One platforms. The key to building an enterprise virtual platform is to follow best practices and reference architectures. The levels of best practices that have to be followed include:
• Validate virtualization and software configurations with vendor hardware compatibility lists.
• Follow recommended reference architectures.
• Follow virtualization vendor’s best practices, deployment guides and workload characterizations.
• Review storage vendor recommendations.
• Validate internal best practices for configuring and managing VMs.
As we build out the enterprise virtual platform, standards will need to be created and tightly controlled. Process and procedures for virtual machine deployments will also play a big factor in how successful your virtualization journey becomes.
WHY VIRTUALIZE ORACLE AND HADOOP ENVIRONMENTS
• Virtual Servers offer significant advantages over Physical Servers.
• Enabling Oracle or Hadoop as a service in a public or private cloud.
• Cloud providers are making it easy to deploy platforms for POCs, dev and test environments.
• Running a Consistent, Highly Reliable Hardware Environment.
• Standardizing on a Single Common Hardware Platform (software stack).
• Virtualization is a natural step towards the cloud.
• Cloud and virtualization vendors are offering elastic solutions.
These virtualization features offer a lot of additional functionality to Oracle database servers, applications and Hadoop environments.
We cannot go over all the virtualization features in this paper. We do plan on reviewing all of the terms and features in our presentation, where we will also cover the subtle differences between the two vendors.
VMOTION / LIVE MIGRATION
vMotion / Live Migration is, by far, one of the biggest benefits of a virtualization infrastructure. With this feature enabled, we can migrate an active VM to another host machine without any downtime or disruption while maintaining application services to users. Granted, the application may experience a slight degradation in performance, but there will be no data loss during the few minutes needed to move a VM to another host machine. It will be completely transparent to the applications that the live migration (vMotion) occurred. Imagine if you lost a network card on one of the host machines and needed to take the server down for maintenance. In a non-virtualized world, you would experience a complete outage; if you happen to be on a RAC environment, you would run your databases at reduced capacity. In the virtual world, however, we would simply move your database server VM to another host machine, perform the maintenance, reboot the host machine, and let the database server VM migrate back. While this is happening, you would never know that it happened.
HIGH AVAILABILITY (HA)
With virtualization, we automatically adopt what is known as HA in the virtualization world. If the host machine crashes for any reason, the VM can fail over automatically to a surviving host machine in the cluster. With HA, some companies may be able to forgo RAC licenses if they are leveraging RAC strictly for high availability. If customers can withstand a 10-15 minute outage while the VM restarts on a surviving host machine, you may be able to eliminate your RAC licenses.
DISTRIBUTED RESOURCE SCHEDULER (DRS)
Leveraging the vMotion / Live Migration infrastructure, we can evenly balance the workload of every host machine in the virtualization cluster. If one host machine becomes overloaded, we can move one or more VMs to a less loaded host machine. This happens automatically, without the users experiencing any perceived outages. We can establish affinity and anti-affinity rules to keep certain VMs together on the same host, or to keep them apart. Again, we should not be afraid to fully leverage this technology.
KEY LINKS FOR ORACLE VM SOFTWARE
In the VMware world, your vAdmins will take care of the needed software. On the Oracle side, however, the DBAs will more than likely need to provide the information to the vAdmin or perform the tasks themselves. If you are new to the Oracle VM stack, you will discover that downloading the software will be your first hurdle. The following software products can be downloaded from Oracle E-Delivery.
Oracle VM Templates provide customers the ability to rapidly and easily deploy a fully patched and configured Oracle virtual machine, or multiple virtual machines for RAC deployments. The Oracle VM Templates contain the complete Oracle software stack plus the operating system and related software infrastructure. Many of the guest VM templates are available from Oracle's E-Delivery website, but the latest patched versions are available from support.oracle.com, and you will need a valid CSI and contract to download them. These templates can save customers days, weeks, or even months, depending on how much of the complete Oracle stack you leverage. We can download RAC templates, Oracle 12c Cloud Control Enterprise Manager templates, Siebel templates, or even Oracle E-Business Suite templates. For the latest, review the MOS note Pre-Built Grid Infrastructure and Oracle RAC Templates For Oracle VM [ID 1185244.1]. Downloading the latest Oracle VM templates can dramatically reduce your VM build times.
KEY LINKS FOR BEST PRACTICES
Many of the best practices for VMware apply to Oracle VM. Obviously, there are specific best practices when it comes to features that are specific to either product. For example, we need to create separate interfaces on the VM host (ESXi host or Oracle VM Server) to segment off management-related network traffic (i.e., traffic to maintain a network heartbeat or to perform live migrations (vMotion in VMware)). At a minimum, each physical host needs to have four physical network interface cards; six NICs are highly recommended. We will create bonded network interfaces for the following network workloads:
1. 2 NICs bonded for the public network for all Oracle database related traffic
2. 2 NICs bonded for the Oracle private network between the RAC cluster nodes
3. 2 NICs bonded for communication between the ESXi or Oracle VM Server host machines
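As an illustration, a bonded interface for the private interconnect on Oracle Linux / Red Hat might look like the following sketch. The device names, IP address, and bonding mode are hypothetical assumptions; check your distribution's bonding documentation and your network team's standards before using any of this.

```
# /etc/sysconfig/network-scripts/ifcfg-bond1 -- private interconnect bond (hypothetical)
DEVICE=bond1
IPADDR=192.168.10.11
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
MTU=9000                                     # jumbo frames for the interconnect
BONDING_OPTS="mode=active-backup miimon=100"

# /etc/sysconfig/network-scripts/ifcfg-eth2 -- first slave interface
DEVICE=eth2
MASTER=bond1
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

A second slave (e.g., eth3) would be configured the same way, giving the two-NIC bond described in item 2 above.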
All the best practices that are applicable at the VM guest level apply to both VMware and Oracle VM. For example, we want to enable jumbo frames on the guest VM. We also want to set up hugepages and disable NUMA at the guest VM level. In general, we also do not want to over-commit memory or CPUs for production environments. For databases that fit well for consolidation, we can consider over-committing memory or CPUs.
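As a quick sketch of the hugepages point: the number of hugepages must be large enough to hold the entire SGA. The SGA size below is a hypothetical assumption; on x86-64 Linux the default hugepage size is 2 MB, which you should confirm on your own guest.

```shell
# Compute vm.nr_hugepages for a given SGA size (hypothetical 8 GB SGA).
# Confirm the page size with: grep Hugepagesize /proc/meminfo
SGA_MB=8192
HUGEPAGE_KB=2048
# Round up: number of hugepages needed to hold the whole SGA
NR_HUGEPAGES=$(( (SGA_MB * 1024 + HUGEPAGE_KB - 1) / HUGEPAGE_KB ))
echo "vm.nr_hugepages = $NR_HUGEPAGES"    # add this line to /etc/sysctl.conf
```

Oracle's MOS notes also ship a more thorough hugepages sizing script that inspects running instances; the arithmetic above is only the core idea.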
For additional information on best practices for VMware, please read the following articles.
http://info.vmware.com/content/12581_VirtApps_index?src=socmed_BCAblog&xyz=
Oracle Databases on VMware – RAC Workload Characterization Study: http://www.vmware.com/files/pdf/partners/oracle/Oracle_Databases_on_VMware_-_Workload_Characterization_Study.pdf
Oracle Databases on VMware – RAC Deployment Guide: http://www.vmware.com/files/pdf/partners/oracle/vmware-oracle-rac-deploy-guide.pdf
High Availability Guide: http://www.vmware.com/files/pdf/partners/oracle/Oracle_Databases_on_VMware_-_High_Availability_Guidelines.pdf
vCloud Suite and vCloud Networking and Security vCloud Editions
vCloud Networking and Security
VMware Tech Resource Center (Videos, Whitepapers, Docs)
Miscellaneous: A high-level whitepaper on virtualizing Business Critical Apps on VMware
Deployment Guide, Reference Architecture, Customer case studies and white papers
Oracle Databases on VMware – Understanding Support and License :
VMware Network I/O Control: Architecture, Performance and Best Practices
Esxtop and vscsi Stats
Memory Management vSphere 5
Resource Mgmt vSphere 5
Achieving a Million IOPS in a single VM with vSphere5
VMXNET3 was designed with improving performance in mind.
See, VMware KB 1001805: http://kb.vmware.com/selfservice/documentLinkInt.do?micrositeID=null&externalID=1001805
Performance Evaluation of VMXNET3 Virtual Network Device can be found at:
Network I/O Latency in vSphere5
Preferred BIOS settings (always double check with your hardware vendor)
Oracle Database on vSphere Deployment Tips –
SCSI Queue Depth – Controlling LUN queue depth throttling in VMware ESX/ESXi
Monitor disk latency at three distinct layers: the device or HBA, the kernel or ESX hypervisor, and the guest or virtual machine.
PVSCSI Storage Performance
Snapshot limitations and best practices to minimize problems
http://kb.vmware.com/kb/1025279 Jumbo frames VMXNET3
The vSphere 4 CPU scheduler
Some excellent storage links from Chad Sakac (EMC) and Vaughn Stewart (NetApp): VNX and vSphere Techbook
VMAX and vSphere Techbook
Isilon and vSphere Best Practices Guide
Storage I/O Fairness
ABOUT THE AUTHOR
Charles Kim is an Oracle ACE Director, an Oracle Certified DBA, and a Certified RAC Expert. Charles specializes in Exadata, RAC, and virtualization (VMware and Oracle VM) and authored three books: Oracle Database 11g New Features for DBA and Developers, Linux Recipes for Oracle DBAs, and Oracle Data Guard 11g Handbook. Charles holds certifications in Oracle, VMware, Red Hat Linux, and Microsoft and has over 21 years of Oracle experience on mission and business critical databases. Charles presents regularly at local, regional, national, and international Oracle conferences, including IOUG Collaborate, VMworld, and Oracle OpenWorld, on topics of RAC, ASM, Linux best practices, Data Guard best practices, VMware virtualization, Oracle VM virtualization, and 7×24 high availability considerations. Charles is the technical editor of the Automatic Storage Management book by Oracle Press and blogs regularly at http://blog.dbaexpert.com and http://oravm.com.
NFS Options for Performance
Most DBAs do not understand NFS mount options. As RAC, infrastructure, and virtualization DBAs, we need to know what the NFS options mean and closely follow industry trends as to which options are adopted into best practices. We need to provide the correct NFS options to our system administrators and be able to explain what they mean. The following is taken directly from Oracle MOS Support: How to increase the NFS performance with NFS options [ID 397194.1]
rw / ro
Use rw for data that users need to modify. In order for you to mount a directory read/write, the NFS server must export it read/write.
Use ro for data you do not want users to change. A directory that is automounted from several servers should be read-only, to keep versions identical on all servers.

suid / nosuid
Specify suid if you want to allow mounted programs that have setuid permission to run with the permissions of their owners, regardless of who starts them. If a program with setuid permission is owned by root, it will run with root permissions, regardless of who starts it.
Specify nosuid to protect your system against setuid programs that may run as root and damage your system.

hard / soft
Specify hard if users will be writing to the mounted directory or running programs located in it. When NFS tries to access a hard-mounted directory, it keeps trying until it succeeds or someone interrupts its attempts. If the server goes down, any processes using the mounted directory hang until the server comes back up and then continue processing without errors. Interruptible hard mounts may be interrupted with CTRL-C or kill (see the intr option, later).
Specify soft if the server is unreliable and you want to prevent systems from hanging when the server is down. When NFS tries to access a soft-mounted directory, it gives up and returns an error message after trying retrans times (see the retrans option, later). Any processes using the mounted directory will return errors if the server goes down.

intr / nointr
Specify intr if users are not likely to damage critical data by manually interrupting an NFS request. If a hard mount is interruptible, a user may press [CTRL]-C or issue the kill command to interrupt an NFS mount that is hanging indefinitely because a server is down.
Specify nointr if users might damage critical data by manually interrupting an NFS request, and you would rather have the system hang while the server is down than risk losing data between the client and the server.

fg / bg
Specify fg for directories that are necessary for the client machine to boot or operate correctly. If a foreground mount fails, it is retried again in the foreground until it succeeds or is interrupted. All automounted directories are mounted in the foreground; you cannot specify the bg option with automounted directories.
Specify bg for mounting directories that are not necessary for the client to boot or operate correctly. Background mounts that fail are retried in the background, allowing the mount process to consider the mount complete and go on to the next one. If you have two machines configured to mount directories from each other, configure the mounts on one of the machines as background mounts. That way, if both systems try to boot at once, they will not become deadlocked, each waiting to mount directories from the other. The bg option cannot be used with automounted directories.

devs / nodevs
Specify devs if you are mounting device files from a server whose device files will work correctly on the client. The devs option allows you to use NFS-mounted device files to read and write to devices from the NFS client. It is useful for maintaining a standard, centralized set of device files, if all your systems are configured similarly.
Specify nodevs if device files mounted from a server will not work correctly for reading and writing to devices on the NFS client. The nodevs option generates an error if a process on the NFS client tries to read or write to an NFS-mounted device file.

timeo=n
The timeout, in tenths of a second, for NFS requests (read and write requests to mounted directories). If an NFS request times out, this timeout value is doubled, and the request is retransmitted. After the NFS request has been retransmitted the number of times specified by the retrans option (see below), a soft mount returns an error, and a hard mount retries the request. The maximum timeo value is 30 (3 seconds).
Try doubling the timeo value if you see several "server not responding" messages within a few minutes. This can happen because you are mounting directories across a gateway, because your server is slow, or because your network is busy with heavy traffic.

retrans=n
The number of times an NFS request (a read or write request to a mounted directory) is retransmitted after it times out. If the request does not succeed after n retransmissions, a soft mount returns an error, and a hard mount retries the request.
Increase the retrans value for a directory that is soft-mounted from a server that has frequent, short periods of down time. This gives the server sufficient time to recover, so the soft mount does not return an error.

retry=n
The number of times the NFS client attempts to mount a directory after the first attempt fails. If you specify intr, you can interrupt the mount before n retries. However, if you specify nointr, you must wait until n retries have been made, until the mount succeeds, or until you reboot the system.
If mounts are failing because your server is very busy, increasing the retry value may fix the problem.

rsize=n
The number of bytes the NFS client requests from the NFS server in a single read request.
If packets are being dropped between the client and the server, decrease rsize to 4096 or 2048. To find out whether packets are being dropped, issue the nfsstat -rc command at the HP-UX prompt. If the timeout and retrans values returned by this command are high, but the badxid number is close to zero, then packets are being dropped somewhere in the network.

wsize=n
The number of bytes the NFS client sends to the NFS server in a single write request.
If packets are being dropped between the client and the server, decrease wsize to 4096 or 2048. To find out whether packets are being dropped, issue the nfsstat -rc command at the HP-UX prompt. If the timeout and retrans values returned by this command are high, but the badxid number is close to zero, then packets are being dropped somewhere in the network.

-O (overlay mount) (default: not specified)
Allows the file system to be mounted over an existing mount point, making the underlying file system inaccessible. If you attempt to mount a file system over an existing mount point without the -O option, the mount will fail with the error "device busy".
Caution: Using the -O mount option can put your system in a confusing state. The -O option allows you to hide local data under an NFS mount point without receiving any warning. Local data hidden beneath an NFS mount point will not be backed up during regular system backups.
On HP-UX, the -O option is valid only for NFS-mounted file systems. For this reason, if you specify the -O option, you must also specify the -F nfs option to the mount command or the nfs file system type in the /etc/fstab file.

remount (default: not specified)
If the file system is mounted read-only, this option remounts it read/write. This allows you to change the access permissions from read-only to read/write without forcing everyone to leave the mounted directory or killing all processes using it.

noac (default: not specified)
If specified, this option prevents the NFS client from caching attributes for the mounted directory.
Specify noac for a directory that will be used frequently by many NFS clients. The noac option ensures that the file and directory attributes on the server are up to date, because no changes are cached on the clients. However, if many NFS clients using the same NFS server all disable attribute caching, the server may become overloaded with attribute requests and updates. You can also use the actimeo option to set all the caching timeouts to a small number of seconds, like 1 or 3.
If you specify noac, do not specify the other caching options.

nocto (default: not specified)
If specified, this option suppresses fresh attributes when opening a file.
Specify nocto for a file or directory that never changes, to decrease the load on your network.

acdirmax=n
The maximum number of seconds a directory's attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server.
For a directory that rarely changes or that is owned and modified by only one user, like a user's home directory, you can decrease the load on your network by setting acdirmax=120 or higher.

acdirmin=n
The minimum number of seconds a directory's attributes are cached on the NFS client. If the directory is modified before this timeout expires, the timeout period is extended by acdirmin seconds.
For a directory that rarely changes or that is owned and modified by only one user, like a user's home directory, you can decrease the load on your network by setting acdirmin=60 or higher.

acregmax=n
The maximum number of seconds a file's attributes are cached on the NFS client. When this timeout period expires, the client flushes its attribute cache, and if the attributes have changed, the client sends them to the NFS server.
For a file that rarely changes or that is owned and modified by only one user, like a file in a user's home directory, you can decrease the load on your network by setting acregmax=120 or higher.

actimeo=n
Setting actimeo to n seconds is equivalent to setting acdirmax, acdirmin, acregmax, and acregmin to n seconds.
Set actimeo=1 or actimeo=3 for a directory that is used and modified frequently by many NFS clients. This ensures that the file and directory attributes are kept reasonably up to date, even if they are changed frequently from various client locations.
Set actimeo=120 or higher for a directory that rarely or never changes.
If you set the actimeo value, do not set the acdirmax, acdirmin, acregmax, or acregmin values.

vers=n
The version of the NFS protocol to use. By default, the local NFS client will attempt to mount the file system using NFS version 3. If the NFS server does not support version 3, the file system will be mounted using version 2.
If you know that the NFS server does not support version 3, specify vers=2, and you will save time during the mount, because the client will not attempt to use version 3 before using version 2.

grpid (default: not specified)
Forces a newly created file in the mounted file system to inherit the group ID of the parent directory.
By default, a newly created file inherits the effective group ID of the calling process, unless the GID bit is set on the parent directory. If the GID bit is set, the new file inherits the group ID of the parent directory.
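Tying the options together, an /etc/fstab entry for an Oracle datafile mount might look like the following sketch. The server, export, and mount-point names are hypothetical, and the exact option set (in particular actimeo=0 for shared datafiles) should always be confirmed against Oracle's current MOS guidance for your platform and NFS version.

```
# /etc/fstab -- example NFS mount for Oracle datafiles (hypothetical names)
nfsserver:/export/oradata  /u02/oradata  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0  0 0
```

Note how the entry combines the choices discussed above: rw and hard for data the database writes, bg so a busy server does not block the boot, and large rsize/wsize values for throughput.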
Oracle VM 3.2 is in public beta
The new features and enhancements in Oracle VM Release 3.2.1 Beta include:
- Support for Oracle VM Server for SPARC: Oracle VM Manager can now be used to discover SPARC T-Series servers running Oracle VM Server for SPARC, and perform virtual machine life cycle management.
- Oracle VM Command Line Interface (CLI): The new Oracle VM Command Line Interface can be used to perform the same functions as the Oracle VM Manager Web Interface, such as managing all your server pools, servers and guests. The CLI commands can be scripted and run in conjunction with the Web Interface, thus bringing more flexibility to help you deploy and manage an Oracle VM environment. See the Oracle VM Command Line Interface User’s Guide for information on using the CLI.
- Usability Improvements: There are a number of enhancements to help improve the user experience when using Oracle VM Manager, such as configuring accessibility options, monitoring the overall health and status of your server virtualization environment with the health tab, multi-select of objects, search for objects, the ability to present a repository to server pools in addition to individual servers, rediscovering all Oracle VM servers, setting preferences for recurring jobs, and setting the UI timeout.
- Updated Dom0 Kernel in Oracle VM Server for x86: The Dom0 kernel in Oracle VM Server for x86 has been updated so that it is now the same Oracle Unbreakable Enterprise Kernel 2 (UEK2) as used in Oracle Linux, for complete binary compatibility with drivers supported in Oracle Linux.
- MySQL Database Support: MySQL Database is used as the bundled database for the Oracle VM Manager management repository for simple installations. Support for an existing Oracle Database is still included within the installer so that you can perform a custom installation to take advantage of your existing infrastructure.
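As a taste of the new CLI: it listens on a dedicated port of the Oracle VM Manager host (port 10000 by default), and an interactive session might look like the following sketch. The manager hostname and VM name are hypothetical; see the Oracle VM Command Line Interface User's Guide for the full command set.

```
# connect to the Oracle VM Manager CLI as the admin user (hypothetical host)
$ ssh -l admin -p 10000 ovm-manager
OVM> list ServerPool
OVM> list Server
OVM> start Vm name=myvm01
OVM> exit
```

Because these commands are plain text over ssh, they can be scripted alongside the Web Interface, as the release notes above describe.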