Sunday, July 24, 2011

Who Manages the Exadata Machine?

For organizations that just procured an Exadata machine, one of the big questions is bound to be about the group supporting it. Who should it be - the DBAs, Sys Admins, Network Admins, or some blend of multiple teams?

The conventional Oracle database system is a combination of multiple distinct components - servers, managed by system admins; storage units, managed by SAN admins; network components such as switches and routers, managed by network admins; and, of course, the database itself, managed by the DBAs. Exadata has all those components - servers, storage (as cell servers), infiniband network, ethernet network, flash disks, the whole nine yards; but packaged inside a single physical frame representing a single logical unit - a typical engineered system. (For a description of the components inside the Exadata system, please see my 4-part article series on Oracle Technology Network) None of these conventional technology groups posses the skillsets to the manage all these components. That leads to a difficult but important decision - how the organization should assign the operational responsibilities.

Choices

There are two choices for organizations to assign administrative responsibilities.

  1. Distributed - Have these individual groups manage the respective components, e.g. Sys Admins managing the Linux servers, the storage admins managing the storage cells, network admins managing the network components and finally DBAs managing the database and the cluster.
  2. Consolidated - Create a specialized group - Database Machine Administrator (DMA) and have one of these groups expand the skillset to include the other non-familiar areas.

Each option has its own pros and cons. Let's examine them and see if we can get the right fit for our specific case.

Distributed Management

Under this model each component of Exadata is managed as an independent entity by a group traditionally used to manage that type of infrastructure. For instance, the system admins would manage the Linux OS, overseeing all aspects of it such as creation of users to applying the patches and RPMs. The storage and database would be managed likewise by the specialist teams.

The benefit of this solution is its seeming simplicity - components are managed by their respective specialists without a need for advanced training. The only need for training is for storage, where the Exadata Storage Server commands are new and specific to Exadata.

While this approach seems a nobrainer on surface, it may not be so in reality. Exadata is not just something patched up from these components; it is an engineered system. There is a huge meaning behind that qualifier. These components are not designed to act alone; they are put together to make the entire structure a better database machine. And, note the stress here - not an application server, not a fileserver, not a mail server; not a general purpose server - but a database machine alone. This means the individual components - the compute nodes, the storage servers, the disks, the flashdisk cards and more - are tuned to achieve that overriding objective. Any incremental tuning in any specific component has to  be within the framework of the entire frame; otherwise it may fail to produce the desired result, or worse, produce undesirable result.

For instance the disks where the database resides are attached to the storage cell servers; not the database compute nodes. The cell servers, or Cells run Oracle Enterprise Linux, which is very similar to Red Hat Linux. Under this model of administration, the system admins are responsible for managing the operating system. A system admin looks at the host and determines that it is under tuned since the filesystem cache is very low. In a normal Linux system, that would have been a correct observation; but in Exadata, the database is in ASM and a filesystem cache is less important. On the other hand, the Cells need the memory to place the Storage Indexes on the disk contents. Placing a large filesystem cache not only produce nothing to help the filesystem; but actually hurt the performance for the paging of Storage Indexes.

This is just one example of how the engineered systems are closely interrelated. Assuming they are separate and assigning multiple groups with different skillsets may not work effectively.

Database Machine Administrator

This is leads to the other approach - making a single group responsible for the entire frame from storage to the database. The single group would be able to understand the impact of the changes in one component to the overall effectiveness of the rack and will be in a better position to plan and manage. The single role that performs the management of Exadata is known as Database Machine Administrator (DMA).

I can almost hear the questions firing off inside your brain. The most likely question probably is whether it is even possible to have a single skillset that encompasses storage, system, database and network.

Yes, it definitely is. Remember, the advantages of an engineered system do not stop at being a carefully coordinated individual components. Another advantage is the lack of controls in those components. There are less knobs to turn on each component in an Exadata system. Take for instance the Operating System. There are two types of servers - the compute nodes and the cells. In the cells, the activity performed by a system admin is severely limited - almost to the point of being none. On the compute nodes, the activities are limited as well. The only allowable activities are - setting up users, setting up email relays, possibly setting up an NFS mount and handful of more. This can easily be done by a non-expert. One does not have to a System Admin to manage the servers.

Consider storage, the other important component. Traditionally storage administrators perform critical functions such as adding disks, carving out LUNs, managing replication for DR and so on. These functions are irrelevant in Exadata. For instance, the disks are preallocated in Exadata, the LUNs are created at installation time, there is no replication since the DR is by Data Guard which at the Oracle database level. One need not be a storage expert to the perform the tasks in Exadata. Additionally the Storage Admins are experts in the specific brand of storage, e.g. EMC VMax or IBM XiV. In Exadata, the storage is different from all the other brands your storage admins may be managing. They have to learn about the Exadata storage anyway; so why not have someone else, specifically the DMA learn?

Consider Network. In Exadata the network components are very limited since it is only for the components inside the rack. This reduces the flexibility of the configuration compared to a regular general purpose network configuration. the special kind of hardware used in Exadata - Infiniband - requires some special skills which the network ops folks may have to learn anyway. So, why not the DMAs instead of them? Besides, Oracle already provides a lot of tools to manage this layer.

That leaves the most visible component - the database which is, after all, the heart and soul of Exadata. This layer is amenable to a considerable degree of tuning and the depth of skills in this layer is vital to managing Exadata effectively. Transferring the skills needed here to a non-DBA group or individual is difficult, if not impossible. This makes the DBA group the most natural choice for evolving into the DMA role after absorbing the relevant other skills. The other skills are not necessarily at par with the administrator of the respective components. For instance the DMA does not need to be a full scale Linux system admin; but just needs to know a few relevant concepts, commands and tools to perform the job well. Network management is Exadata is a fraction of the skills expected from a network admin. The storage management in cell servers are new to any group; so the DMA will find that as easy as any other group, if not easier.

By understanding the available knobs on all the constituent components of Exadata, the DMA can be better prepared to be an effective administrator of the Exadata system; not by divvying up the activities to individual groups which are generally autonomous. The advantages are particularly seen when troubleshooting or patching Exadata. Hence, I submit here for your consideration - a new role called DMA (Database Machine Administrator) for the management of Exadata. The role should have the following skillsets:

60% Database Administration
20% Cell Administration
15% Linux Administration
5% Miscellaneous (Infiniband, network, etc.)

I have written an article series on Oracle Technology Network - Linux for Oracle DBAs. This 5-part article series has all the commands an concepts the Oracle DBA should understand about Linux. I have also written a 4 part article series - Commanding Exadata - for DBAs to learn the 20% cell administration. With these two , you will have everything you need to be a DMA. Scroll down to the bottom of this page and click on "Collection of Some of My Very Popular Web Articles" to locate all these articles and more.

Summary

In this blog entry, I argued for creating  a single role to manage the Exadata system instead of multiple groups managing individual parts. Here are the reasons in a nutshell:


  1. Exadata is an engineered system where all the components play collaboratively instead of as islands. Managing them separately may be ineffective and detrimental.
  2. The support organizations of components such as Systems, storage, DBA, etc. in an organizations are designed with a generic purpose in mind. Exadata is not generic. Its management needs unprecedented close coordination among various groups which may be new to the organization and perhaps difficult to implement.
  3. The needed skillsets are mostly database centric; other components have very little to manage.
  4. These other skills are easy to add to the DBA skills making the natural transition to the DMA role.

Best of luck in becoming a DMA and implementing Exadata.
Post a Comment

Translate