Sunday, July 24, 2011

Who Manages the Exadata Machine?

For organizations that just procured an Exadata machine, one of the big questions is bound to be about the group supporting it. Who should it be - the DBAs, Sys Admins, Network Admins, or some blend of multiple teams?

The conventional Oracle database system is a combination of multiple distinct components - servers, managed by system admins; storage units, managed by SAN admins; network components such as switches and routers, managed by network admins; and, of course, the database itself, managed by the DBAs. Exadata has all those components - servers, storage (as cell servers), InfiniBand network, Ethernet network, flash disks, the whole nine yards - but packaged inside a single physical frame representing a single logical unit - a typical engineered system. (For a description of the components inside the Exadata system, please see my 4-part article series on Oracle Technology Network.) None of these conventional technology groups possesses the skillset to manage all these components. That leads to a difficult but important decision - how the organization should assign the operational responsibilities.

Choices

There are two choices for organizations to assign administrative responsibilities.

  1. Distributed - Have these individual groups manage the respective components, e.g. Sys Admins managing the Linux servers, the storage admins managing the storage cells, network admins managing the network components and finally DBAs managing the database and the cluster.
  2. Consolidated - Create a specialized role, the Database Machine Administrator (DMA), by having one of these groups expand its skillset to cover the areas it is not familiar with.

Each option has its own pros and cons. Let's examine them and see if we can get the right fit for our specific case.

Distributed Management

Under this model each component of Exadata is managed as an independent entity by the group that traditionally manages that type of infrastructure. For instance, the system admins would manage the Linux OS, overseeing all aspects of it, from creating users to applying patches and RPMs. The storage and database would likewise be managed by the specialist teams.

The benefit of this solution is its seeming simplicity - components are managed by their respective specialists without a need for advanced training. The only need for training is for storage, where the Exadata Storage Server commands are new and specific to Exadata.

While this approach seems a no-brainer on the surface, it may not be so in reality. Exadata is not just something patched together from these components; it is an engineered system. There is a huge meaning behind that qualifier. These components are not designed to act alone; they are put together to make the entire structure a better database machine. And note the stress here - not an application server, not a file server, not a mail server, not a general purpose server - but a database machine alone. This means the individual components - the compute nodes, the storage servers, the disks, the flash cards and more - are tuned to achieve that overriding objective. Any incremental tuning of a specific component has to be done within the framework of the entire frame; otherwise it may fail to produce the desired result or, worse, produce an undesirable one.

For instance, the disks where the database resides are attached to the storage cell servers, not to the database compute nodes. The cell servers, or Cells, run Oracle Enterprise Linux, which is very similar to Red Hat Linux. Under this model of administration, the system admins are responsible for managing the operating system. A system admin looks at the host and determines that it is under-tuned since the filesystem cache is very low. On a normal Linux system, that would have been a correct observation; but in Exadata, the database is in ASM and a filesystem cache is less important. On the other hand, the Cells need the memory to hold the Storage Indexes on the disk contents. Configuring a large filesystem cache not only does nothing to help; it actually hurts performance by forcing the Storage Indexes to be paged.
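To make that scenario concrete, here is a minimal sketch of the kind of check a system admin might run to see how much memory is sitting in the filesystem cache. It reads the standard Linux /proc/meminfo file; the function names and the sample numbers are my own illustrations, not figures from a real cell. On a Cell, a low number here is not by itself a sign of poor tuning, for the reasons described above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def page_cache_percent(meminfo):
    """Share of total memory currently used as filesystem (page) cache."""
    return 100.0 * meminfo["Cached"] / meminfo["MemTotal"]


# Hypothetical sample values, used only when /proc/meminfo is not available.
SAMPLE = """MemTotal:       24000000 kB
MemFree:         3000000 kB
Cached:          1200000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    info = parse_meminfo(text)
    print(f"filesystem cache: {page_cache_percent(info):.1f}% of RAM")
```

The point of the sketch is the interpretation, not the measurement: the same percentage that would prompt a tuning change on a general purpose server should be left alone on a Cell.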

This is just one example of how closely the components of an engineered system are interrelated. Treating them as separate and assigning them to multiple groups with different skillsets may not work effectively.

Database Machine Administrator

This leads to the other approach - making a single group responsible for the entire frame, from storage to the database. A single group would be able to understand the impact of a change in one component on the overall effectiveness of the rack and would be in a better position to plan and manage. The single role that performs the management of Exadata is known as Database Machine Administrator (DMA).

I can almost hear the questions firing off inside your brain. The most likely question probably is whether it is even possible to have a single skillset that encompasses storage, system, database and network.

Yes, it definitely is. Remember, the advantages of an engineered system do not stop at being a set of carefully coordinated individual components. Another advantage is the lack of controls in those components. There are fewer knobs to turn on each component in an Exadata system. Take, for instance, the operating system. There are two types of servers - the compute nodes and the cells. In the cells, the activity performed by a system admin is severely limited - almost to the point of being none. On the compute nodes, the activities are limited as well. The only allowable activities are setting up users, setting up email relays, possibly setting up an NFS mount and a handful more. This can easily be done by a non-expert. One does not have to be a System Admin to manage the servers.

Consider storage, the other important component. Traditionally storage administrators perform critical functions such as adding disks, carving out LUNs, managing replication for DR and so on. These functions are irrelevant in Exadata. For instance, the disks are preallocated in Exadata, the LUNs are created at installation time, and there is no storage replication since DR is handled by Data Guard, which operates at the Oracle database level. One need not be a storage expert to perform the tasks in Exadata. Additionally, the Storage Admins are experts in a specific brand of storage, e.g. EMC VMAX or IBM XIV. In Exadata, the storage is different from all the other brands your storage admins may be managing. They have to learn about the Exadata storage anyway; so why not have someone else, specifically the DMA, learn it?

Consider Network. In Exadata the network components are very limited since the network serves only the components inside the rack. This reduces the flexibility of the configuration compared to a regular general purpose network configuration. The special kind of hardware used in Exadata - InfiniBand - requires some special skills which the network ops folks may have to learn anyway. So, why not the DMAs instead of them? Besides, Oracle already provides a lot of tools to manage this layer.

That leaves the most visible component - the database, which is, after all, the heart and soul of Exadata. This layer is amenable to a considerable degree of tuning, and the depth of skills in this layer is vital to managing Exadata effectively. Transferring the skills needed here to a non-DBA group or individual is difficult, if not impossible. This makes the DBA group the most natural choice for evolving into the DMA role after absorbing the other relevant skills. Those other skills need not be on par with those of the administrators of the respective components. For instance, the DMA does not need to be a full-scale Linux system admin, but just needs to know a few relevant concepts, commands and tools to perform the job well. Network management in Exadata requires a fraction of the skills expected from a network admin. The storage management in cell servers is new to any group; so the DMA will find it as easy as any other group, if not easier.

By understanding the available knobs on all the constituent components of Exadata, the DMA can be a more effective administrator of the Exadata system than a set of generally autonomous groups with the activities divvied up among them. The advantages are particularly visible when troubleshooting or patching Exadata. Hence, I submit here for your consideration a new role called DMA (Database Machine Administrator) for the management of Exadata. The role should have the following mix of skills:

60% Database Administration
20% Cell Administration
15% Linux Administration
5% Miscellaneous (InfiniBand, network, etc.)

I have written an article series on Oracle Technology Network - Linux for Oracle DBAs. This 5-part article series has all the commands and concepts the Oracle DBA should understand about Linux. I have also written a 4-part article series - Commanding Exadata - for DBAs to learn the 20% cell administration. With these two, you will have everything you need to be a DMA. Scroll down to the bottom of this page and click on "Collection of Some of My Very Popular Web Articles" to locate all these articles and more.

Summary

In this blog entry, I argued for creating a single role to manage the Exadata system instead of multiple groups managing individual parts. Here are the reasons in a nutshell:


  1. Exadata is an engineered system where all the components play collaboratively instead of as islands. Managing them separately may be ineffective and detrimental.
  2. The support organizations for components such as systems, storage, database, etc. in an organization are designed with a generic purpose in mind. Exadata is not generic. Managing it under the distributed model needs unprecedented close coordination among various groups, which may be new to the organization and perhaps difficult to implement.
  3. The needed skillsets are mostly database centric; other components have very little to manage.
  4. These other skills are easy to add to the DBA skills, making for a natural transition to the DMA role.

Best of luck in becoming a DMA and implementing Exadata.

15 comments:

Anonymous said...

Great post about an important topic, Arup!
What you describe seems indeed to be a concern for many shops that implement Exadata: Who is in charge of all the different layers?

It seems (from my experience) that most customers take the "DBAs-manage-it-all-approach", like you suggest in your article.

The idea that DBAs develop skills in the storage & network area has 2 well known evangelists, by the way: Harald van Breederode & Joel Goodman have published this paper about what they call "DBA 2.0": http://dbatrain.files.wordpress.com/2009/01/dba20.pdf
Seems to be analogous to your "DMA" - it is just not limited to administering Exadata :-)

Allan said...

Oracle have sold the Exadata to DBAs as something they own and control and a DMA role will be very attractive to DBAs.

A good DBA should, at least, understand the OS/Storage and Network layers which underpin their database. Would they always have time or ability to learn and manage these things?

For many corporates, the DMA role would undermine regulatory and security frameworks, e.g. SOX compliance and separation of roles and responsibilities.

So the Exadata DMA role sounds great and the skillset is something that an individual should strive towards, but it is unlikely to come about in larger IT organizations unless there is a change in IT management thinking.

Arup Nanda said...

Uwe - thank you for your comments. I was not aware of the arguments by Joel and Harald. But I do differ from that argument. In a typical database installation, I support a separation of duties. Why? There is just too much for a DBA to learn to understand every little nuance of the specific model of storage and network gear. For instance, the current evolution of EMC is VMAX, which has subtle differences from its predecessor - the DMX. Since storage is used not just for the database but for everything in an organization, it's not fair to ask the DBA to manage that. Exadata is different. The storage is for the database only and is integrated into the machine; so there it makes sense for the new role to manage it. It has to be a new role; hence I called it DMA.

Arup Nanda said...

Allan - thanks for your comments. There is a big difference between knowing and managing. A good DBA should know about SAN concepts, layout, etc.; but manage them? I don't think so. For instance, I do care about how the LUNs are spread out over spindles in my EMC SAN; but I don't know the exact symcli commands to make that happen. It's a general purpose activity, not specific to any database. However, in Exadata, it is not. There it's part of the machine. So, there it makes a whole lot of sense to learn.

About regulatory compliance - that depends upon the specific situation. I haven't seen a case where that has become an issue. The separation of duties has mostly been there for reasons of convenience, not regulation. But YMMV.

Porus Homi Havewala (પોરસ હોમી હવેવાલા) said...

Perhaps DBMA would be a better word rather than DMA.

Satya Thirumani said...

Hi Arup,

I always read your blog posts and learn from it.
Thanks

-Cheers,
Saya
http://satya-exadata.blogspot.com/2011/07/cellclicommandsexadata.html

Anonymous said...

We just procured an Exadata quarter rack and are facing the question of what support model we should have - the traditional or the new DBMA/DMA approach.

After reading your blog, I am more inclined to propose DMA/DBMA approach. Thanks.

Anonymous said...

Arup,
Excellent blog! If the organization requires separation of duties between two groups, what would be your recommendation? For example, with DBAs and Network/Storage Admins, would both need to have root access?

Thanks

Arup Nanda said...

@Anonymous on Sep 6th:

If separation is a must, DBAs do not need the root access.

Tariq Farooq - Oracle ACE Director said...

After all my Exadata travels as a consultant, I'm in 110% agreement with what Arup said.

I've never quite understood the need for masking the root password from the DBAs as a security measure, even in the traditional world; after all, DBAs can see and play with the real data (way more dangerous than the OS root passwords). Yet they are kept away from even the "OS" oracle password (the SUDO-oracle access is necessary for audit purposes, but once in a while we need the actual Oracle password as well, e.g. for Cluster Managed Database Services setup in OEM)? All in the name of not-enough-OS knowledge??? Doesn't make any sense, because most good DBAs have a pretty good handle on OS-level command-line mastery.

Also, now with the pervasive use of sophisticated tools like OEM12cR2, we as DBAs/DMAs sometimes have to wait for days for "an" SA to become available and plug in the root/nm2/ilom etc. passwords for running ExaChks, OEM monitoring agents etc.

Another thing I've noted is that there is a severe lack of "Exadata System Admins" out there. Most of the SAs/Storage Admins inherit Exadata as part of their regular job duties but don't know much about what to do with it (very few organizations train their SAs on Exadata).

Summary: Kudos to you Arup for writing this post; this should be made Best Practice by Oracle.

Tariq
