Monday 3 May 2010

Hey EMC? Rhubarb! I say Rhubarb!

Now contrary to popular belief I don't actually like calling out specific things or people, but I'm afraid I feel compelled to call out something. I shall do this using a recent CloudCamp by-law convention established by @swardley on Feb 8th 2010...

For the publication of the "Savings from their IT Investments" press release I hereby shout a claim of "Rhubarb!" firmly in the face of EMC and their PR team.

Now to be clear my call of "Rhubarb!" refers mainly to the specific 3rd paragraph, namely :-
Optimizing performance and cost reduction for Oracle Database 11g deployments with EMC Symmetrix® V-Max™ or EMC CLARiiON® CX-4 networked storage systems and EMC FAST to automatically adjust storage tiering as Oracle workloads change. This results in up to 30 percent lower acquisition cost and up to 45 percent lower operating cost in hardware, power, cooling and management over a three year period.
Additionally @sakacc also makes reference to this in the 5th paragraph of his blog post here re :-
Oh – we also showed how using Fully Automated Storage Tiering, Solid State storage, and deduplication using Data Domain we could lower the acquisition cost by 30%, and the operating costs of Oracle 11g by 45% – while delivering equal or better performance.
I'd like to believe these papers were created by engineers with good intent and sound principles, and I can certainly vouch for Chad having those qualities, and it is positive to see such materials being published, however...

I do notice that the two percentage claims above have subtle but key differences in their content - the first stating storage technology acquisition and (mainly) environmental costs, the second referencing 'operating costs of Oracle 11g'.

I've read the PDF linked at the start of the above paragraphs, and re-read it, and I'm afraid I can't find any reference to a number of things :-

  • The actual cost savings percentage values called out in the press release text
  • The baseline that has been used for the comparison savings statements above
  • Costs of components, technologies & software used - clearly showing the standard FoC and additional cost elements for each option (eg cost of FAST, cost of SSD etc)
  • Any form of ROI and TCO models used to underpin the two statements in the document making reference to 'improving TCO'
  • The list of assumptions and/or pre-requisites used in the models re savings (re facilities costs, FTE costs, frequency & duration of administration or change tasks, support & deployment operating model RACI etc)
  • Savings for other alternatives (eg using thin-provisioning & wide-striping on the vmax rather than FAST, using a CX instead of a VMax, using both a CX & VMax etc)
  • Impacts of other technologies (eg Oracle on NFS, Database compression, Flash Cache etc)
  • Impacts of the use of two independent arrays and ASM "NORMAL REDUNDANCY" for the replication of data between the arrays (ie rather than the cost of SRDF), and then how the independent operation of FAST on each array may impact performance predictability under disk failure situations. (with different read & write IO profiles potentially driving the independent policy engines into different decisions)
  • Some context definitions as to what capacity the paper regards as 'large' Oracle databases (eg 10TB? 50TB? 150TB? etc) (page 5 of pdf)
  • How this is impacted by the capacity of the databases being handled by the storage structure?
  • Impacts of the rate of database capacity growth on the proposed model (eg a rapidly expanding DB, when the DBs exceed the capacities of each tier etc)
  • The RPO & RTO requirements & assumptions for these databases, and impacts / sensitivity of changing them
  • The support operational SLA context for the database services (eg permissible time for response, resolution and return to normal operation, performance & risk after an incident)
  • Any specific performance related numbers (eg actual response ms times required for SLA, throughput/sec, IOPs etc), and how they change (given we don't know much about the transaction being measured)
  • Any details of the NFR (performance, resiliency, capacity etc) impacts during the FAST migrations, or indeed how long the migrations took to complete each time
  • A slight puzzle for me is the use of Raid 5 3+1 in the vMax, given all the previous statements from EMC re default preference for Raid 6
  • On page 30 of the doc the SSD RAID type is stated as R5 7+1, versus R5 3+1 for the other disk types (impacts on performance, risk & performance degradation during rebuild?), but at the bottom of figure 25 & middle of figure 26 the screen-shots appear to show the SSD as being R5 3+1?
  • I'd be interested in seeing how including the pool allowed to use the SSD would change the performance - given a lot of Oracle DB perf issues come from redo log bottlenecks
  • The more likely use-case for me would be migration of storage within a single database's data structure (eg some of the database data-files active some inactive etc hence some on SSD, some on FC and some on SATA for the same DB)
  • As a minor side point it doesn't mention the version of ASM being used
  • Sadly as usual with EMC there are no actual details of the reference benchmarks being used for the workload simulation - really would be good if they published their suite of benchmark tools
  • The paper makes reference to "we used an internal EMC performance analysis tool that shows a 'heat map' of the drive utilisation of the array back end" - why isn't this confidence validation view & tool available to all customers as part of all array type standard software? (after all, at worst it would be a sales tool to help justify the purchase of the additional FAST software licences???)
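
For anyone wondering why the R5 3+1 versus R5 7+1 discrepancy matters on capacity alone: a RAID 5 group of N data drives plus one parity drive leaves N/(N+1) of the raw capacity for data. A minimal sketch of that arithmetic follows. The function name is mine, purely for illustration, and it ignores hot spares, formatting overhead, rebuild times and performance, none of which the paper lets us quantify.

```python
from fractions import Fraction


def raid5_usable_fraction(data_drives: int, parity_drives: int = 1) -> Fraction:
    """Fraction of raw capacity available for data in an N+1 RAID 5 group."""
    if data_drives < 1 or parity_drives < 1:
        raise ValueError("need at least one data drive and one parity drive")
    return Fraction(data_drives, data_drives + parity_drives)


# The two layouts mentioned on page 30 and in figures 25 & 26 of the paper.
for label, data in (("R5 3+1", 3), ("R5 7+1", 7)):
    usable = raid5_usable_fraction(data)
    print(f"{label}: {float(usable):.1%} usable, {float(1 - usable):.1%} parity overhead")
```

So the same raw SSD capacity yields 75% usable in a 3+1 layout and 87.5% in a 7+1 layout, which is a meaningful difference when the drives in question are the most expensive tier.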

Now I'm not saying the reports are wrong; rather, I think they are incomplete, appear to have little or no direct linkage to the claims being made in their name by the PR teams, and certainly don't give a full context picture. This additional context & information is needed for enterprises to get sufficient comfort in the technologies to perform their own benefit opportunity & impact assessments.
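
To illustrate why the missing baseline matters so much, here is a minimal sketch using entirely made-up figures (none of them come from the EMC paper or press release, and the function name is hypothetical). The same absolute saving produces very different headline percentages depending on the baseline it is measured against.

```python
def percent_saving(baseline_cost: float, new_cost: float) -> float:
    """Saving expressed as a percentage of the baseline cost."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return 100.0 * (baseline_cost - new_cost) / baseline_cost


# Hypothetical: one fixed absolute saving, three different (invented) baselines.
absolute_saving = 300_000
for baseline in (600_000, 1_000_000, 2_000_000):
    pct = percent_saving(baseline, baseline - absolute_saving)
    print(f"baseline {baseline:>9,}: saving of {absolute_saving:,} = {pct:.0f}%")
```

Without knowing which baseline configuration was used, an "up to 30 percent" or "up to 45 percent" figure can't be mapped back onto anyone's own environment.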

So marks out of 10? So far 6/10 with a caution for "unjustified PR marketing abuse" - but very willing to review the score upon someone pointing out what I may have missed, a revised draft, justification of claims & clarification...

4 comments:

  1. Hi

    Wanted a chance to respond to your commentary.

    On one hand, you make good points -- need for more details, etc. However, the summary documents we publish externally are solution overviews, and don't attempt to get to the level of detail you would need.

    What you're probably looking for is the PSG -- the Practitioner's Solution Guide -- a much longer document with specific implementation questions and tradeoffs, actual results of testing, etc. -- e.g. most of the stuff you found lacking in the summary document.

    There's a separate policy question around how much we publish externally vs. share in the context of a specific engagement, but that's another debate entirely, I think.

    As always, it's great that you're holding EMC (and everyone else) to higher standards. In this specific case, though, what you're asking for does exist -- it just wasn't published externally.

    -- Chuck

  2. Chuck,

    Thanks for the comment - two further points :-

    1) You make savings claims in the PR etc that aren't explained, mentioned or referenced in the papers - in my mind you either need to retract the claims, or justify them

    2) I'm a firm believer that a little knowledge is a dangerous thing. I'm happy for a summary doc to exist (and glad you do publish docs), but firmly believe that within it you should reference the more detailed version & publish that too

    I've talked at length previously with your PSG teams about the need for context in doc sets, particularly around SLAs & operating models.

    My real issue is that EMC have PR'd a savings soundbite but given no backup justification for it at all...

    Cheers

    Ian

  3. I couldn't agree more -- we (and everyone else) need to publish *all* our findings when claims are made, and not just the excerpts.

    Keep agitating, please -- it's something many of us believe should be done sooner rather than later.

    -- Chuck

  4. It's Rhubarb for sure. Much like the Mystery of the Unexplained Clariion Sales Numbers. Just ask Mellor....
