GrumpyStorage: 2009-09

Monday, 28 September 2009

Oracle 11gR2 - Flash Cache

Ok so those who followed Oracle's Exadata v2 pitch will have seen something mentioned in the specs re FlashCache and its use of SSD storage. There's little detailed material around yet, but some brief info can be found at :-

http://www.dba-oracle.com/t_flash_cache.htm

http://technology.amis.nl/blog/6092/11gr2-flash-cache

Now the part that is of particular interest to me is whether, and when, this feature is made available as part of the normal Oracle 11gR2 release train.

This slide from http://www.flickr.com/photos/fenng/3883028359/in/photostream/ appears to show that the FlashCache is interfaced through a file location, thus one assumes it could be stored on any device that is file-interface accessible (eg an array based SSD or a host based SSD).

Now this is where it gets interesting for me, in that in a similar way to OpenZFS L2Arc, a local server SSD could be used as a high speed L2 SGA extension without putting any data at risk (it is only a cache, so the data still lives on the normal storage).

Server SSDs are supposedly not in the same league as array EFDs, but ignoring that (be it real or not), three things make this a very intriguing proposition: the price point of SSDs in servers (compared to those in arrays), the very easy use case model (both setup and ongoing management are very simple) and the lack of risk of data loss (for these use cases). It's going on my 'benchmark' and TCO evaluation list very shortly.

My test case would be to take an x86 server running OpenSolaris and place 2x256GB SSDs into it and then run Oracle Swingbench & Orion with :- no SSD usage, 1 for L2Arc, 2 for L2Arc, 1 for FlashCache, 2 for FlashCache, 1 for L2Arc & 1 for FlashCache. Also interested to test 2xFlashCache with ASM Vs 2xL2Arc without ASM. Intrigued to see what the performance impacts will be :)
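For what it's worth, the run list above can be written down as a small matrix so nothing gets forgotten. This is only a rough sketch: the names `BenchConfig`, `configs` and `run_matrix` are my own invention, it assumes exactly two SSDs, and it leaves out the ASM Vs non-ASM variants.

```python
from dataclasses import dataclass

TOTAL_SSDS = 2                       # the 2x256GB SSDs in the test server
WORKLOADS = ("Swingbench", "Orion")  # the two load generators named above


@dataclass(frozen=True)
class BenchConfig:
    l2arc_ssds: int        # SSDs handed to the ZFS L2Arc
    flash_cache_ssds: int  # SSDs handed to Oracle FlashCache


def configs(total=TOTAL_SSDS):
    """Every way of giving up to `total` SSDs to L2Arc and/or FlashCache."""
    return [
        BenchConfig(l2arc, flash)
        for l2arc in range(total + 1)
        for flash in range(total + 1 - l2arc)
    ]


def run_matrix():
    """One (workload, config) pair per benchmark run."""
    return [(workload, cfg) for workload in WORKLOADS for cfg in configs()]


if __name__ == "__main__":
    for workload, cfg in run_matrix():
        print(f"{workload}: L2Arc={cfg.l2arc_ssds} FlashCache={cfg.flash_cache_ssds}")
```

That gives the six SSD layouts listed above (including the no-SSD baseline) for each of the two tools.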

Naturally I'm also keen to understand & compare these costs & benefits against those of Oracle InMemoryDatabase (eg TimesTen) and array based SSD.

My personal suspicion is that using the SSDs for L2Arc & FlashCache (if / when available) will deliver a good benefits case, and then once the storage array firmware has achieved its required SSD management & usability maturity we'll be able to use array based SSDs in addition to server based ones for even further combined benefits.

I'd be interested to hear about other people's experiences, thoughts & plans in this area...

Sunday, 27 September 2009

SSD - possibly a use?

Well those who know me may well be staggered by the comments that follow, so get a cuppa, sit down, brace yourself and let's begin :-

I might actually have started to think there is a value case for SSD drives!

Now, this is not in storage arrays - certainly not until array software comes along that can transparently and automatically use these, and when so called 'EFDs' become financially viable (rather than 20x cost of equivalent FC capacity). I'm expecting this to be mid 2010...

But, the value case I've now had first-hand usage & benefit of is in personal laptop computers.

So earlier this week I had the pleasure of my office IT support dept using HP Radia to roll-out office2003 patches on-top of my office2007 install. Thus rapidly borking my HP tablet PC to a level where it is still being reformatted and rebuilt as I type (3 days later).

Now the interesting bit is that they gave me a loan laptop - a Dell (yes I swore a lot) - that, despite having less RAM and a slower CPU, appeared to operate much faster in the real world.

A quick peek and I discovered the Dell was actually using a 64GB Samsung SSD, which if nothing else has gone to show what a poorly written app MS-Outlook2007 is - with the end-user client application being almost entirely IO bound.

The same dataset (mix of OST & PST of about 30GB all offline local) takes about 15mins to load on a normal drive and about 1min on the SSD. Naturally this is a rather specific use case: a single spindle, a mix of multiple IO types, a large % of disk capacity utilised etc.

Even so this has changed my laptop usage feeling so much I'm off to buy my own SSDs (out of my own pocket) to put into my work laptop and my home gaming rig.

So who has any recommendations for 128GB SSDs for HP 2710p Tablet (work) and normal SATA interface (home)? I've been noodling at these :-

http://www.yoyotech.co.uk/hard-drives-hard-drive-c-36_218.html

http://www.ebuyer.com/cat/Hard-Drives/subcat/Solid-State-Hard-Drives

Monday, 21 September 2009

TCO - Time for an opensource framework?

So in my job I regularly see what each vendor claims to be a 'TCO model' - now funnily enough these normally show that the vendor's widget is much better than the competitor's other widget. Naturally each model has some elements in it that the others don't or places a certain weighting / emphasis on particular attributes that others don't.

Now this is a topic that is very close to my heart, as all standards and strategy changes I make in my company are supposed to be TCO based - with us not making any changes unless they improve our actual TCO. Naturally this breaks when vendors EOL products or TCO isn't the driver - but the principle is valid (although sadly a surprise to many people).

Now on @om_nick's blog here http://www.matrixstore.net/2009/09/17/defining-an-up-to-date-tco-model/ he reminded me that I had a draft blog on this, and that 'crowd-sourcing' such models can work quite well. There's plenty of good attributes listed so far on Nick's blog and I'm sure we'll all add many more as time goes on (I know I must have a good dozen or so TCO Excel models knocking around somewhere).

Of course I know that TCO isn't always the right measure, and that ROI or IRR can often be just as valid, but for lots of elements of infrastructure the first port of call is a TCO or CBA - and making those consistent would be a great starting point!

One thing I'm very sure about is that for each technology category there is more than one 'level' to measure a TCO at for different purposes, for example :-

Industry average TCO - ie what does a GB of data cost to store for x hours on average in the industry? (the analyst KPI - and product / vendor agnostic)

Estate average TCO - ie what does a GB of data cost to store for x hours in my company on average? (the CTO level KPI - and product / vendor agnostic)

Architecture average TCO - ie for this type of reference design (inc Functional & Non-Functional Requirements) what does a GB of data cost to store for x hours in my company on average? (the architect level KPI) This is product / vendor agnostic and used for ROM costing and selection of an infrastructure architecture.

Category average TCO - ie for this class of product (eg modular storage, enterprise storage, small x86, medium unix etc) what does a GB of data cost to store for x hours in my company on average? (the catalogue level KPI) This is now technology 'class' specific, but still product / vendor agnostic, and is used for building up the ROM & architecture costs above.

Product TCO - ie for this specific vendor product & version what does a GB of data cost to store for x hours in my company on average? (the product level KPI) This is now product and vendor specific, and is used for selecting product within a category (ie direct product bake-offs).
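To show how these levels can feed each other, here's a rough sketch that rolls product-level figures up into category and estate averages. It assumes a simple capacity-weighted average (real models will weight differently), and every product name and number in it is made up purely for illustration.

```python
from collections import defaultdict

# (category, product, capacity in GB, cost in EUR per GB per hour)
# All values below are invented for illustration only.
PRODUCTS = [
    ("modular storage", "widget A", 1000, 0.002),
    ("modular storage", "widget B", 3000, 0.001),
    ("enterprise storage", "widget C", 1000, 0.005),
]


def weighted_cost(rows):
    """Capacity-weighted average cost for a list of (gb, cost_per_gb_hour)."""
    total_gb = sum(gb for gb, _cost in rows)
    return sum(gb * cost for gb, cost in rows) / total_gb


def category_costs(products):
    """Category average TCO: roll the product figures up per category."""
    by_category = defaultdict(list)
    for category, _name, gb, cost in products:
        by_category[category].append((gb, cost))
    return {cat: weighted_cost(rows) for cat, rows in by_category.items()}


def estate_cost(products):
    """Estate average TCO: roll every product up into one figure."""
    return weighted_cost([(gb, cost) for _cat, _name, gb, cost in products])


if __name__ == "__main__":
    print(category_costs(PRODUCTS))
    print(estate_cost(PRODUCTS))
```

The point is only that the product numbers are the inputs and the higher levels are derived from them, so an assumption made at the bottom quietly shows up in the CTO level figure.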

There are many tricky parts in a TCO model, including :-

What to measure? (both what is desired, and what is actually possible over time)
How to measure?
Where to measure?
How frequently to measure?
What relative weighting to give?
What TCO output KPIs to give? (eg € per GB, GB per kW, € per IOP etc)
How to communicate such KPIs without creating dangerous context-less sound-bites for people to abuse (ie my absolute hatred of the phrase 'utilisation' - it's utterly meaningless without context!)
How to ensure transparency & clarity over assumptions and driving inputs?
How to value / compare functionality when no direct equivalents?
How to handle 'currently familiar' or 'keep same' (ie low cost of introduction) Vs 'new vendor & widget' (ie disruption & short term duplication of costs / disruption etc)?
How to handle 'usefulness'? (eg performance is an NFR that has value - does 'IOPS per GB per €' work?)
How to build a feedback loop and refinement model to periodically measure and validate TCO predictions Vs actuals, and take action accordingly?
How to protect confidential or sensitive values?
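As a starting point for discussion, the output KPIs and the feedback loop from the list above could look something like the sketch below. The field names, the `kpis` and `drift` helpers and the choice of inputs are all my own assumptions, not anybody's agreed model.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    total_cost_eur: float  # every cost counted over the measurement period
    usable_gb: float       # usable (not raw) capacity
    power_kw: float        # average power draw
    iops: float            # sustained IOPS delivered


def kpis(x):
    """Turn one set of inputs into the example KPIs from the list above."""
    return {
        "eur_per_gb": x.total_cost_eur / x.usable_gb,
        "gb_per_kw": x.usable_gb / x.power_kw,
        "eur_per_iop": x.total_cost_eur / x.iops,
        "iops_per_gb_per_eur": x.iops / x.usable_gb / x.total_cost_eur,
    }


def drift(predicted, actual):
    """Relative error of each predicted KPI against the measured actual."""
    return {name: (actual[name] - predicted[name]) / predicted[name]
            for name in predicted}


if __name__ == "__main__":
    # Invented numbers: the prediction made at purchase Vs what was measured.
    predicted = kpis(TcoInputs(100_000, 50_000, 10, 20_000))
    actual = kpis(TcoInputs(120_000, 50_000, 10, 20_000))
    for name, error in drift(predicted, actual).items():
        print(f"{name}: {error:+.0%}")
```

Keeping the raw inputs next to the KPIs is deliberate, as a '€ per GB' figure without its assumptions is exactly the kind of context-less sound-bite I'm worried about.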

Getting a common list of assumptions, factors and attributes, relative weightings and of course values for all of these is the absolute key and a very valuable exercise for all - customer and vendor alike.

Lastly - one company with a very interesting approach to TCO management is www.Apptio.com who provide a SaaS model for building & maintaining an automated TCO measurement and reporting platform. It's certainly sparked interest in my mind, and I would love to hear more about people's thoughts about or experiences with them.

Now I for one am more than up for spending time on creating an 'open source' TCO model that has many people's input and thoughts into it, that we can refine and revise over time and use to evaluate many vendor technologies - so what do other people think?

Tuesday, 1 September 2009

Cloud Backup & Android

Firstly a tip for Android users (and I'm a major fan of this platform for mobile devices) :-

If you change your Google account password, also make sure you change the password cached on your Android phone quickly afterwards - otherwise it appears that Google automatically decides that the failed login attempts from your phone (when it is auto syncing contacts, calendar & email etc data) are a hacking attack and temporarily disables your Google account... Clearly not good and needs some more thought from Google I think :(

Which leads me onto a related topic - cloud backup (which of course should really be cloud restore rather than cloud backup). In this context I'm really talking about the SaaS & PaaS definitions of cloud.

Firstly I should say that I'm greatly in favour of having multiple controlled & secured instances of data in several locations, my feeling is that for a lot of smaller organisations or individuals this simply doesn't occur. As such the technologies loosely referred to as 'cloud backup' could be invaluable to many people in easily & cost effectively enabling data persistence and recovery.

Now, before we dig into the real topic, there are lots of side-points to consider re security & availability in the cloud backup/recovery area, including :-

Obviously the data needs to be protected, normally with encryption. My view is that this should have private keys supplied & owned by the user, and not by the backup/recovery provider. With the private keys similarly backed-up to a separate key escrow / backup & recovery provider.
What SLAs does the provider have? (availability, accessibility, performance, integrity etc) and how, from where and how often are they measured & reported?
How can you contact your provider should you have an issue? (a web form simply doesn't cut it)
What guarantees do they offer to keep your data at their site available - are they a '2nd copy hoster' or do they treat your data with the same care as a master copy (eg do they do their own backups/replicas, can you treat their service as an archive rather than B&R etc?)
What are the guarantees worth? What kind of financial penalties / compensation are available, how are they calculated & triggered and how do they compare to the value of the data?
Is the provider somebody you'd trust with your banking details? As it's likely you'll either be giving them these, or all of the information behind them, in one form or another
Cloud economics often rely on some form of content dedupe at the provider's end, so you need to satisfy yourself that the supplier's dedupe won't impact your security or encryption
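To make the dedupe Vs encryption tension in that last point concrete, here's a toy sketch. It assumes the provider dedupes by hashing fixed-size chunks (only one of several possible approaches), and `toy_encrypt` is a throwaway XOR keystream I made up for the demo - it is NOT real encryption and must never be used as such.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size for the provider's dedupe


class DedupeStore:
    """Toy provider-side store: identical chunks are kept only once."""

    def __init__(self):
        self.chunks = {}

    def put(self, data):
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            self.chunks.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)


def toy_encrypt(data, key):
    """Illustration only, NOT secure: XOR with a SHA-256 derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    same_file = b"A" * CHUNK + b"B" * CHUNK + b"C" * CHUNK

    plain = DedupeStore()
    plain.put(same_file)  # user 1 uploads in the clear
    plain.put(same_file)  # user 2 uploads the same file
    print("plaintext chunks stored:", len(plain.chunks))

    private = DedupeStore()
    private.put(toy_encrypt(same_file, b"alice-key"))
    private.put(toy_encrypt(same_file, b"bob-key"))
    print("user-key encrypted chunks stored:", len(private.chunks))
```

With user-owned keys the same file from two users no longer looks the same to the provider, so if a provider claims both cross-customer dedupe and customer-held keys it's worth asking exactly how.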

But with the above in mind, back to the real topic - the three real points I was wondering about here are a little bit different :-

Should you backup your SaaS & PaaS cloud service data to your own local media (ie backing-up your part of the cloud)?

What happens to your data (your assets & value) when a service goes down, your account is deleted, the service is hacked, the company vanishes or... Can you backup your Google/Yahoo email to your local home NAS, can you backup your blog sites & social media pages to your local storage?

Irrespective of how it is done I'm increasingly of a belief that there is going to be a need for this. The first time it happens is often for some 'novelty' data which is irritating but little more, however as people rapidly move to cloud services that handle their data the risk & loss becomes higher...

Not saying stop using the SaaS services (different view re PaaS but that's another blog) as the prime system, but if the data is worth something (emotional, financial etc) then my view is that it should always be in two independent places, with one of those in the data owner's direct control.

So I'm wondering when the current generation of home NAS devices will start to include the ability to receive data from remote sites, or to have the ability to obtain that data automatically themselves?
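In the meantime, the 'obtain that data automatically' idea can be sketched for email, assuming your webmail provider offers IMAP access. The function name and the one-file-per-message layout are my own choices; it expects an already logged-in connection object of the kind Python's standard `imaplib` provides, and the host and credentials in the usage comment are placeholders.

```python
import pathlib


def backup_mailbox(conn, dest_dir, mailbox="INBOX"):
    """Copy every message in `mailbox` to `dest_dir` as raw .eml files.

    `conn` is a logged-in imaplib-style connection. Files are named after
    the message sequence number, which is fine for a one-off copy but is
    not a stable identifier across sessions.
    """
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    status, _ = conn.select(mailbox, readonly=True)  # never modify the source
    if status != "OK":
        raise RuntimeError(f"cannot open mailbox {mailbox!r}")

    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")

    saved = 0
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status != "OK":
            continue  # skip messages that cannot be fetched
        raw_message = parts[0][1]  # the raw message bytes
        (dest / f"{num.decode()}.eml").write_bytes(raw_message)
        saved += 1
    return saved


# Usage (placeholder host, account and path):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("me@example.com", "password")
#   backup_mailbox(conn, "/mnt/nas/mail-backup")
#   conn.logout()
```

Run from a scheduled job on (or next to) the NAS, that gets one copy of the mail into the data owner's direct control.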

Can your cloud backup/recovery partner also backup your social media and SaaS services?

Moving on from the previous point, what I'm thinking about here is that rather than need to use local media, could your cloud backup partner (assuming they are different to your other PaaS/SaaS providers) also provide 'content aware' backups for your other internet data services such as blog sites, Facebook & MySpace sites, Twitter tweets/favourites/friends/followers, webmail and other PaaS / SaaS services etc?

Could your cloud backup partner also move into providing a basic 'cloud DR' service?

It's a fairly simple step for a cloud partner to wrap & automate the creation of an AWS EC2 image, load their backup/restore software onto that image and then allow the customer to restore their data 'as needed' to the EC2 image, where in turn they can run the usual suite of common apps easily enough... Not earth changing but a simple enough value add that would provide transitory help for some situations...

Now I'm aware that some of the points above could be twisted into FUD, they certainly aren't intended as that (and I'll be more than grumpy if they do get used as FUD) - they are the questions I ask myself about my personal information storage (especially when an account gets disabled!).

In this topic (like many others) I certainly agree with some of the points that @StorageBod makes in his blog entry at http://storagebod.typepad.com/storagebods_blog/2009/08/information-haze.html re personal information being both of value and dispersed, with little current understanding from the public at large re the potential consequences...