Comment from: Steve Duplessie [Visitor]
Interesting. I don't know much about them, but I like the model comparison. Red Hat makes PILES of dough by supporting their open source software - and gets the leverage of a free global development effort. Wanna know who else makes piles of dough off of that same Red Hat code? Oracle. Same reason - people trust Oracle to support the stack.

I would argue that the storage world will stay vibrant - as the Unix world has - The Unix world is dying every day for mainstream applications - really only Solaris remains and for how long? As people flock from Unix they go to either Microsoft (gasp) or to Red Hat (and to a lesser degree, Novell) but either way, they are leaving.

The same will eventually be true in storage. Heavy weight OS type functions embedded in a storage controller are the same thing as MPE in an HP PA-RISC system 20 years ago - bloated, hard to support, and have diminishing value to customers. Removing the voodoo and opening up these functions has a history of working, so I figure it's just a matter of time.

Only question in my mind is how long will it take?

Cheers
12/29/09 @ 16:52
Comment from: Tony Asaro [Member]
Steve - I think that we actually are saying the same thing - it is an issue of how long the status quo remains dominant. Remember that people have been predicting the demise of Unix for 10 years now. But actually IBM sold over $6 billion worth of Unix servers, Sun sold over $4 billion and HP sold over $4 billion in 2009 - so I think it is far from dead. Unix will be around for a very long time.

The same will be true for traditional storage - it will take years for people to completely make the shift. In that time - could a true open source storage system have a major impact on the market? I believe the answer is yes.

Regardless, I do think that GlusterFS may be the start of something that is exciting. But it is a long road with lots of cool milestones and challenges on the way. To take a page from your recent blog on why startups fail - http://tinyurl.com/ybnv6u4 - in addition to needing a solid product they need great marketing.



12/29/09 @ 18:35
Comment from: Eli Collins [Visitor]
Correction: Hadoop's distributed file system (HDFS) does not store data in a proprietary format. Files are stored in blocks as regular files and directories.

12/30/09 @ 14:37
Comment from: Max Cohen [Visitor]
Some notes about HDFS from Wikipedia:

------------
A filesystem requires one unique server, the name node. This is a single point of failure for an HDFS installation. If the name node goes down, the filesystem is offline. When it comes back up, the name node must replay all outstanding operations. This replay process can take over half an hour for a big cluster.[10] The filesystem includes what is called a Secondary Namenode, which misleads some people into thinking that when the primary Namenode goes offline, the Secondary Namenode takes over. In fact, the Secondary Namenode regularly connects with the namenode and downloads a snapshot of the primary Namenode's directory information, which is then saved to a directory. This Secondary Namenode is used together with the edit log of the Primary Namenode to create an up-to-date directory structure.

Another limitation of HDFS is that it cannot be directly mounted by an existing operating system. Getting data into and out of the HDFS file system, an action that often needs to be performed before and after executing a job, can be inconvenient. A Filesystem in Userspace has been developed to address this problem, at least for Linux and some other Unix systems.
---------------
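
The snapshot-plus-edit-log recovery described in the excerpt can be illustrated with a toy sketch. This is not HDFS code: the function names and the `("create", path)` / `("delete", path)` operation tuples are made up for illustration, and the "namespace" is just a set of paths.

```python
# Toy model of checkpoint + edit-log recovery. NOT HDFS code.
# apply_edit, recover, and the op tuples are hypothetical names.

def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown op: {op}")


def recover(snapshot, edit_log):
    """Rebuild the current namespace from a snapshot plus the edits logged after it."""
    namespace = set(snapshot)  # copy, so the saved snapshot is left untouched
    for edit in edit_log:      # replay cost grows with the length of the log
        apply_edit(namespace, edit)
    return namespace


snapshot = {"/data/a", "/data/b"}
edit_log = [("create", "/data/c"), ("delete", "/data/a")]
current = recover(snapshot, edit_log)
print(sorted(current))
```

The point of the sketch is only that recovery time depends on how many edits have piled up since the last snapshot, which is why frequent checkpoints shorten the replay.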

Reading this, it is outrageous to have a distributed filesystem with a single point of failure and, more ridiculous still, one that has to replay all outstanding operations, which is said to take over half an hour. Here is a funny question: has HDFS ever been installed with 1000 clients? Did they try replaying operations from that? I wouldn't be surprised if that failed on the very first attempt. Also, you can't mount HDFS as a normal filesystem, which is even stranger. I think this is what the Gluster folks were trying to say: it is not even POSIX compliant. Yahoo! uses it only because they didn't have any other solution, so they built their applications around it with HTTP GET and PUT requests. Stranger still, it is mentioned that you need a userspace filesystem to access files in HDFS.

All in all, Hadoop is a far cry from even calling itself a filesystem. Lustre is far better than Hadoop in many cases, as it at least feels like a filesystem. But then again, Lustre has the same problem of a single metadata server. I am not sure why people cannot see that pointing fingers and writing code to handle metadata is just stupid, as the backend filesystems have done this job amazingly well over the years.

MogileFS showed some promise, but its performance sucks and it has several design limitations.



12/31/09 @ 15:11
Comment from: Anand Babu Periasamy [Visitor]
Eli,
HDFS is a distributed object storage system with a centralized metadata server. It is specifically designed for the map-reduce framework and can only store large objects (64MB and above). For general-purpose storage, users are not willing to change their applications to use HDFS APIs.

HDFS objects are stored as structured files on top of regular disk filesystems. You still need the metadata to restore its objects. Data is stored in a format proprietary to HDFS.

As your storage volumes grow from 10s of TBs to 100s of TBs, it becomes painful to recover from a crash. Filesystem check downtime can take from days to weeks. That is why keeping the files and folders as is (similar to NFS) is crucial to scalability.
12/31/09 @ 16:30
Comment from: Manu Gupta [Visitor]
I understand that you have implemented a high-availability solution using synchronous replication, but it's not "automatic failover" where you perform an automatic data consistency check as part of the recovery process. Right?
01/04/10 @ 17:09
Comment from: Dan [Visitor]
It seems that if Gluster is going to support NFS and other protocols, they need to implement a Virtual IP solution as well. As it stands, NFS clients access a single management brick (you can only assign one management console), which means the client is going to a single IP. If that management brick goes down and the client is accessing the storage via NFS and not the native Gluster client, you are SOL.
It seems like a simple Virtual IP addition to the product would be nice.
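
To make the Virtual IP idea concrete, here is a rough sketch of the failover logic, assuming a simple "first healthy node holds the address" policy. It is not Gluster code and does not touch real network interfaces; `Node`, `elect_vip_holder`, and the brick names are hypothetical.

```python
# Toy model of virtual-IP failover. NOT Gluster code; nothing here
# configures a real network interface. All names are hypothetical.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


def elect_vip_holder(nodes):
    """Return the first healthy node (the one that would hold the virtual IP), or None."""
    for node in nodes:
        if node.healthy:
            return node
    return None


nodes = [Node("brick1"), Node("brick2"), Node("brick3")]
before = elect_vip_holder(nodes)

# Simulate the first brick going down; the address should move to the next one.
nodes[0].healthy = False
after = elect_vip_holder(nodes)

print(before.name, "->", after.name)
```

NFS clients would keep mounting the same virtual address, and only the node answering on it changes when a brick fails.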
10/31/10 @ 02:59
