Distributed Data Storage Systems


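The representative of this class of systems is Google's BigTable (described in the paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al.). Because BigTable is not open source, a number of open-source implementations have appeared, such as Hypertable (written in C++) and HBase (written in Java, built on top of Hadoop).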

 

Because these systems do not support SQL, they are sometimes called NoSQL data stores.

 

So what is a distributed storage system?

First, the definition of Bigtable:

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. 

http://labs.google.com/papers/bigtable.html

 

Next, the definition of HBase:

HBase is an open-source, distributed, column-oriented store modeled after the Google paper, "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.
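
To make "column-oriented" more concrete: the Bigtable paper describes its data model as a sparse, distributed, persistent, multidimensional sorted map indexed by row key, column key, and timestamp. The following is a minimal single-process sketch of that map only, with no distribution or persistence. The class name `TinyTable`, its methods, and the sample keys are hypothetical and are not part of the HBase or Bigtable API.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of a (row, column, timestamp) -> value map.

    Hypothetical illustration only; not an HBase or Bigtable API.
    """

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one versioned cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield (row, latest cells) for start_row <= row < stop_row, in key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row, {c: v[max(v)] for c, v in self._rows[row].items()}


table = TinyTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("com.example/index", "anchor:home", 1, "Example")
table.put("org.sample/about", "contents:html", 1, "<html>about</html>")
```

Rows are sparse (each row stores only the columns it actually has), and keeping rows in sorted key order is what makes range scans over adjacent keys natural in this model.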

 

How it differs from traditional relational databases

The HBase project leads, Jim Kellerman, Michael Stack, and Bryan Duxbury, summed it up as follows:

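HBase is for users whose yearly Oracle license fees approach the gross national product (GNP) of a small country, or whose MySQL installation is close to collapse because its tables contain a few BLOB columns and the row count has reached the millions. Anyone who has a large amount of structured or semi-structured data and is running into the limits of a relational database management system (RDBMS) should take a look at HBase.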

 

How it differs from distributed caches

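Some distributed cache systems, such as Tangosol Coherence, GemFire, JBoss Cache, and Memcached, are likewise distributed and scalable. Someone on Stack Overflow summarized the difference well (using a comparison of Hypertable and Memcached as the example):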

Hypertable is an implementation of concepts in Google's Bigtable. Namely a column-oriented DB which has properties of being highly denormalized which means it doesn't need joins.

 

Memcached is an in-memory caching layer which acts like a distributed hashtable, keeping your app from having to hit the actual DB.

 

Both lend themselves well to being distributed and work well with MapReduce style topologies but they serve different purposes. Memcached/DHT is going to serve to speed access to data in memory while Hypertable/Bigtable are actual mechanisms for permanent data storage on disk.
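
The division of labor described in the quote can be sketched as a cache-aside read path: an in-memory layer answers repeated reads, while the durable store remains the source of truth. This is a minimal sketch under stated assumptions. A plain dict stands in for a Memcached client, and the class `DurableStore` stands in for a disk-backed system such as Hypertable. Both classes and the sample key are hypothetical, and no real client library is used.

```python
class DurableStore:
    """Stand-in for a disk-backed store (hypothetical, kept in memory here)."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0  # counts how often the "disk" is hit

    def read(self, key):
        self.reads += 1
        return self._data.get(key)

    def write(self, key, value):
        self._data[key] = value


class CachedReader:
    """Cache-aside: check memory first, fall back to the store on a miss."""

    def __init__(self, store):
        self._store = store
        self._cache = {}  # stand-in for a Memcached client

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._store.read(key)
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        self._store.write(key, value)  # durable write first
        self._cache.pop(key, None)     # then drop the stale cached copy


store = DurableStore({"user:1": "alice"})
reader = CachedReader(store)
first = reader.get("user:1")   # miss: goes to the store
second = reader.get("user:1")  # hit: served from memory
```

Losing the cache costs only speed, because every value can be reloaded from the store. Losing the store loses the data, which is why the two serve different purposes.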

 

Background reading:

MapReduce, functional programming, Map, Reduce
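
As a rough illustration of how these keywords relate, the word count below uses Python's built-in `map` and `functools.reduce`. It shows only the functional-programming idea behind the model, in a single process. It is not Hadoop code, and the function names and sample input are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one input line into a partial word count."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


lines = ["big table big data", "data store"]
partials = map(map_phase, lines)                    # one Counter per line
totals = reduce(reduce_phase, partials, Counter())  # merged word counts
```

Because each line is mapped independently and the merge step is associative, the same two functions could in principle be applied to separate chunks of input on different machines and combined afterwards.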

HBase Leads Discuss Hadoop, BigTable, and Distributed Databases

http://www.infoq.com/cn/news/2008/05/hbase-interview

 

A Compendium of solutions for scaling a Data Store

http://bhavin.directi.com/tag/cassandra/

 

Writing Scalable Software in Java

http://www.slideshare.net/rbadaro/writing-scalable-software-in-java

 

YunTable: BigTable for the Cloud Era

http://www.tektalk.org/2010/10/09/yuntable-%E4%BA%91%E6%97%B6%E4%BB%A3%E7%9A%84bigtable/

 

 

 
