HBase Concepts



- Data Model: a sparse, distributed, persistent, multidimensional sorted map

(row:string, column:string, time:int64) -> string, where both keys and values are uninterpreted byte arrays
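  • Row: identified by a row key. Rows are stored in lexicographic order by row key, and each read or write of a single row is atomic.
  • Column Family: must be declared upfront, when the table is created, and is the basic unit of access control. A column is named family:qualifier, and the data in one family is usually of the same type. Compression, indexing, and disk and memory accounting are all handled per column family.
  • Timestamp: a 64-bit integer that distinguishes multiple versions of a cell. Versions are stored in decreasing timestamp order, so the most recent version is read first. Old versions can be garbage-collected by keeping only the last n versions or only new-enough versions (e.g., values written in the last several days).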
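The following is a minimal Python sketch of one table as a sparse sorted map. It is an illustration under simplifying assumptions. `Table`, `put`, `get`, and `gc` are hypothetical names and are not the HBase client API. Strings stand in for the raw bytes HBase actually stores, and timestamps are in milliseconds. Only cells that were actually written take up space, which is what "sparse" means here.

```python
import time
from collections import defaultdict


class Table:
    """Toy in-memory model of (row, column, timestamp) -> value.

    Hypothetical illustration, not the HBase client API; real HBase stores raw bytes.
    """

    def __init__(self, families, max_versions=3, ttl_seconds=None):
        self.families = set(families)      # column families are declared up front
        self.max_versions = max_versions   # GC option 1: keep only the last n versions
        self.ttl_seconds = ttl_seconds     # GC option 2: keep only new-enough versions
        # (row, "family:qualifier") -> [(timestamp_ms, value), ...], newest first
        self.cells = defaultdict(list)

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"undeclared column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # decreasing timestamp order

    def get(self, row, column, ts=None):
        """Newest value with timestamp <= ts (or the newest overall if ts is None)."""
        for version_ts, value in self.cells.get((row, column), []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def gc(self, now_ms):
        """Drop versions that fall outside the configured GC policy."""
        for key, versions in list(self.cells.items()):
            kept = versions[: self.max_versions]
            if self.ttl_seconds is not None:
                cutoff = now_ms - self.ttl_seconds * 1000
                kept = [v for v in kept if v[0] >= cutoff]
            self.cells[key] = kept
```

Because versions are kept newest-first, the common read (the latest value) stops at the first entry. A read at an older timestamp skips ahead until it finds a version that is old enough.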

- API: Get, Put, Scan, Delete

HBase doesn't modify data in place. A delete is handled by writing a tombstone marker. Tombstones, along with the dead values they mask, are physically removed during major compaction.
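The delete path can be sketched as follows. The sketch assumes that a tombstone hides every version of its column whose timestamp is at or below the tombstone's own. `Cell`, `Store`, and `major_compact` are hypothetical names, and a single append-only list stands in for HBase's on-disk files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    ts: int
    value: object = None
    is_tombstone: bool = False


class Store:
    """Append-only toy store: existing cells are never modified in place."""

    def __init__(self):
        self.cells = []

    def put(self, row, column, ts, value):
        self.cells.append(Cell(row, column, ts, value))

    def delete(self, row, column, ts):
        # A delete is just another write: a tombstone marker.
        self.cells.append(Cell(row, column, ts, is_tombstone=True))

    def _live_cells(self):
        # Highest tombstone timestamp per (row, column).
        masked_up_to = {}
        for c in self.cells:
            if c.is_tombstone:
                key = (c.row, c.column)
                masked_up_to[key] = max(masked_up_to.get(key, c.ts), c.ts)
        return [c for c in self.cells
                if not c.is_tombstone
                and c.ts > masked_up_to.get((c.row, c.column), float("-inf"))]

    def get(self, row, column):
        versions = [c for c in self._live_cells() if (c.row, c.column) == (row, column)]
        return max(versions, key=lambda c: c.ts).value if versions else None

    def major_compact(self):
        # Rewrite the store, physically dropping tombstones and the values they mask.
        self.cells = self._live_cells()
```

Until compaction runs, a delete adds one more entry instead of freeing space. Major compaction is the step that actually reclaims that space.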

Rows can be retrieved in three ways: by a single row key, by a range of row keys, or by a full-table scan.
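Here is a sketch of the row-access patterns over rows kept sorted by row key: a point lookup, a row-key range scan, and an unbounded full-table scan. `SortedRows` is a hypothetical class. Python string ordering stands in for HBase's byte-wise ordering of row keys, and the stop row is treated as exclusive.

```python
import bisect


class SortedRows:
    """Toy table whose rows are kept sorted by row key (str stands in for raw bytes)."""

    def __init__(self, rows):
        self.rows = dict(rows)          # row key -> {column: value}
        self.keys = sorted(self.rows)   # sorted row keys

    def get(self, row_key):
        """Single row key: a point lookup."""
        return self.rows.get(row_key)

    def scan(self, start_row=None, stop_row=None):
        """Rows in [start_row, stop_row); with no bounds this is a full-table scan."""
        lo = 0 if start_row is None else bisect.bisect_left(self.keys, start_row)
        hi = len(self.keys) if stop_row is None else bisect.bisect_left(self.keys, stop_row)
        for key in self.keys[lo:hi]:
            yield key, self.rows[key]
```

Because rows are sorted, a range scan costs one binary search plus a sequential walk. A full-table scan touches every row.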


- Building blocks


  • Region: a table is split by row range into regions. Regions are distributed across, and load-balanced among, different machines (region servers).
  • HFile: a persistent, ordered, immutable map from keys to values. Its block index is loaded into memory. A lookup searches the index to find the right block, then loads only that block into memory.
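The HFile lookup just described can be sketched like this. The lookup searches the in-memory block index first and then loads a single block. `HFileSketch` is hypothetical. Blocks hold a fixed number of entries rather than a fixed number of bytes, and "reading a block from disk" is simulated with a counter.

```python
import bisect


class HFileSketch:
    """Toy HFile: sorted key/value pairs cut into fixed-size blocks.

    Only the block index (first key of each block) is kept in memory.
    """

    def __init__(self, sorted_items, block_size=4):
        items = tuple(sorted_items)  # caller passes items already sorted by key
        self._blocks = tuple(items[i:i + block_size] for i in range(0, len(items), block_size))
        self.block_index = [block[0][0] for block in self._blocks]  # in memory
        self.blocks_read = 0

    def _read_block(self, i):
        self.blocks_read += 1  # stands in for one disk seek plus one block read
        return self._blocks[i]

    def get(self, key):
        # 1. Binary-search the in-memory index for the last block whose first key <= key.
        i = bisect.bisect_right(self.block_index, key) - 1
        if i < 0:
            return None  # key sorts before the first key in the file
        # 2. Load only that block and binary-search inside it.
        block = self._read_block(i)
        keys = [k for k, _ in block]
        j = bisect.bisect_left(keys, key)
        return block[j][1] if j < len(keys) and keys[j] == key else None
```

A hit or a miss costs at most one block read, and a key that sorts before the whole file costs none.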


  • HLog: each region server has one HLog file, shared by all of its regions. It is a sequence file of (HLogKey -> KeyValue) entries.

HLogKey: table + region + sequence number + timestamp

KeyValue: the same KeyValue format that HFile stores

Cons: when a region server fails, its HLog must be split by region, and the pieces must be sent to the region servers that take over those regions before they can be replayed.

Pros: all writes are appended to a single log file, which avoids seeking across multiple files.
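The trade-off can be sketched in Python. Every edit for every region goes into one append-only list, which keeps writes cheap. Recovery then needs an extra pass that regroups the entries by region. `HLogSketch` and `split_log` are hypothetical. The `HLogKey` dataclass only mirrors the fields listed above, and a (row, column, value) tuple stands in for a real KeyValue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HLogKey:
    table: str
    region: str
    seq: int   # sequence number, increasing across the whole log
    ts: int    # write timestamp


class HLogSketch:
    """One append-only log per region server, shared by all of its regions."""

    def __init__(self):
        self.entries = []  # [(HLogKey, key_value)] in append order
        self._seq = 0

    def append(self, table, region, ts, key_value):
        self._seq += 1
        self.entries.append((HLogKey(table, region, self._seq, ts), key_value))


def split_log(entries):
    """After a server fails: group its log by region, preserving sequence order,
    so each piece can be replayed by whichever server takes over that region."""
    pieces = defaultdict(list)
    for log_key, kv in sorted(entries, key=lambda e: e[0].seq):
        pieces[(log_key.table, log_key.region)].append((log_key, kv))
    return dict(pieces)
```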


- Architecture view




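  • Client: accesses HBase through the API and caches information such as region locations.
  • ZooKeeper: ensures only one Master is active at a time, stores the entry point for locating regions, and tracks which region servers are alive.
  • Master: assigns regions to region servers, balances load among them, and handles region server failures and schema changes.
  • Region server: serves reads and writes for the regions assigned to it and splits regions that grow too large.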

- Workflow

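  • Locate region: a three-level hierarchy analogous to a B+ tree. ZooKeeper stores the location of the -ROOT- region, -ROOT- stores the locations of the .META. regions, and .META. stores the locations of the user regions. (HBase 0.96 and later dropped -ROOT-, so ZooKeeper points directly at the meta region.)

  • Read/write: the client sends requests directly to the region server that serves the row, so the Master is not on the data path. Writes are appended to the HLog first.
  • Region assignment: done by the Master, which tracks the live region servers, which regions each one serves, and which regions are unassigned.
  • Region server up/down: region servers register in ZooKeeper. When one goes down, the Master splits its HLog and reassigns its regions to other region servers.
  • Master up/down: ZooKeeper ensures only one Master is active. If it goes down, a standby Master takes over, and reads and writes continue in the meantime because clients do not go through the Master.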
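The three-level location lookup and the client-side cache can be sketched as below. Everything here rests on simplifying assumptions. `Catalog` and `Client` are hypothetical classes, and a dict stands in for ZooKeeper. Only one user table is modeled. Each .META. entry stores a region as (start_key, stop_key, server), and the first start key at every level is the empty string.

```python
import bisect


class Catalog:
    """Sorted (start_key, value) entries; entry i covers [start_key_i, start_key_{i+1})."""

    def __init__(self, entries):
        entries = sorted(entries, key=lambda e: e[0])
        self.start_keys = [k for k, _ in entries]
        self.values = [v for _, v in entries]

    def lookup(self, row):
        # Assumes the first start key is "", so every row falls into some entry.
        i = bisect.bisect_right(self.start_keys, row) - 1
        return self.values[i]


class Client:
    def __init__(self, zookeeper, meta_regions):
        self.zookeeper = zookeeper        # {"root_region": Catalog}, stands in for ZooKeeper
        self.meta_regions = meta_regions  # .META. region name -> Catalog of user regions
        self.cache = []                   # cached (start, stop, server) user-region locations
        self.catalog_lookups = 0

    def locate(self, row):
        for start, stop, server in self.cache:
            if start <= row and (stop is None or row < stop):
                return server             # cache hit: no catalog reads at all
        self.catalog_lookups += 1
        root = self.zookeeper["root_region"]               # level 1: ZooKeeper -> -ROOT-
        meta_name = root.lookup(row)                       # level 2: -ROOT- -> .META. region
        region = self.meta_regions[meta_name].lookup(row)  # level 3: .META. -> user region
        self.cache.append(region)
        return region[2]
```

After a miss, the client caches the whole key range of the region, so later rows in that range need no catalog reads. A real client must also drop stale entries when a region moves or splits, and this sketch leaves that out.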


References: http://mvplee.iteye.com/blog/2247221; the Bigtable paper
