HBase Concepts
- Data Model: a sparse, distributed, persistent, multidimensional sorted map
(row:string, column:string, time:int64) -> string // both keys and values are uninterpreted byte arrays
- Row
- Column Family: must be declared up front and is the basic unit of access control. Columns within a family typically hold data of the same type. Disk (compression and indexing) and memory accounting are done per column family.
- Timestamp (64-bit int): a cell can hold multiple versions, ordered by decreasing timestamp so the newest is read first. Old versions can be garbage-collected by keeping only the last n versions or only new-enough versions (e.g., values written in the last several days).
- API: Get, Put, Scan, Delete
HBase doesn't modify data in place. A delete is handled by writing tombstones. These tombstones, along with the dead values they mask, are cleaned up during major compaction.
get rows by,
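
A minimal sketch of the data model and API notes above, as a toy in-memory map. It is not HBase code: the class `SparseTable`, the `TOMBSTONE` marker, and the `max_versions` parameter are made-up names for illustration, and the rule "a tombstone masks every version at or before its timestamp" is a simplification of how delete markers behave.

```python
from collections import defaultdict

TOMBSTONE = object()  # hypothetical marker written by delete()


class SparseTable:
    """Toy model of (row, column, timestamp) -> value with versioned cells."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, ts, value):
        cells = self.rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda c: c[0], reverse=True)  # decreasing timestamp

    def delete(self, row, column, ts):
        # Nothing is removed in place: a tombstone is just another write.
        self.put(row, column, ts, TOMBSTONE)

    def _live(self, cells):
        # Versions at or before the newest tombstone are treated as dead.
        tomb_ts = max((ts for ts, v in cells if v is TOMBSTONE), default=None)
        return [(ts, v) for ts, v in cells
                if v is not TOMBSTONE and (tomb_ts is None or ts > tomb_ts)]

    def get(self, row, column):
        live = self._live(self.rows.get(row, {}).get(column, []))
        return live[0][1] if live else None  # newest live version

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for start_row <= row < stop_row, in row order."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                for column in sorted(self.rows[row]):
                    value = self.get(row, column)
                    if value is not None:
                        yield row, column, value

    def major_compact(self):
        # Drop tombstones and dead values, then keep only the last n versions.
        for row in list(self.rows):
            for column in list(self.rows[row]):
                live = self._live(self.rows[row][column])[: self.max_versions]
                if live:
                    self.rows[row][column] = live
                else:
                    del self.rows[row][column]
            if not self.rows[row]:
                del self.rows[row]
```

Before `major_compact()` runs, deleted values are still stored and only hidden at read time. This mirrors the note that data is never modified in place.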
- Building block
- Region: a table is made up of regions, each covering a contiguous row range. Regions can be distributed and load-balanced across machines (region servers).
- HFile: a persistent, ordered, immutable map from keys to values. The block index is loaded into memory for lookups; the data block it points to is then loaded into memory.
- HLog: one HLog file per region server. It is a sequence file of (HLogKey -> KeyValue) entries.
HLogKey: table + region + sequence number + timestamp
KeyValue: the same KeyValue format used in HFile
cons: when recovering from a region server failure, the log must be split and the splits sent to different region servers.
pros: writes are appended to a single log file, which avoids seeking across multiple files.
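
The HLog trade-off above can be sketched roughly as follows. This is an illustrative model only: `RegionServerLog` and `split_log` are hypothetical names, and the `HLogKey` tuple just mirrors the four fields listed in these notes, not the real on-disk format.

```python
from collections import defaultdict, namedtuple

# Simplified stand-in for the HLogKey fields named above.
HLogKey = namedtuple("HLogKey", "table region seq_no timestamp")


class RegionServerLog:
    """One append-only log shared by all regions on a region server."""

    def __init__(self):
        self.entries = []  # list of (HLogKey, key_value), in append order
        self.next_seq = 0

    def append(self, table, region, timestamp, key_value):
        # pros: a single sequential append, whichever region the edit belongs to
        key = HLogKey(table, region, self.next_seq, timestamp)
        self.next_seq += 1
        self.entries.append((key, key_value))
        return key


def split_log(log):
    """cons: on recovery the shared log must be split per region.

    Returns {(table, region): [(HLogKey, key_value), ...]} with each region's
    edits kept in their original sequence order.
    """
    per_region = defaultdict(list)
    for key, key_value in log.entries:
        per_region[(key.table, key.region)].append((key, key_value))
    return dict(per_region)
```

Each per-region list could then be handed to whichever region server takes over that region and replayed in sequence order.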
- Architecture view
- Client: accesses HBase via the API and caches some information, such as region locations
- zookeeper
- Master
- region server
- workflow
- locate region (a three-level structure similar to a B+ tree)
- read/write,
- region assignment: done by the Master, which knows the region servers, which region belongs to which server, and which regions are unassigned.
- region server up/down,
- master up/down
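
A rough sketch of the "locate region" step with client-side caching. It assumes a simplified hierarchy (root index -> metadata index -> user region); `RangeIndex`, `Region`, and `Client` are hypothetical names and do not correspond to HBase classes. Each level is a sorted list of start keys searched with binary search, which is what makes the structure resemble a B+ tree.

```python
import bisect
from collections import namedtuple

# end_key None means "no upper bound"; all names here are illustrative.
Region = namedtuple("Region", "name start_key end_key server")


class RangeIndex:
    """Sorted start keys -> payload; finds the entry whose range covers a key."""

    def __init__(self, entries):
        # entries: iterable of (start_key, payload); "" is the lowest start key
        self.entries = sorted(entries, key=lambda e: e[0])
        self.start_keys = [k for k, _ in self.entries]

    def lookup(self, key):
        i = bisect.bisect_right(self.start_keys, key) - 1
        if i < 0:
            raise KeyError(key)
        return self.entries[i][1]


class Client:
    def __init__(self, root_index):
        self.root_index = root_index
        self.cache = []        # region locations already resolved
        self.meta_lookups = 0  # how often the hierarchy was walked

    def locate(self, row):
        for region in self.cache:  # served from the cache when possible
            if region.start_key <= row and (
                region.end_key is None or row < region.end_key
            ):
                return region
        meta_index = self.root_index.lookup(row)  # level 1 -> level 2
        region = meta_index.lookup(row)           # level 2 -> level 3
        self.meta_lookups += 1
        self.cache.append(region)
        return region
```

A real client would also have to evict a cached location when the region moves or splits; that is left out here.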
reference: http://mvplee.iteye.com/blog/2247221, the bigtable paper