HBase Conditional Queries

和通数据库 htsjk.Com, 2019-11-19 23:23


I recently did some work with HBase. These notes record the approaches I used; none of them has been performance-tested.

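HBase only indexes the row key, so there are just two ways to run a conditional query:

(1) Design a suitable row key, so that a lookup by row key goes straight to where the data is stored.

(2) Query with a Scan. A Scan can be given a start row and a stop row, which confines the search to one range of the table.

A Scan can also carry one or more Filters that filter on row key, column family, and column, which is how a conditional query is expressed.

The code recorded here uses the second approach.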

1.RowPrefixFilter

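This filter selects rows by row key prefix: any row whose key starts with the given prefix is returned. HBase stores data sorted by row key, so a prefix query should perform relatively well, because, as the following excerpt from the Scan source shows, it restricts the scan to a bounded key range.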


/**
 * Set a filter (using stopRow and startRow) so the result set only contains rows where the
 * rowKey starts with the specified prefix.
 * This is a utility method that converts the desired rowPrefix into the appropriate values
 * for the startRow and stopRow to achieve the desired result.
 * This can safely be used in combination with setFilter.
 * NOTE: Doing a {@link #setStartRow(byte[])} and/or {@link #setStopRow(byte[])}
 * after this method will yield undefined results.
 * @param rowPrefix the prefix all rows must start with. (Set null to remove the filter.)
 * @return this
 */
public Scan setRowPrefixFilter(byte[] rowPrefix) {
  if (rowPrefix == null) {
    setStartRow(HConstants.EMPTY_START_ROW);
    setStopRow(HConstants.EMPTY_END_ROW);
  } else {
    this.setStartRow(rowPrefix);
    this.setStopRow(calculateTheClosestNextRowKeyForPrefix(rowPrefix));
  }
  return this;
}
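The excerpt relies on a helper, calculateTheClosestNextRowKeyForPrefix, whose body is not shown. As a rough illustration of the idea, here is a minimal plain-Java sketch of how a prefix can be turned into an exclusive stop row: increment the last byte that is not 0xFF and drop everything after it. The class name PrefixRange and the method stopRowForPrefix are made up for this sketch, and it is not the HBase source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixRange {
    /**
     * Returns the smallest key that sorts after every key starting with prefix.
     * An empty array means "no upper bound" (the prefix was empty or all 0xFF bytes).
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOfRange(prefix, 0, end);
        stop[end - 1]++;  // bump the last incrementable byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopRowForPrefix("user_".getBytes(StandardCharsets.UTF_8));
        // '_' is 0x5F, so the stop row ends in 0x60, which is '`'
        System.out.println(new String(stop, StandardCharsets.UTF_8));  // prints user`
    }
}
```

With a start row of "user_" and this stop row, a scan only ever touches keys that begin with the prefix, which is why the prefix query does not need to read the whole table.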

Calling it is just as simple: pass the prefix as a byte array to setRowPrefixFilter on the Scan.


2.PageFilter

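This filter paginates query results, returning a limited number of rows per query. It is constructed with the page size (the maximum number of rows to return) and then set on the Scan.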


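To page through a table, remember the last row of each query and start the next query's startRow from that row. Because the start row is inclusive, the remembered row comes back again as the first row of the next page unless it is skipped.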
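The paging idea can be sketched without a cluster. In the plain-Java model below, a TreeMap stands in for the table because it keeps keys sorted the way HBase keeps row keys sorted. The class PagingSketch and the method nextPage are hypothetical names for this sketch; in HBase the page limit would roughly correspond to a PageFilter and the lastRow argument to the start row of the next Scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PagingSketch {
    /** Returns up to pageSize row keys strictly after lastRow, or from the beginning when lastRow is null. */
    static List<String> nextPage(TreeMap<String, String> table, String lastRow, int pageSize) {
        Iterable<String> keys = (lastRow == null)
                ? table.keySet()
                : table.tailMap(lastRow, false).keySet();  // false = exclude lastRow itself
        List<String> page = new ArrayList<>();
        for (String key : keys) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row1", "row2", "row3", "row4", "row5"}) {
            table.put(key, "v");
        }
        String last = null;
        List<String> page;
        // Keep fetching pages, remembering the last row of each one.
        while (!(page = nextPage(table, last, 2)).isEmpty()) {
            System.out.println(page);
            last = page.get(page.size() - 1);
        }
    }
}
```

Against a real table, one common way to get the "strictly after" behaviour is to append a zero byte to the remembered row key before using it as the next start row.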

3.SingleColumnValueFilter

This filter selects rows by the value stored under a given family:qualifier. It is used as follows:


filter = new SingleColumnValueFilter(Bytes.toBytes(fam), Bytes.toBytes(qualifier),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(value));

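This filter feels slow, although I have not measured it. HBase indexes only the row key, so my guess is that the filter has to check every row in the scan range, which without a start row and stop row means the whole table.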
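To make that guess concrete, here is a small in-memory model in plain Java, not HBase code. A TreeMap plays the role of the table, and a counter records how many rows the value check has to look at. All names (ColumnValueScanSketch, scan, rowsExamined) are invented for this sketch. Rows that lack the column are simply skipped here; in HBase that corresponds to calling setFilterIfMissing(true) on the filter, since by default such rows are let through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnValueScanSketch {
    /** Rows looked at by the most recent scan, to make the cost visible. */
    static int rowsExamined;

    /** Returns keys in [startRow, stopRow) whose column holds the expected value. */
    static List<String> scan(TreeMap<String, Map<String, String>> table,
                             String startRow, String stopRow,
                             String column, String expected) {
        rowsExamined = 0;
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row
                : table.subMap(startRow, true, stopRow, false).entrySet()) {
            rowsExamined++;  // no index on values: every row in range is checked
            if (expected.equals(row.getValue().get(column))) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("a1", Collections.singletonMap("info:city", "Paris"));
        table.put("a2", Collections.singletonMap("info:city", "Oslo"));
        table.put("b1", Collections.singletonMap("info:city", "Paris"));

        // Whole table: all three rows are examined.
        System.out.println(scan(table, "a", "c", "info:city", "Paris") + " examined=" + rowsExamined);
        // Narrowed key range: only one row is examined.
        System.out.println(scan(table, "b", "c", "info:city", "Paris") + " examined=" + rowsExamined);
    }
}
```

The second call shows why combining a value filter with a start row and stop row (or a row prefix) is likely to help: the number of rows checked shrinks with the key range.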

4.RowFilter

This filter works on the row key. Among other things, it can select all rows whose key contains a given string, which is done as follows:


filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator(item.getValue()));

In the code above, item.getValue() is the string that the row key must contain.

5.FilterList

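When several filters need to be applied together, use a FilterList. Its constructor accepts a list (for example an ArrayList) of Filters together with an operator that says how they are combined: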


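FilterList.Operator.MUST_PASS_ALL: the filters are combined with AND, so a row must pass every filter.

FilterList.Operator.MUST_PASS_ONE: the filters are combined with OR, so a row only needs to pass one of them.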
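The difference between the two operators can be illustrated with a plain-Java model in which each filter is a Predicate over the row key. This is only a sketch of the AND/OR semantics: FilterListSketch, combine and scan are hypothetical names, and the two predicates loosely mimic a row-prefix check and a substring check on the row key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterListSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines filters with AND (MUST_PASS_ALL) or OR (MUST_PASS_ONE). */
    static Predicate<String> combine(Operator op, List<Predicate<String>> filters) {
        return rowKey -> op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(rowKey))
                : filters.stream().anyMatch(f -> f.test(rowKey));
    }

    /** Returns the row keys that pass the combined filter, in their original order. */
    static List<String> scan(List<String> rowKeys, Predicate<String> filter) {
        return rowKeys.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("user_001", "user_002_vip", "order_001_vip");
        List<Predicate<String>> filters = Arrays.asList(
                key -> key.startsWith("user_"),  // prefix check on the row key
                key -> key.contains("vip"));     // substring check on the row key

        // AND: only the row that has the prefix and contains "vip"
        System.out.println(scan(rows, combine(Operator.MUST_PASS_ALL, filters)));
        // prints [user_002_vip]

        // OR: every row that satisfies at least one of the two checks
        System.out.println(scan(rows, combine(Operator.MUST_PASS_ONE, filters)));
        // prints [user_001, user_002_vip, order_001_vip]
    }
}
```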