[Gandalf] HBase Connection Pooling: After HTablePool Was Deprecated

it 2025-07-10

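Note: over the past two days I have been researching HBase connection pooling and learned a few things, so I am writing them down here.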

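This post first translates section 9.3.1.1 of the official reference guide (http://hbase.apache.org/book.html) for easier reading, then summarizes the Developer API of the key class HConnectionManager (http://hbase.apache.org/devapidocs/index.html). Finally, it presents some interesting findings from reading the 0.96, 0.98 and latest source code.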

Reposting is welcome; please credit the source: http://blog.csdn.net/u010967382/article/details/38046821

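1. Connections

HTable is the HBase client. It looks up, in the meta table, the RegionServers that hold the target data; once it has located them, the client talks to those RegionServers directly, without going through the master. HTable instances are not thread-safe. When you need to create HTable instances, it is wise to reuse the same configuration instance (created via HBaseConfiguration.create()), because this lets them share the ZooKeeper connection and the sockets to the RegionServers. For example, write this:

    Configuration conf = HBaseConfiguration.create();
    HTable table1 = new HTable(conf, "myTable");
    HTable table2 = new HTable(conf, "myTable");

rather than this:

    Configuration conf1 = HBaseConfiguration.create();
    HTable table1 = new HTable(conf1, "myTable");
    Configuration conf2 = HBaseConfiguration.create();
    HTable table2 = new HTable(conf2, "myTable");

(HBaseConfiguration.create() returns an org.apache.hadoop.conf.Configuration, so the variables are declared with that type.)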

2. Connection pooling

When you need multi-threaded access, you can pre-create an HConnection, as in the following example:

Example 9.1. Pre-Creating a HConnection

Note: HTablePool is the old way of pooling HBase connections. It was deprecated in 0.94, 0.95 and 0.96, and removed in 0.98.1.

BTW: that is all the rather sparse official documentation has to say... Orz

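3. HConnectionManager

This class is the key to connection pooling, so it gets its own section. HConnectionManager is a non-instantiable class whose only job is to create HConnection instances. The simplest way to create one is HConnectionManager.createConnection(config). This creates an HConnection connected to the cluster, and the application that created it is responsible for managing it.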

Through this HConnection instance you can call HConnection.getTable(byte[]) (or its String overload, used below) to obtain HTableInterface implementations, for example:

    HConnection connection = HConnectionManager.createConnection(config);
    HTableInterface table = connection.getTable("tablename");
    try {
        // Use the table as needed, for a single operation and a single thread
    } finally {
        table.close();
        connection.close();
    }

3.1 Constructors

None; the class cannot be instantiated.

3.2 Common methods

(1) static HConnection createConnection(org.apache.hadoop.conf.Configuration conf)

Creates a new HConnection instance. This method bypasses the normal HConnection life-cycle management, in which connections are obtained through getConnection(Configuration).

The caller is responsible for calling Closeable.close() on the returned connection instance.
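To make this ownership rule concrete: the caller creates one heavyweight connection, obtains a cheap table object per operation, closes each table after use, and closes the connection itself at the end. The plain-Java sketch below models that life cycle without the HBase client; HeavyConnection and LightTable are hypothetical stand-ins for HConnection and HTableInterface, not HBase classes. Making both AutoCloseable lets try-with-resources enforce the close order.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for HConnection: expensive to build, shared, closed by its creator.
class HeavyConnection implements AutoCloseable {
    static final AtomicInteger CREATED = new AtomicInteger();
    private volatile boolean closed;

    HeavyConnection() {
        CREATED.incrementAndGet(); // count how many expensive connections were built
    }

    // Hypothetical stand-in for HConnection.getTable(String): cheap, one per operation.
    LightTable getTable(String name) {
        if (closed) {
            throw new IllegalStateException("connection is closed");
        }
        return new LightTable(this, name);
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical stand-in for HTableInterface: closing it does not close the connection.
class LightTable implements AutoCloseable {
    private final HeavyConnection connection;
    private final String name;
    private boolean closed;

    LightTable(HeavyConnection connection, String name) {
        this.connection = connection;
        this.name = name;
    }

    String describe() {
        return name + (connection.isClosed() ? " (connection closed)" : " (connection open)");
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // releases only this handle
    }
}

class LightweightTableDemo {
    public static void main(String[] args) {
        try (HeavyConnection connection = new HeavyConnection()) {
            for (int i = 0; i < 3; i++) {
                try (LightTable table = connection.getTable("tablename")) {
                    System.out.println(table.describe());
                }
            }
            System.out.println("connections created: " + HeavyConnection.CREATED.get());
        }
    }
}
```

Running LightweightTableDemo opens three tables but builds only one connection; closing a table leaves the connection usable, while calling getTable after the connection is closed fails.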

The recommended way to create an HConnection is:

    HConnection connection = HConnectionManager.createConnection(conf);
    HTableInterface table = connection.getTable("mytable");
    table.get(...);
    ...
    table.close();
    connection.close();

(2) public static HConnection getConnection(org.apache.hadoop.conf.Configuration conf)

Returns the connection instance that corresponds to conf.

If no such connection exists yet, this method creates a new one.

Note: this method was deprecated in both 0.96 and 0.98 and its use is discouraged. Yet in the latest, not-yet-released code it has come back to life!!

3.3 Sample code

    package fulong.bigdata.hbase;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ConnectionPoolTest {
        private static final String QUORUM = "FBI001,FBI002,FBI003";
        private static final String CLIENTPORT = "2181";
        private static final String TABLENAME = "rd_ns:itable";
        private static Configuration conf = null;
        private static HConnection conn = null;

        static {
            try {
                conf = HBaseConfiguration.create();
                conf.set("hbase.zookeeper.quorum", QUORUM);
                conf.set("hbase.zookeeper.property.clientPort", CLIENTPORT);
                // One shared connection for the whole application
                conn = HConnectionManager.createConnection(conf);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) throws IOException {
            // Lightweight table handle obtained from the shared connection
            HTableInterface htable = ConnectionPoolTest.conn.getTable(TABLENAME);
            try {
                Scan scan = new Scan();
                ResultScanner rs = htable.getScanner(scan);
                try {
                    // Print every cell of (at most) the first 5 rows
                    for (Result r : rs.next(5)) {
                        for (Cell cell : r.rawCells()) {
                            System.out.println("Rowkey : " + Bytes.toString(r.getRow())
                                    + " Family:Qualifier : " + Bytes.toString(CellUtil.cloneFamily(cell))
                                    + ":" + Bytes.toString(CellUtil.cloneQualifier(cell))
                                    + " Value : " + Bytes.toString(CellUtil.cloneValue(cell))
                                    + " Time : " + cell.getTimestamp());
                        }
                    }
                } finally {
                    rs.close(); // release the server-side scanner
                }
            } finally {
                htable.close();
                conn.close(); // the caller owns the connection and must close it
            }
        }
    }

4. New findings from reading the source code

4.1 The vanishing HConnectionManager.getConnection

In the 0.96 and 0.98 source of HConnectionManager you can see

    static final Map<HConnectionKey, HConnectionImplementation> CONNECTION_INSTANCES;

This is the connection pool, and each connection in it is identified by an HConnectionKey. However, every method in HConnectionManager that touches CONNECTION_INSTANCES has been deprecated. Let's look at the deprecated getConnection method:

    /**
     * Get the connection that goes with the passed <code>conf</code> configuration instance.
     * If no current connection exists, method creates a new connection and keys it using
     * connection-specific properties from the passed {@link Configuration}; see
     * {@link HConnectionKey}.
     * @param conf configuration
     * @return HConnection object for <code>conf</code>
     * @throws ZooKeeperConnectionException
     */
    @Deprecated
    public static HConnection getConnection(final Configuration conf)
        throws IOException {
      HConnectionKey connectionKey = new HConnectionKey(conf);
      synchronized (CONNECTION_INSTANCES) {
        HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
        if (connection == null) {
          connection = (HConnectionImplementation)createConnection(conf, true);
          CONNECTION_INSTANCES.put(connectionKey, connection);
        } else if (connection.isClosed()) {
          HConnectionManager.deleteConnection(connectionKey, true);
          connection = (HConnectionImplementation)createConnection(conf, true);
          CONNECTION_INSTANCES.put(connectionKey, connection);
        }
        connection.incCount();
        return connection;
      }
    }

The logic is simple: build an HConnectionKey from the given conf, then use that HConnectionKey as the key to look up a connection in the pool map CONNECTION_INSTANCES. If one is found, return it; if none is found, create a new one; if one is found but has already been closed, delete it and create a new one. Now look at the HConnectionKey constructor:

    HConnectionKey(Configuration conf) {
      Map<String, String> m = new HashMap<String, String>();
      if (conf != null) {
        for (String property : CONNECTION_PROPERTIES) {
          String value = conf.get(property);
          if (value != null) {
            m.put(property, value);
          }
        }
      }
      this.properties = Collections.unmodifiableMap(m);

      try {
        UserProvider provider = UserProvider.instantiate(conf);
        User currentUser = provider.getCurrent();
        if (currentUser != null) {
          username = currentUser.getName();
        }
      } catch (IOException ioe) {
        HConnectionManager.LOG.warn("Error obtaining current user, skipping username in HConnectionKey", ioe);
      }
    }

The source shows that building an HConnectionKey from conf simply copies the connection-related properties of conf (those listed in CONNECTION_PROPERTIES), plus the current user name, into the key's own fields. In other words, no matter how many times you call new, as long as the conf properties are the same, the resulting HConnectionKey instances have the same fields.

Conclusion 1: conf properties --> HConnectionKey instance fields

Next, back in getConnection, we find this line:

    HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);

It checks whether the LinkedHashMap CONNECTION_INSTANCES already contains an entry keyed by an equivalent HConnectionKey. Note that a map's get() does not compare key identity: it uses the key's hashCode() to find the bucket and then equals() to confirm the match, as you can see by reading the JDK source. HConnectionKey overrides both; here is its hashCode():

    @Override
    public int hashCode() {
      final int prime = 31;
      int result = 1;
      if (username != null) {
        result = username.hashCode();
      }
      for (String property : CONNECTION_PROPERTIES) {
        String value = properties.get(property);
        if (value != null) {
          result = prime * result + value.hashCode();
        }
      }
      return result;
    }

The returned hashcode depends only on the current username and the conf properties. So as long as the conf properties and the user are the same, HConnectionKey instances have the same hashcode (and its equals() compares the same fields)!

Conclusion 2: conf properties --> HConnectionKey hashcode

Look again at this line:

    HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);

Whether or not the connectionKey argument is the same object, as long as its properties are the same, its hashcode is the same and equals() holds, so for get() it is the same key! Hence conclusion 3:

Conclusion 3: conf properties --> HConnectionKey hashcode --> connection returned by get()

Put differently: identical conf properties --> CONNECTION_INSTANCES.get returns the same connection instance.

Now suppose we have only one HBase cluster. Then we also have only one HBase configuration (a fixed set of properties); running several clusters is another story. Under this mechanism, with a single configuration, the pool will only ever hold one connection instance!
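The argument above hinges on map lookups being value-based: get() finds the bucket with hashCode() and confirms the match with equals(). The sketch below is a trimmed, hypothetical ConnKey modeled on the HConnectionKey constructor and hashCode() quoted above; the user lookup is replaced by a plain username parameter, the configuration by a Map<String, String>, and the two-entry CONNECTION_PROPERTIES list is an assumption, not the real HBase list. It shows that two independently built keys with the same relevant settings retrieve the same map entry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, trimmed model of HConnectionKey (not the HBase class).
final class ConnKey {
    // Assumed subset of the properties that identify a cluster connection.
    static final List<String> CONNECTION_PROPERTIES =
            Arrays.asList("hbase.zookeeper.quorum", "hbase.zookeeper.property.clientPort");

    private final Map<String, String> properties;
    private final String username;

    ConnKey(Map<String, String> conf, String username) {
        Map<String, String> m = new HashMap<>();
        for (String property : CONNECTION_PROPERTIES) {
            String value = conf.get(property);
            if (value != null) {
                m.put(property, value); // copy only the identifying properties
            }
        }
        this.properties = Collections.unmodifiableMap(m);
        this.username = username;
    }

    @Override
    public int hashCode() {
        // Same shape as the HConnectionKey.hashCode() quoted above.
        final int prime = 31;
        int result = 1;
        if (username != null) {
            result = username.hashCode();
        }
        for (String property : CONNECTION_PROPERTIES) {
            String value = properties.get(property);
            if (value != null) {
                result = prime * result + value.hashCode();
            }
        }
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof ConnKey)) return false;
        ConnKey other = (ConnKey) obj;
        // Compares exactly the fields hashCode() uses, so equal keys hash equally.
        return Objects.equals(username, other.username) && properties.equals(other.properties);
    }
}

class ConnKeyDemo {
    public static void main(String[] args) {
        Map<String, String> conf1 = new HashMap<>();
        conf1.put("hbase.zookeeper.quorum", "FBI001,FBI002,FBI003");
        conf1.put("unrelated.setting", "x");
        Map<String, String> conf2 = new HashMap<>(conf1);
        conf2.put("unrelated.setting", "y");
        Map<ConnKey, String> pool = new HashMap<>();
        pool.put(new ConnKey(conf1, "alice"), "connection-1");
        // A brand-new key object built from conf2 still finds the same entry.
        System.out.println(pool.get(new ConnKey(conf2, "alice")));
    }
}
```

The hypothetical unrelated.setting differs between the two configurations, yet both keys land on the same entry, because only the listed properties and the user take part in hashCode() and equals(). A different user name yields a different key.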

So the "pool" does not amount to much.
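The getConnection logic quoted in section 4.1 (look up by key, create if absent, recreate if the cached one was closed, bump a reference count) can be modeled in a few lines of plain Java to see why a single configuration leaves a single pooled connection. This is a simplified sketch, not the HBase code: PooledConnection and ConnectionCache are hypothetical, the key is a copy of the whole property map (which compares by value) instead of an HConnectionKey, and a closed entry is simply overwritten rather than removed through deleteConnection first.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for HConnectionImplementation with a reference count.
class PooledConnection {
    private int refCount;
    private boolean closed;

    void incCount() { refCount++; }
    int refCount() { return refCount; }
    boolean isClosed() { return closed; }
    void close() { closed = true; }
}

// Simplified model of the getConnection cache. Property maps compare by value,
// so equal settings map to the same cached connection.
class ConnectionCache {
    private final Map<Map<String, String>, PooledConnection> instances = new LinkedHashMap<>();

    PooledConnection getConnection(Map<String, String> conf) {
        Map<String, String> key = new HashMap<>(conf); // private copy used as the key
        synchronized (instances) {
            PooledConnection connection = instances.get(key);
            if (connection == null || connection.isClosed()) {
                // absent, or cached but already closed: (re)create and cache it
                connection = new PooledConnection();
                instances.put(key, connection);
            }
            connection.incCount(); // every caller holds one reference
            return connection;
        }
    }

    int size() {
        synchronized (instances) {
            return instances.size();
        }
    }
}

class ConnectionCacheDemo {
    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.zookeeper.quorum", "FBI001,FBI002,FBI003");
        for (int i = 0; i < 10; i++) {
            cache.getConnection(new HashMap<>(conf)); // ten callers, ten conf copies
        }
        System.out.println("cached connections: " + cache.size());
    }
}
```

With one cluster configuration, ten callers end up sharing one cached connection whose reference count is ten, which is exactly the observation above.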

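That is why the code deprecated the getConnection-based mechanism for fetching objects from the pool, and the official documentation instead recommends:
*******************************************************************************************************************
When you need multi-threaded access, you can pre-create an HConnection, as in the following code: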

Example 9.1. Pre-Creating a HConnection

    // Create a connection to the cluster.
    HConnection connection = HConnectionManager.createConnection(conf); // conf: a Configuration instance
    HTableInterface table = connection.getTable("myTable");
    // use table as needed, the table returned is lightweight
    table.close();
    // use the connection for other access to the cluster
    connection.close();

Constructing an HTableInterface implementation is very lightweight, and resources are controlled.
*******************************************************************************************************************
(The above repeats the translation of the official documentation.) If you follow the official advice, that is, create one connection up front and share it for all later access, the effect is in fact exactly the same as the old getConnection: either way, everything goes through a single connection instance!
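For the multi-threaded case the official advice targets, here is a plain-Java sketch of the pre-created, shared connection: it is built once before the thread pool starts, and every task opens and closes its own table handle, since table objects are not thread-safe. SharedConn and TableHandle are hypothetical stand-ins for HConnection and HTableInterface, and read() is a placeholder for a real Get; none of this is HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a pre-created HConnection shared by all threads.
class SharedConn implements AutoCloseable {
    final AtomicInteger tablesOpened = new AtomicInteger();

    // Hypothetical stand-in for HConnection.getTable: a fresh handle per call.
    TableHandle getTable(String name) {
        tablesOpened.incrementAndGet();
        return new TableHandle(name);
    }

    @Override
    public void close() {
        // a real connection would release ZooKeeper sessions and sockets here
    }
}

// Hypothetical stand-in for HTableInterface: used by one task only, never shared.
class TableHandle implements AutoCloseable {
    private final String name;

    TableHandle(String name) {
        this.name = name;
    }

    String read(int row) {
        return name + "/row" + row; // placeholder for a real Get
    }

    @Override
    public void close() {
    }
}

class SharedConnectionDemo {
    // Runs `tasks` reads on `threads` worker threads through one shared connection.
    static List<String> runAll(SharedConn conn, int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int row = i;
                futures.add(pool.submit(() -> {
                    // each task opens and closes its own table handle
                    try (TableHandle table = conn.getTable("myTable")) {
                        return table.read(row);
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        try (SharedConn conn = new SharedConn()) {
            List<String> results = runAll(conn, 4, 8);
            System.out.println(results.size() + " reads, "
                    + conn.tablesOpened.get() + " table handles, 1 connection");
        }
    }
}
```

The design choice mirrors the documentation: the expensive object exists once and is shared, while the cheap, non-thread-safe objects are created per task and never cross threads.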

4.2 A new era for HBase

I looked at the latest code in Git (https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tree) and found that getConnection has been revived:

    /**
     * Get the connection that goes with the passed <code>conf</code> configuration instance.
     * If no current connection exists, method creates a new connection and keys it using
     * connection-specific properties from the passed {@link Configuration}; see
     * {@link HConnectionKey}.
     * @param conf configuration
     * @return HConnection object for <code>conf</code>
     * @throws ZooKeeperConnectionException
     */
    public static HConnection getConnection(final Configuration conf) throws IOException {
      return ConnectionManager.getConnectionInternal(conf);
    }

That is not the main point, though. The main point is the pom of the latest code:

    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <packaging>pom</packaging>
    <version>2.0.0-SNAPSHOT</version>
    <name>HBase</name>
    <description>
      Apache HBase™ is the Hadoop database. Use it when you need
      random, realtime read/write access to your Big Data.
      This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters
      of commodity hardware.
    </description>

HBase is about to reach version 2.0.0!! Will the next HBase release be a glorious leap like Hadoop 2.0, bringing lots of awesome new features? CHANGES.txt offers no up-to-date information: its last update was on February 24, 2012. It seems the open-source bigwigs also love writing code more than writing documentation.

Reposted from: https://www.cnblogs.com/bhlsheji/p/4889493.html
