HBase, the well known non-relational distributed database, comes with a console program to perform various operations on a HBase cluster. I’ve personally found this tool to be a bit limited and I’ve toyed around the idea of writing my own. Since HBase only comes with a Java driver for direct access and the various RPC interfaces such as Thrift don’t offer the full set of functions over HBase, I decided to go for Jython and to directly use the Java API. This article will show a mock-up of such a tool.
The idea is to provide a simple Python API over the HBase one and couple it with a Python interpreter. This means, it offers the possibility to perform any Python (well Jython) operations whilst operating on HBase itself with an easier API than the Java one.
Note also that the tool uses the WSPBus already described in an earlier article to control the process itself. You will therefore need CherryPy’s latest revision.
# -*- coding: utf-8 -*- import sys import os import code import readline import rlcompleter from org.apache.hadoop.hbase import HBaseConfiguration, \ HTableDescriptor, HColumnDescriptor from org.apache.hadoop.hbase.client import HBaseAdmin, \ HTable, Put, Get, Scan import logging from logging import handlers from cherrypy.process import wspbus from cherrypy.process import plugins class StaveBus(wspbus.Bus): def __init__(self): wspbus.Bus.__init__(self) self.open_logger() self.subscribe("log", self._log) sig = plugins.SignalHandler(self) if sys.platform[:4] == 'java': del sig.handlers['SIGUSR1'] sig.handlers['SIGUSR2'] = self.graceful self.log("SIGUSR1 cannot be set on the JVM platform. Using SIGUSR2 instead.") # See http://bugs.jython.org/issue1313 sig.handlers['SIGINT'] = self._jython_handle_SIGINT sig.subscribe() def exit(self): wspbus.Bus.exit(self) self.close_logger() def open_logger(self, name=""): logger = logging.getLogger(name) logger.setLevel(logging.INFO) h = logging.StreamHandler(sys.stdout) h.setLevel(logging.INFO) h.setFormatter(logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s")) logger.addHandler(h) self.logger = logger def close_logger(self): for handler in self.logger.handlers: handler.flush() handler.close() def _log(self, msg="", level=logging.INFO): self.logger.log(level, msg) def _jython_handle_SIGINT(self, signum=None, frame=None): # See http://bugs.jython.org/issue1313 self.log('Keyboard Interrupt: shutting down bus') self.exit() class HbaseConsolePlugin(plugins.SimplePlugin): def __init__(self, bus): plugins.SimplePlugin.__init__(self, bus) self.console = HbaseConsole() def start(self): self.console.setup() self.console.run() class HbaseConsole(object): def __init__(self): # we provide this instance to the underlying interpreter # as the interface to operate on HBase self.namespace = {'c': HbaseCommand()} def setup(self): readline.set_completer(rlcompleter.Completer(self.namespace).complete) readline.parse_and_bind("tab:complete") import user def run(self): code.interact(local=self.namespace) class HbaseCommand(object): def __init__(self, conf=None, admin=None): self.conf = conf if not conf: self.conf = HBaseConfiguration() self.admin = admin if not admin: self.admin = HBaseAdmin(self.conf) def table(self, name): return HTableCommand(name, self.conf, self.admin) def list_tables(self): return self.admin.listTables().tolist() class HTableCommand(object): def __init__(self, name, conf, admin): self.conf = conf self.admin = admin self.name = name self._table = None def row(self, name): if not self._table: self._table = HTable(self.conf, self.name) return HRowCommand(self._table, name) def create(self, families=None): desc = HTableDescriptor(self.name) if families: for family in families: desc.addFamily(HColumnDescriptor(family)) self.admin.createTable(desc) self._table = HTable(self.conf, self.name) return self._table def scan(self, start_row=None, end_row=None, filter=None): if not self._table: self._table = HTable(self.conf, self.name) sc = None if start_row and filter: sc = Scan(start_row, filter) elif start_row and end_row: sc = Scan(start_row, end_row) elif start_row: sc = Scan(start_row) else: sc = Scan() s = self._table.getScanner(sc) while True: r = s.next() if r is None: raise StopIteration() yield r def delete(self): self.disable() self.admin.deleteTable(self.name) def disable(self): self.admin.disableTable(self.name) def enable(self): self.admin.enableTable(self.name) def exists(self): return self.admin.tableExists(self.name) def list_families(self): desc = HTableDescriptor(self.name) return desc.getColumnFamilies() class HRowCommand(object): def __init__(self, table, rowname): self.table = table self.rowname = rowname def put(self, family, column, value): p = Put(self.rowname) p.add(family, column, value) self.table.put(p) def get(self, family, column): r = self.table.get(Get(self.rowname)) v = r.getValue(family, column) if v is not None: return v.tostring() if __name__ == '__main__': bus = StaveBus() HbaseConsolePlugin(bus).subscribe() bus.start() bus.block() |
To test the tool, you can simply grab the latest copy of HBase and run:
hbase-0.20.4$ ./bin/start-hbase.sh |
Then you need to configure your classpath so that it includes all the HBase dependencies. To determine them:
$ ps auwx|grep java|grep org.apache.hadoop.hbase.master.HMaster|perl -pi -e "s/.*classpath //" |
Copy the full list of jars and export CLASSPATH with it. (This is from the HBase wiki on Jython and HBase).
Next you have to add an extra jar to the classpath so that Jython supports readline:
$ export CLASSPATH=$CLASSPATH:$HOME/jython2.5.1/extlibs/libreadline-java-0.8.jar |
Make sure you’ll install libreadline-java as well.
Now, that your environment is setup, save the code above under a script named stave.py and run it as follow:
$ jython stave.py Python 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54) [Java HotSpot(TM) Server VM (Sun Microsystems Inc.)] on java1.6.0_20 Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> c.table('myTable').create(families=['aFamily:']) >>> c.table('myTable').list_families() array(org.apache.hadoop.hbase.HColumnDescriptor) >>> c.table('myTable').row('aRow').put('aFamily', 'aColumn', 'hello world!') >>> c.table('myTable').row('aRow').get('aFamily', 'aColumn') 'hello world!' >>> list(c.table('myTable').scan()) [keyvalues={aRow/aFamily:aColumn/1277645421824/Put/vlen=12}] |
You can import any Python module available to your Jython environment as well of course.
I will probably extend this tool over time but in the meantime I hope you’ll find it a useful canvas to operate HBase.