
My Life with HBase

HBase is an Open Source implementation of Google's BigTable architecture. Its goal is the hosting of very large tables - billions of rows, millions of columns - atop clusters of "commodity" hardware. This talk reports on findings along the way of setting up HBase clusters of various size and use.

  • “My Life with HBase”
    ◦ Lars George, CTO of WorldLingo
    ◦ Apache Hadoop HBase Committer
    ◦ www.worldlingo.com
    ◦ www.larsgeorge.com
  • WorldLingo
    ◦ Co-founded 1999
    ◦ Machine Translation Services
    ◦ Professional Human Translations
    ◦ Offices in US and UK
    ◦ Microsoft Office Provider since 2001
    ◦ Web-based services
    ◦ Customer Projects
    ◦ Multilingual Archive
  • Multilingual Archive SOAP API
    ◦ Simple calls:
      ◦ putDocument()
      ◦ getDocument()
      ◦ search()
      ◦ command()
      ◦ putTransformation()
      ◦ getTransformation()
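Rendered as a Java interface, those calls might look like the sketch below; only the six method names come from the slide, while all parameter and return types are assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical rendering of the Multilingual Archive SOAP API.
// Only the method names are from the slide; every type below is assumed.
public interface MultilingualArchive {
    String putDocument(byte[] document, Map<String, String> metadata); // assumed: returns a document id
    byte[] getDocument(String documentId);                             // assumed: raw document bytes
    List<String> search(String query);                                 // assumed: matching document ids
    String command(String name, Map<String, String> arguments);        // assumed: generic admin call
    String putTransformation(byte[] transformation);                   // assumed: returns a transformation id
    byte[] getTransformation(String transformationId);
}
```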
  • Multilingual Archive (cont.)
    ◦ Planned already, implemented as a customer project
    ◦ Scale:
      ◦ 500 million documents
      ◦ Random access
      ◦ “100%” uptime
    ◦ Technologies?
      ◦ Database
      ◦ Zip archives on a file system, or Hadoop
  • RDBMS Woes
    ◦ Scaling MySQL is hard, Oracle is expensive (and hard)
    ◦ Machine cost goes up faster than speed
    ◦ Turn off all relational features to scale
    ◦ Turn off secondary indexes too
    ◦ Tables can be a problem at sizes as low as 500GB
    ◦ Hard to read data quickly at these sizes
    ◦ Write speed degrades with table size
    ◦ Future growth uncertain
  • MySQL Limitations
    ◦ Master becomes a problem
    ◦ What if your write speed is greater than a single machine can handle?
    ◦ All slaves must have the same write capacity as the master (can't cheap out on slaves)
    ◦ Single point of failure, no easy failover
    ◦ Can (sort of) solve this with sharding
  • Sharding
  • Sharding Problems
    ◦ Requires either a hashing function or a mapping table to determine the shard (see the sketch below)
    ◦ Data access code becomes complex
    ◦ What if shard sizes become too large?
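A hash-based router can be as small as the following sketch (the shard count, names, and JDBC URL are invented); the complexity the slide warns about lives in everything around it: resharding, cross-shard queries, and failover.

```java
// Minimal sketch of hash-based shard routing (illustrative only).
// Changing SHARD_COUNT later means physically moving data between shards.
public final class ShardRouter {
    private static final int SHARD_COUNT = 16;

    // Map a row key (e.g. a document id) to one of the shards.
    public static int shardFor(String key) {
        return (key.hashCode() & 0x7fffffff) % SHARD_COUNT;
    }

    // Data access code now has to know which database to talk to.
    public static String jdbcUrlFor(String key) {
        return "jdbc:mysql://shard" + shardFor(key) + ".example.com/archive";
    }
}
```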
  • Resharding
  • Schema Changes
    ◦ What about schema changes or migrations?
    ◦ MySQL is not your friend here
    ◦ Only gets harder with more data
  • HBase to the Rescue
    ◦ Clustered, commodity(-ish) hardware
    ◦ Mostly schema-less
    ◦ Dynamic distribution
    ◦ Spreads writes out over the cluster
  • HBase
    ◦ Distributed database modeled on Bigtable
      ◦ “Bigtable: A Distributed Storage System for Structured Data” by Chang et al.
    ◦ Runs on top of Hadoop Core
    ◦ Layers on HDFS for storage
    ◦ Native connections to MapReduce
    ◦ Distributed, High Availability, High Performance, Strong Consistency
  • HBase
    ◦ Column-oriented store
      ◦ Wide table costs only the data stored
      ◦ NULLs in a row are 'free'
      ◦ Good compression: columns of similar type
      ◦ Column name is arbitrary
    ◦ Rows stored in sorted order
    ◦ Can random read and write
    ◦ Goal of billions of rows x millions of cells
      ◦ Petabytes of data across thousands of servers
  • Tables
    ◦ A table is split into roughly equal-sized “regions”
    ◦ Each region is a contiguous range of keys, from [start, end)
    ◦ Regions split as they grow, thus dynamically adjusting to your data set
  • Tables (cont.)
    ◦ Tables are sorted by Row
    ◦ Table schema defines column families
      ◦ Families consist of any number of columns
      ◦ Columns consist of any number of versions
      ◦ Everything except the table name is byte[]
    ◦ (Table, Row, Family:Column, Timestamp) -> Value
  • Tables (cont.)
    ◦ As a data structure:
      SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )
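Read literally, that nesting can be written down with plain Java collections. A conceptual sketch only (not how HBase stores data internally); it keeps a timestamp-to-value map per column instead of the slide's List(Value, Timestamp) pairs, and the class name is invented.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

// Conceptual model of a table: row key -> (column -> (timestamp -> value)),
// with rows and columns kept in sorted order and everything a byte[].
public class TableModel {
    private final NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>> rows =
            new TreeMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>(Bytes.BYTES_COMPARATOR);

    public void put(byte[] row, byte[] column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row,
                r -> new TreeMap<byte[], NavigableMap<Long, byte[]>>(Bytes.BYTES_COMPARATOR))
            .computeIfAbsent(column,
                c -> new TreeMap<Long, byte[]>(Collections.<Long>reverseOrder())) // newest version first
            .put(timestamp, value);
    }
}
```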
  • Server Architecture
    ◦ Similar to HDFS
      ◦ Master ≈ Namenode
      ◦ Regionserver ≈ Datanode
    ◦ Often run these alongside each other!
    ◦ Difference: HBase stores its state in HDFS
    ◦ HDFS provides robust data storage across machines, insulating against failure
    ◦ Master and Regionserver are fairly stateless and machine independent
  • Region Assignment
    ◦ Each region from every table is assigned to a Regionserver
    ◦ Master duties:
      ◦ Responsible for assignment and handling regionserver problems (if any!)
      ◦ When machines fail, move regions
      ◦ When regions split, move regions to balance
      ◦ Could move regions to respond to load
      ◦ Can run multiple backup masters
  • Master
    ◦ The master does NOT:
      ◦ Handle any write requests (not a DB master!)
      ◦ Handle location-finding requests
      ◦ Get involved in the read/write path
    ◦ Generally does very little most of the time
  • Distributed Coordination
    ◦ ZooKeeper is used to manage master election and server availability
    ◦ Set up as a cluster, provides distributed coordination primitives
    ◦ An excellent tool for building cluster management systems
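To give a feel for those primitives, the sketch below announces a server by creating an ephemeral znode, roughly the availability pattern described above; the quorum address, paths, and timeout are assumptions, and this is an illustration rather than HBase's actual internal code.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch: announce a server by creating an ephemeral znode. If the server's
// session dies, ZooKeeper removes the node and watchers learn about it.
public class ServerRegistration {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, new Watcher() { // assumed quorum address
            public void process(WatchedEvent event) {
                System.out.println("ZooKeeper event: " + event);
            }
        });
        // "/servers" must already exist; the child node vanishes with the session.
        zk.create("/servers/regionserver-1", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        Thread.sleep(Long.MAX_VALUE); // keep the session (and thus the registration) alive
    }
}
```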
  • HBase Storage Architecture
  • HBase Public Timeline
    ◦ November 2006: Google releases paper on Bigtable
    ◦ February 2007: Initial HBase prototype created as Hadoop contrib
    ◦ October 2007: First "useable" HBase (Hadoop 0.15.0)
    ◦ December 2007: First HBase User Group
    ◦ January 2008: Hadoop becomes TLP, HBase becomes subproject
    ◦ October 2008: HBase 0.18.1 released
    ◦ January 2009: HBase 0.19.0 released
    ◦ September 2009: HBase 0.20.0 released
  • HBase WorldLingo Timeline
  • HBase - Example
    ◦ Store web crawl data
      ◦ Table crawl with family content
      ◦ Row is the URL, with columns:
        ◦ content:data stores the raw crawled data
        ◦ content:language stores the HTTP language header
        ◦ content:type stores the HTTP content-type header
      ◦ If processing the raw data for hyperlinks and images, add families links and images:
        ◦ links:<url> column for each hyperlink
        ◦ images:<url> column for each image
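A minimal sketch of filling that table, assuming the 0.20-era Java client described on the next slide; only the table, family, and column names come from the example above, the URL and values are invented.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: write one crawled page into the "crawl" table (assumes the table
// and its "content"/"links" families already exist).
public class StoreCrawledPage {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "crawl");
        String url = "http://example.com/index.html";   // row key is the URL

        Put put = new Put(Bytes.toBytes(url));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("data"), Bytes.toBytes("<html>...</html>"));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("language"), Bytes.toBytes("en"));
        put.add(Bytes.toBytes("content"), Bytes.toBytes("type"), Bytes.toBytes("text/html"));
        // One column per outgoing link, named after the target URL.
        put.add(Bytes.toBytes("links"), Bytes.toBytes("http://example.com/about"), new byte[0]);

        table.put(put);
    }
}
```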
  • HBase - Clients
    ◦ Native Java client/API
      ◦ get(Get get)
      ◦ put(Put put)
    ◦ Non-Java clients
      ◦ Thrift server (Ruby, C++, Erlang, etc.)
      ◦ REST server (Stargate)
    ◦ TableInput/TableOutputFormat for MapReduce
    ◦ HBase shell (JRuby)
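Reading the data back through the same client; again a sketch against the 0.20-era get(Get)/put(Put) API named above, reusing the crawl example's names, with everything else made up.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: one random read plus a scan over the "content" family of "crawl".
public class ReadCrawledPages {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "crawl");

        // Single-row read: fetch the language column for one URL.
        Get get = new Get(Bytes.toBytes("http://example.com/index.html"));
        get.addColumn(Bytes.toBytes("content"), Bytes.toBytes("language"));
        Result result = table.get(get);
        byte[] lang = result.getValue(Bytes.toBytes("content"), Bytes.toBytes("language"));
        System.out.println("language = " + (lang == null ? "<none>" : Bytes.toString(lang)));

        // Scan: iterate rows in sorted (row key) order.
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("content"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result row : scanner) {
            System.out.println(Bytes.toString(row.getRow()));
        }
        scanner.close();
    }
}
```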
  • Scaling HBase
    ◦ Add more machines to scale
      ◦ Automatic rebalancing
    ◦ Base model (BigTable) scales past 1000TB
    ◦ No inherent reason why HBase couldn't
  • What to store in HBase
    ◦ Maybe not your raw log data...
    ◦ ...but the results of processing it with Hadoop!
    ◦ By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website
  • !HBase
    ◦ “NoSQL” Database!
      ◦ No joins
      ◦ No sophisticated query engine
      ◦ No transactions (sort of)
      ◦ No column typing
      ◦ No SQL, no ODBC/JDBC, etc. (but there is HBql now!)
    ◦ Not a replacement for your RDBMS...
    ◦ Matching impedance!
  • Why HBase?
    ◦ Datasets are reaching petabytes
    ◦ Traditional databases are expensive to scale and difficult to distribute
    ◦ Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!)
    ◦ Need for random access and batch processing (which Hadoop does not offer)
  • Numbers
    ◦ Single reads are 1-10ms, depending on disk seeks and caching
    ◦ Scans can return hundreds of rows in dozens of ms
    ◦ Serial read speeds
  • Multilingual Archive (cont.)
    ◦ Planned already, implemented as a customer project
    ◦ Scale:
      ◦ 500 million documents
      ◦ Random access
      ◦ “100%” uptime
    ◦ Technologies?
      ◦ Database
      ◦ Zip archives on a file system, or Hadoop
  • Lucene Search Server
    ◦ 43 fields indexed
    ◦ 166GB in size
    ◦ Automated merging/warm-up/swap
    ◦ Looking into a scalable solution:
      ◦ Katta
      ◦ Hyper Estraier
      ◦ DLucene
      ◦ …
    ◦ Sorting?
  • Multilingual Archive (cont.)
    ◦ 5 tables
    ◦ Up to 5 column families
    ◦ XML schemas
    ◦ Automated table schema updates
    ◦ Standard options tweaked over time
      ◦ Garbage collection!
    ◦ MemCached(b) layer
  • Layers (architecture diagram): tiers Firewall, Network, LWS, Web, App, Cache, Data; components Director 1…n, Apache 1…n, Tomcat 1…n, MemCached 1…n, HBase
  • Map/Reduce
    ◦ Backup/Restore
    ◦ Index building
    ◦ Cache filling
    ◦ Mapping
    ◦ Updates
    ◦ Translation
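One way such jobs can be wired up: a sketch of a map-only job that scans the crawl table through TableMapReduceUtil/TableInputFormat (mentioned on the clients slide) and simply counts rows; the class and job names are invented, and this is not WorldLingo's actual code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Sketch: a map-only MapReduce job over the "crawl" table, counting rows via a counter.
public class CrawlRowCounter {
    static class RowCountMapper extends TableMapper<ImmutableBytesWritable, Result> {
        enum Counters { ROWS }
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) {
            context.getCounter(Counters.ROWS).increment(1);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new HBaseConfiguration();
        Job job = new Job(conf, "crawl-row-count");
        job.setJarByClass(CrawlRowCounter.class);
        TableMapReduceUtil.initTableMapperJob("crawl", new Scan(), RowCountMapper.class,
                ImmutableBytesWritable.class, Result.class, job);
        job.setOutputFormatClass(NullOutputFormat.class); // counters only, no output files
        job.setNumReduceTasks(0);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```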
  • HBase - Problems
    ◦ Early versions (before HBase 0.19.0!)
      ◦ Data loss
      ◦ Migration nightmares
      ◦ Slow performance
    ◦ Current version
      ◦ Read the HBase Wiki!!!
    ◦ Single point of failure (name node only!)
  • HBase - Notes
    ◦ RTFM: HBase Wiki, IRC channel
    ◦ Personal experience:
      ◦ Max. file handles (32k+)
      ◦ Hadoop xceiver limits (NIO?)
      ◦ Redundant metadata (on name node)
      ◦ RAM (4GB+)
      ◦ Deployment strategy
      ◦ Garbage collection (use CMS, G1?)
      ◦ Maybe don't mix batch and interactive?
  • Graphing
    ◦ Use the supplied Ganglia context or the JMX bridge to enable Nagios and Cacti
    ◦ JMXToolkit: Swiss army knife for JMX-enabled servers: http://github.com/larsgeorge/jmxtoolkit
  • HBase - Roadmap
    ◦ HBase 0.20.x “Performance”
      ◦ New Key Format – KeyValue
      ◦ New File Format – HFile
      ◦ New Block Cache – Concurrent LRU
      ◦ New Query and Result API
      ◦ New Scanners
      ◦ ZooKeeper Integration – No SPOF in HBase
      ◦ New REST Interface
      ◦ Contrib
        ◦ Transactional Tables
        ◦ Secondary Indexes
        ◦ Stargate
  • HBase - Roadmap (cont.)
    ◦ HBase 0.21.x “Advanced Concepts”
      ◦ Master Rewrite – More ZooKeeper
      ◦ New RPC Protocol (Avro)
      ◦ Multi-DC Replication
      ◦ Intra-Row Scanning
      ◦ Further optimizations on algorithms and data structures
      ◦ Discretionary Access Control
      ◦ Coprocessors
  • Questions?
    ◦ Email: lars@worldlingo.com, larsgeorge@apache.org, lars@larsgeorge.com
    ◦ Blog: www.larsgeorge.com
    ◦ Twitter: larsgeorge