About

My Life with HBase

HBase is an open-source implementation of Google's BigTable architecture. Its goal is to host very large tables, billions of rows by millions of columns, atop clusters of "commodity" hardware. This talk reports findings from setting up HBase clusters of various sizes and uses.

  • “My Life with HBase”
      ◦ Lars George, CTO of WorldLingo
      ◦ Apache Hadoop HBase Committer
      ◦ www.worldlingo.com
      ◦ www.larsgeorge.com
  • WorldLingo
      ◦ Co-founded 1999
      ◦ Machine Translation Services
      ◦ Professional Human Translations
      ◦ Offices in the US and UK
      ◦ Microsoft Office provider since 2001
      ◦ Web-based services
      ◦ Customer Projects
      ◦ Multilingual Archive
  • Multilingual Archive SOAP API: simple calls
      ◦ putDocument()
      ◦ getDocument()
      ◦ search()
      ◦ command()
      ◦ putTransformation()
      ◦ getTransformation()
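As a hedged illustration only, the two simplest calls above might behave like the following in-memory Java sketch. The real service is a SOAP API; the `ArchiveClient` name and these signatures are assumptions for illustration, not WorldLingo's actual interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of the archive's put/get calls.
// Not the real SOAP interface; names and types are invented.
public class ArchiveClient {
    private final Map<String, byte[]> store = new HashMap<>();

    // Store a document body under an identifier.
    public void putDocument(String id, byte[] content) {
        store.put(id, content);
    }

    // Fetch a previously stored document, or null if absent.
    public byte[] getDocument(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        ArchiveClient client = new ArchiveClient();
        client.putDocument("doc-1", "hello".getBytes());
        System.out.println(new String(client.getDocument("doc-1"))); // hello
    }
}
```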
  • Multilingual Archive (cont.)
      ◦ Planned for a while, implemented as a customer project
      ◦ Scale: 500 million documents, random access, “100%” uptime
      ◦ Technologies? Database; zip archives on a file system; or Hadoop
  • RDBMS Woes
      ◦ Scaling MySQL is hard; Oracle is expensive (and hard)
      ◦ Machine cost goes up faster than speed
      ◦ Turn off all relational features to scale
      ◦ Turn off secondary indexes too
      ◦ Tables can be a problem at sizes as low as 500GB
      ◦ Hard to read data quickly at these sizes
      ◦ Write speed degrades with table size
      ◦ Future growth uncertain
  • MySQL Limitations
      ◦ The master becomes a problem
      ◦ What if your write load exceeds a single machine?
      ◦ All slaves must have the same write capacity as the master (can't use cheaper machines for slaves)
      ◦ Single point of failure, no easy failover
      ◦ Can (sort of) solve this with sharding
  • Sharding
  • Sharding Problems
      ◦ Requires either a hashing function or a mapping table to determine the shard
      ◦ Data access code becomes complex
      ◦ What if shard sizes become too large?
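The hash-based shard selection mentioned above can be sketched in a few lines of Java. The `ShardRouter` class and `shardFor` method are illustrative names, not from any real library; the point is that the router itself is trivial, but changing the shard count remaps most keys, which is exactly why resharding hurts.

```java
// Illustrative sketch of hash-based shard routing (names are invented).
public class ShardRouter {
    // Pick a shard by hashing the row key modulo the shard count.
    public static int shardFor(String key, int shardCount) {
        // Math.floorMod avoids negative results for negative hash codes.
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        // The same key maps to a stable shard while the count is fixed...
        System.out.println(shardFor("user:1001", 4));
        // ...but growing from 4 to 5 shards remaps most keys, forcing a
        // data migration (the "resharding" problem on the next slide).
        System.out.println(shardFor("user:1001", 5));
    }
}
```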
  • Resharding
  • Schema Changes
      ◦ What about schema changes or migrations?
      ◦ MySQL is not your friend here
      ◦ Only gets harder with more data
  • HBase to the Rescue
      ◦ Clustered, commodity(-ish) hardware
      ◦ Mostly schema-less
      ◦ Dynamic distribution
      ◦ Spreads writes out over the cluster
  • HBase  Distributed database modeled on Bigtable ◦ Bigtable: A Distributed Storage System for Structured Data by Chang et al. Runs on top of Hadoop Core  Layers on HDFS for storage  Native connections to MapReduce  Distributed, High Availability, High Performance, Strong Consistency 
  • HBase  Column-oriented store ◦ ◦ ◦ ◦ Wide table costs only the data stored NULLs in row are 'free' Good compression: columns of similar type Column name is arbitrary Rows stored in sorted order  Can random read and write  Goal of billions of rows X millions of cells  ◦ Petabytes of data across thousands of servers
  • Tables
      ◦ A table is split into roughly equal-sized "regions"
      ◦ Each region is a contiguous range of keys, from [start, end)
      ◦ Regions split as they grow, dynamically adjusting to your data set
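The region model above can be sketched with a sorted map in plain Java (this is not HBase code; the region names and split keys are made up). Because each region covers a contiguous key range [start, end), finding the region for a row key is just a floor lookup on the sorted start keys:

```java
import java.util.TreeMap;

// Illustrative sketch: locating the region responsible for a row key.
public class RegionLookup {
    // Maps each region's start key to its name; sorted order encodes ranges.
    static final TreeMap<String, String> regions = new TreeMap<>();
    static {
        regions.put("", "region-1");   // keys in ["", "g")
        regions.put("g", "region-2");  // keys in ["g", "p")
        regions.put("p", "region-3");  // keys in ["p", end)
    }

    // The owning region is the one with the greatest start key <= rowKey.
    public static String regionFor(String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        System.out.println(regionFor("apple")); // region-1
        System.out.println(regionFor("hbase")); // region-2
        System.out.println(regionFor("zebra")); // region-3
    }
}
```

When a region grows too large it splits at a middle key, which in this sketch is just inserting a new start-key entry, so the data set redistributes dynamically.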
  • Tables (cont.)
      ◦ Tables are sorted by row
      ◦ The table schema defines column families
      ◦ Families consist of any number of columns
      ◦ Columns consist of any number of versions
      ◦ Everything except the table name is byte[]
      ◦ (Table, Row, Family:Column, Timestamp) → Value
  • Tables (cont.)  As a data structure SortedMap( RowKey, List( SortedMap( Column, List( Value, Timestamp ) ) ) )
  • Server Architecture
      ◦ Similar to HDFS: Master ≈ Namenode, Regionserver ≈ Datanode
      ◦ Often run alongside each other!
      ◦ Difference: HBase stores state in HDFS
      ◦ HDFS provides robust data storage across machines, insulating against failure
      ◦ Master and Regionserver are fairly stateless and machine-independent
  • Region Assignment
      ◦ Each region from every table is assigned to a Regionserver
      ◦ Master duties: responsible for assignment and handling Regionserver problems (if any!)
      ◦ When machines fail, move regions
      ◦ When regions split, move regions to balance
      ◦ Could move regions to respond to load
      ◦ Can run multiple backup masters
  • Master  The master does NOT ◦ ◦ ◦ ◦ Handle any write requests (not a DB master!) Handle location finding requests Not involved in the read/write path Generally does very little most of the time
  • Distributed Coordination
      ◦ Zookeeper is used to manage master election and server availability
      ◦ Set up as a cluster, provides distributed coordination primitives
      ◦ An excellent tool for building cluster management systems
  • HBase Storage Architecture
  • HBase Public Timeline
      ◦ November 2006: Google releases the Bigtable paper
      ◦ February 2007: Initial HBase prototype created as a Hadoop contrib
      ◦ October 2007: First "useable" HBase (with Hadoop 0.15.0)
      ◦ December 2007: First HBase User Group
      ◦ January 2008: Hadoop becomes a TLP, HBase becomes a subproject
      ◦ October 2008: HBase 0.18.1 released
      ◦ January 2009: HBase 0.19.0 released
      ◦ September 2009: HBase 0.20.0 released
  • HBase WorldLingo Timeline
  • HBase - Example
      ◦ Store web crawl data
      ◦ Table crawl with family content
      ◦ Row key is the URL, with columns:
      ◦ content:data stores the raw crawled data
      ◦ content:language stores the HTTP language header
      ◦ content:type stores the HTTP content-type header
      ◦ If processing the raw data for hyperlinks and images, add families links and images
      ◦ links:<url> column for each hyperlink
      ◦ images:<url> column for each image
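A rough sketch of what one row of that schema holds, using a plain map in place of the HBase client; the example URL and values are invented. The point is that qualifiers like links:<url> are arbitrary column names created per row, which a fixed relational schema cannot express:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a single crawl-table row (family:qualifier -> value).
// Plain Java maps, not HBase; illustrates dynamic column names only.
public class CrawlRow {
    public static Map<String, String> rowFor(String url, String body, String lang) {
        // In HBase, `url` would be the row key, not a column.
        Map<String, String> columns = new TreeMap<>();
        columns.put("content:data", body);
        columns.put("content:language", lang);
        // One dynamically named column per discovered hyperlink:
        columns.put("links:http://example.org/about", "anchor text");
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(rowFor("http://example.org/", "<html>...</html>", "en").keySet());
    }
}
```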
  • HBase - Clients
      ◦ Native Java client/API: get(Get get), put(Put put)
      ◦ Non-Java clients: Thrift server (Ruby, C++, Erlang, etc.), REST server (Stargate)
      ◦ TableInputFormat/TableOutputFormat for MapReduce
      ◦ HBase shell (JRuby)
  • Scaling HBase
      ◦ Add more machines to scale
      ◦ Automatic rebalancing
      ◦ The base model (Bigtable) scales past 1000TB
      ◦ No inherent reason why HBase couldn't
  • What to store in HBase
      ◦ Maybe not your raw log data...
      ◦ ...but the results of processing it with Hadoop!
      ◦ By storing the refined version in HBase, you can keep up with huge data demands and serve it to your website
  • !HBase
      ◦ A “NoSQL” database!
      ◦ No joins
      ◦ No sophisticated query engine
      ◦ No transactions (sort of)
      ◦ No column typing
      ◦ No SQL, no ODBC/JDBC, etc. (but there is HBql now!)
      ◦ Not a replacement for your RDBMS...
      ◦ Matching impedance!
  • Why HBase?
      ◦ Datasets are reaching petabytes
      ◦ Traditional databases are expensive to scale and difficult to distribute
      ◦ Commodity hardware is cheap and powerful (but HBase can make use of powerful machines too!)
      ◦ Need for random access and batch processing (which Hadoop alone does not offer)
  • Numbers
      ◦ Single reads take 1-10ms, depending on disk seeks and caching
      ◦ Scans can return hundreds of rows in dozens of ms
      ◦ Serial read speeds
  • Lucene Search Server
      ◦ 43 fields indexed
      ◦ 166GB in size
      ◦ Automated merging/warm-up/swap
      ◦ Looking into scalable solutions: Katta, Hyper Estraier, DLucene, …
      ◦ Sorting?
  • Multilingual Archive (cont.)
      ◦ 5 tables
      ◦ Up to 5 column families
      ◦ XML schemas
      ◦ Automated table schema updates
      ◦ Standard options tweaked over time (garbage collection!)
      ◦ MemCached(b) layer
  • Layers: Firewall, Network (Director 1 … Director n), LWS/Web (Apache 1 … Apache n), App (Tomcat 1 … Tomcat n), Cache (MemCached 1 … MemCached n), Data (HBase)
  • Map/Reduce
      ◦ Backup/restore
      ◦ Index building
      ◦ Cache filling
      ◦ Mapping
      ◦ Updates
      ◦ Translation
  • HBase - Problems
      ◦ Early versions (before HBase 0.19.0!): data loss, migration nightmares, slow performance
      ◦ Current version: read the HBase Wiki!!!
      ◦ Single point of failure (the name node only!)
  • HBase - Notes
      ◦ RTFM: HBase Wiki, IRC channel
      ◦ Personal experience:
      ◦ Max. file handles (32k+)
      ◦ Hadoop xceiver limits (NIO?)
      ◦ Redundant metadata (on the name node)
      ◦ RAM (4GB+)
      ◦ Deployment strategy
      ◦ Garbage collection (use CMS; G1?)
      ◦ Maybe don't mix batch and interactive workloads?
  • Graphing
      ◦ Use the supplied Ganglia context or the JMX bridge to enable Nagios and Cacti
      ◦ JMXToolkit: a Swiss Army knife for JMX-enabled servers: http://github.com/larsgeorge/jmxtoolkit
  • HBase - Roadmap  HBase 0.20.x “Performance” ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ New Key Format – KeyValue New File Format – Hfile New Block Cache – Concurrent LRU New Query and Result API New Scanners Zookeeper Integration – No SPOF in HBase New REST Interface Contrib  Transactional Tables  Secondary Indexes  Stargate
  • HBase - Roadmap (cont.)
      ◦ HBase 0.21.x “Advanced Concepts”:
      ◦ Master rewrite: more Zookeeper
      ◦ New RPC protocol (Avro)
      ◦ Multi-DC replication
      ◦ Intra-row scanning
      ◦ Further optimizations of algorithms and data structures
      ◦ Discretionary access control
      ◦ Coprocessors
  • Questions?
      ◦ Email: lars@worldlingo.com, larsgeorge@apache.org, lars@larsgeorge.com
      ◦ Blog: www.larsgeorge.com
      ◦ Twitter: larsgeorge
