Introducing Omid - Transaction Processing for Apache HBase
<p><i>By Edward Bortnikov (<a href="https://twitter.com/ebortnik2" target="_blank">@ebortnik2</a>) Scalable Search Systems Research, Sameer Paranjpye (<a href="https://twitter.com/sparanjpye" target="_blank">@sparanjpye</a>), Sr. Search Architect, Ralph Rabbat, Director of Software Development, and Francisco Perez-Sorrosal (<a href="https://twitter.com/fperezsorrosal" target="_blank">@fperezsorrosal</a>), Research Engineer</i></p><hr><figure data-orig-height="210" data-orig-width="216"><img data-orig-height="210" data-orig-width="216" alt="image" src="https://66.media.tumblr.com/f2b4b7480b0260417bb7cbfc6cb21bd8/tumblr_inline_nujba4BSWb1t17fny_540.jpg"/></figure><p>Welcome to Omid (Hope, in Persian), an open source transaction processing system for <a href="http://hbase.apache.org/" target="_blank">Apache HBase</a>. Omid was incepted as a research project at Yahoo back in 2011. Since then, it has matured in many aspects, and has been re-architected for scalability and high availability. We have made this new code publicly available at <a href="https://github.com/yahoo/omid" target="_blank">https://github.com/yahoo/omid</a>. This is the first in a series of blog posts that will shed light on Omid’s APIs, administration, design principles, and internals. </p><p>Applications that need to bundle multiple read and write operations on HBase into logically indivisible units of work can use Omid to execute transactions with <a href="https://en.wikipedia.org/wiki/ACID" target="_blank">ACID</a> (Atomicity, Consistency, Isolation, Durability) properties, just as they would use transactions in the relational database world. Omid extends the HBase key-value access APl with transaction semantics. It can be exercised either directly, or via higher level data management API’s. For example, <a href="https://phoenix.apache.org/" target="_blank">Apache Phoenix</a> (SQL-on-top-of-HBase) might use Omid as its transaction management component. <br/></p><p>ACID transactions are a hugely popular programming models that are featured by relational databases. While early NoSQL data store implementations did not include transaction support, the need for transactions soon emerged. Today, they are perceived as essential to modern ultra-scalable, dynamic content processing systems. Omid’s system design is inspired in part by <a href="http://research.google.com/pubs/pub36726.html" target="_blank">Percolator</a>, Google’s dynamic web indexing technology, which reintroduced transactions to the NoSQL world in 2010. <br/></p><p>The current version of Omid provides an easy-to-program, easy-to-operate, reliable, high-performance platform, capable of serving transactional web-scale applications based on HBase. The following features make it an attractive choice for system designers:</p><ul><li><i>Development</i>. Omid is backward-compatible with HBase APIs, making it developer friendly. Minimal extensions are introduced to enable transactions. </li><li><i>Semantics</i>. Omid implements a popular, well-tracted <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=69541" target="_blank">Snapshot Isolation </a>(SI) consistency paradigm that is supported by major SQL and NoSQL technologies (for example, Percolator). </li><li><i>Scalability.</i> Omid provides a highly scalable, lock-free implementation of SI. To the best of our knowledge, it is the only open source NoSQL platform that can scale beyond 100K transactions per second.</li><li><i>Reliability</i>. Omid has a high-availability (HA) mode, in which the core service operates as primary-backup process pair with automatic failover. The HA support has zero overhead on the mainstream operation. <br/></li><li><i>Simplicity</i>. Omid leverages the HBase infrastructure for managing its own metadata. It entails no additional services apart from those used by HBase.</li><li><i>Track Record</i>. Omid is already in use by very-large-scale production systems at Yahoo.</li></ul><p>To start working with Omid, you should be familiar with the key concepts of transaction processing. Wherever appropriate, we will provide the theoretical background required to explain how Omid works. For a deeper understanding, we recommend <a href="http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902" target="_blank">Gray and Reuter’s book on transaction processing</a>. <br/></p><p><b><br/></b></p><p><b>A Web-Scale Use Case</b></p><p>At Yahoo, Omid is a foundational technology for Sieve, a content management platform that powers our next-generation search and personalization products. Sieve essentially acts as a huge processing hub between content feeds and serving systems. It provides an environment for highly customizable, real-time, streamed information processing, with typical discovery-to-service latencies of just a few seconds. In terms of scale and availability, the development of Omid was largely driven by Sieve’s requirements. </p><p>Sieve ingests a variety of content channels (e.g., web crawl, social feeds, and proprietary sources) and applies custom workflows to generate data artifacts for target applications (e.g., search indices). In this context, data is streamed through processing tasks that can form complex topologies. Each task consumes one or more data items and produces one or more new items. For example, the document processing task for Web search consumes an HTML page’s content and produces multiple new features for this page (term postings, hyperlinks, clickable text phrases named anchortext, and more). The subsequent link analysis task consumes the set of links for a given page and produces a reverse anchortext index for all link targets. <br/></p><p>Sieve stores and processes billions of items. Thousands of tasks execute concurrently as the processed data streams through the system (resulting in tens of thousands of transactions/sec). All tasks read and write (typically) their artifacts from a multi-petabyte shared HBase instance. </p><p>Since the execution of individual tasks is completely uncoordinated, it is paramount to ensure that each task executes as a logically indivisible (atomic) unit, in isolation from other units, with predictable (consistent) results. Manually handling all possible race conditions and failure scenarios that could arise in a concurrent execution is prohibitively complex. Sieve developers therefore require a programming model that would allow them to focus on business logic rather than on system reliability and consistency issues.</p><p>Omid provides precisely this building block. It offers an easy-to-use API with sound and well-understood semantics. From an operations perspective, Omid is easy-to-administer, highly available, and highly scalable, and hence a natural technology choice for business-critical service like Sieve. <br/></p><p><b><br/></b></p><p><b>A Quick Tutorial </b><br/></p><p>Transaction processing systems offer application developers the well-known begin and commit APIs to mark transaction boundaries. An application can also abort a non-committed transaction, e.g., in response to an error situation. All database reads and writes in the scope of a committed transaction appear to execute atomically; all reads and writes in the scope of an aborted transaction seem to have never happened. </p><p>We are now ready to tip-toe into Omid’s API, and see a simple code snippet. All you need to know is two interfaces - TransactionManager and TTable. TransactionManager handles all control operations (begin(), commit(), and abort()) on behalf of the application. In this context, TransactionManager.begin() returns a unique transaction id (txid), to be used in subsequent calls. TTable is a transactional equivalent of HBase’s HTable. It augments HTable’s data access methods (get(), scan(), put(), and delete()) with the txid parameter, to convey the transaction’s context. </p><p>The following example demonstrates how to use Omid’s transactional API to modify two rows in an HBase table with ACID guarantees, which it’s not possible with the standard HBase API. We assume prior HBase programming experience. For background, please see the <a href="http://hbase.apache.org/book.html" target="_blank">HBase programming manual</a>. <br/></p><blockquote><p><b>Omid Code Example</b><i><br/></i></p><p><i>Configuration conf = HBaseConfiguration.create();<br/>TransactionManager tm = HBaseTransactionManager.newBuilder()<br/> .withConfiguration(conf)<br/> .build();<br/>TTable tt = new TTable(conf, “EXAMPLE_TABLE”);<br/><br/>byte[] family = Bytes.toBytes(“EXAMPLE_CF”);<br/>byte[] qualifier = Bytes.toBytes(“EXAMPLE_QUAL”);<br/><br/>Transaction tx = tm.begin();<br/>Put row1 = new Put(Bytes.toBytes(“EXAMPLE_ROW1”));<br/>row1.add(family, qualifier, Bytes.toBytes(“VALUE_1”));<br/>tt.put(tx, row1);<br/>Put row2 = new Put(Bytes.toBytes(“EXAMPLE_ROW2”));<br/>row2.add(family, qualifier, Bytes.toBytes(“VALUE_2”));<br/>tt.put(tx, row2);<br/>tm.commit(tx);<br/><br/>tt.close();<br/>tm.close();</i></p></blockquote><p>Please refer to this github <a href="https://github.com/yahoo/omid/wiki/Basic-Examples" target="_blank">page </a>for more code examples. <b><br/></b></p><p><b><br/></b></p><p><b>A Case for Snapshot Isolation</b><br/></p><p>The “I” in ACID is for Isolation - preventing concurrently executing transactions from seeing each other’s partial updates. The isolation property is essential for consistency. Informally, it ensures that the information a transaction reads from the database “makes sense” in that it does not mix old and new values. For example, if a Sieve transaction updates the reverse-anchortext feature of multiple pages linked-to by the page being processed, then no concurrent transaction may observe the old value of that feature for some of these pages and the new value for the rest. <br/></p><p>More formally, a system satisfies a consistency model if every externally observable execution history can be explained as a sequence of legal database state transitions. Omid employs an intuitive yet scalable <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=69541" target="_blank">Snapshot Isolation </a>model to guarantee data consistency when multiple concurrent transactions access the same data elements. This is implemented in popular database technologies such as Oracle, PostgreSQL, and Microsoft SQL Server. Explaining this design choice requires a small detour into history, and can be skipped in the first reading.</p><p>Over the years, the database community studied several transaction isolation models. The most intuitive one is <a href="https://en.wikipedia.org/wiki/Serializability" target="_blank">serializability</a> - a guarantee that the outcome of any parallel execution of transactions can be explained by some serial order. In other words, transactions seem to be executing sequentially without overlapping in time (or alternatively, every transaction can be reduced to a point in time when it takes effect). Serializability implementations were traditionally based on <a href="https://en.wikipedia.org/wiki/Two-phase_locking" target="_blank">two-phase locking</a> methods that date back to the 70’s. While offering an extremely simple abstraction, serializable systems have been shown to suffer from <a href="http://dl.acm.org/citation.cfm?id=1376713" target="_blank">significant performance bottlenecks </a>due to their lock-based concurrency control. Since the mid-90’s, researchers set out on a quest for models that translate to scalable implementations. </p><p>A <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=69541" target="_blank">seminal paper by Bernstein et al.</a> suggested a new model, named Snapshot Isolation (SI), which relaxes the correctness criteria of serializability – most applications do not require such a strict correctness criteria – and it is suitable for lock-free implementations. In a nutshell, SI allows transactions to be reduced to two points instead of one: a reading point and a writing point. Inconsistent reads do not occur, since each transaction sees a consistent snapshot of the database. This means that for two concurrent transactions T1 and T2, T1 sees either all of T2’s updates to data items it is reading, or none of T2’s updates. SI guarantees that (1) all reads in a transaction see a consistent snapshot of the database, and (2) a transaction successfully commits only if no update it has made conflicts with any concurrent updates made since that snapshot. As stated before, while SI is slightly different from serializability in terms of correctness (e.g. SI does not avoid the write-skew anomaly), SI implementations offer a better performance than strict serializability implementations.</p><p>The diagram below illustrates snapshot isolation. In simple terms, two transactions conflict under SI if and only if: 1) they execute concurrently (overlap in time); and 2) write to the same element of a particular row (spatial overlap). Here, transactions T1 and T2 overlap in time but not in space (their write sets do not contain modifications on the same items), therefore they can both commit. T2 and T3 overlap in time and in space (both write to R4), therefore one of them must be aborted to avoid consistency violation. T4 on the other hand, does not overlap any other transaction in time and can therefore commit.</p><figure class="tmblr-full" data-orig-height="305" data-orig-width="720"><img data-orig-height="305" data-orig-width="720" alt="image" src="https://66.media.tumblr.com/ee6ee99354e0ef5e89ccf1eb42a7e9cc/tumblr_inline_nuj98tx4G91t17fny_540.jpg"/></figure><p>Snapshot isolation is amenable to scalable implementations. NoSQL datastores that support multi-version concurrency control (MVCC), namely storing multiple versions capturing the history of an item associated with a unique key, are suitable for implementing SI. In such a system, transactions read data via immutable snapshots based on historic versions, and write data by creating new versions. Therefore, reads and writes need no coordination. Furthermore, writes can proceed without coordinating with concurrent transactions until they attempt to commit, at which time inter-transaction order is established and conflicts are detected.</p><p>Fortunately, HBase is natively multi-versioned, whereby clients can control version numbers (timestamps) and retrieve data through snapshots associated with a given timestamp. While this API is complex to use in regular applications, it is a perfect fit for Omid, which exploits it to provide a simple and clean SI abstraction. We leave the implementation details for future posts.</p><p><b><br/></b></p><p><b>Acknowledgement </b><br/></p><p>We would like to acknowledge all the contributions to Omid, as concept and code, since its early days. Our thanks go to Daniel Gomez Ferro, Eshcar Hillel, Flavio Junqueira, Idit Keidar, Ivan Kelly, Francis Christopher Liu, Matthieu Morel, Benjamin (Ben) Reed, Ohad Shacham, Maysam Yabandeh, and the whole Sieve team. </p><p><b><br/></b></p><p><b>Summary </b><br/></p><p>This post introduced Omid, an open-source transaction processing system for HBase. We presented a web-scale application at Yahoo for which Omid is a critical building block, saw a simple code snippet that exercises Omid, and finally, discussed Omid’s snapshot isolation model for preserving data consistency. Our next post in this series will provide an overview of Omid’s architecture and administration basics. </p>