Dash Open 06: Apache Omid - Open Source Transaction Processing Platform for Big Data

Paul Donnelly: Hi everyone and welcome to the Dash Open Podcast. Dash Open is your source for interesting conversations about Open Source and other technologies from the Open Source Program Office at Verizon Media. We're home to many leading brands including Yahoo, Huffington Post, AOL, Tumblr, TechCrunch, and many more. My name is Paul Donnelly and I'm a Principal Engineer at Verizon Media. Today on the podcast, I'm excited to chat with Ohad, who is a Senior Research Scientist, and Eddie, who is a Senior Director of Research. Can you talk about your focus at Verizon Media?
Eddie Bortnikov: Our team is called the Scalable Systems Team and has existed for about five years. We focus on the technologies that serve most of our products in the big data and machine learning domains.
Paul Donnelly: Awesome. So we're here to talk about Omid. Can you tell me what Omid means and how you came up with that name?
Eddie Bortnikov: Actually, we didn't come up with it. The guy who started this project was Iranian by origin, and in Persian, Omid means hope. We inherited this project from its very inception and took it to its current production-grade stage. Ohad is the main contributor to this technology and I hope he'll take this conversation from here.
Ohad Shacham: Omid is a transaction layer on top of a NoSQL database. For example, if the database gives you atomicity at the row level, then by using Omid you can get atomicity across many rows, many tables, and so on.
Paul Donnelly: Why was Omid created?
Eddie Bortnikov: Omid was created as a research project around 2010 in the Barcelona lab. It existed as a research prototype for a couple of years, until a need came up to use it in the content management platform for Yahoo. In that context, a need arose to update multiple data objects in the course of a dynamic data pipeline.
So, for example, if you had to index a webpage and update the pages that this page links to, you might end up reading one object from the NoSQL database and updating multiple objects in it. To get this whole thing correct, you have to guarantee the atomicity of the entire process, and this is exactly what transaction processing technology provides.
Paul Donnelly: Before Omid, what did people use?
Eddie Bortnikov: In the SQL database world, transaction processing theory is well known. The point is that in the NoSQL world, which deals with much larger amounts of data, there was no technology on par with that. So you could deal at the application level with the problems that arise in distributed systems, where things happen in parallel, and then the application developer had to work really hard to work around all the problematic scenarios that come up. Or you could use a useful abstraction named ACID transactions, which is exactly what Omid provides. Then your application becomes much simpler and you can focus on the business logic.
Paul Donnelly: Who is using Omid in the open source community?
Ohad Shacham: We recently integrated Omid into a large project, Apache Phoenix. It's a SQL layer that works on top of a NoSQL database. To work correctly, it has to have transactions: it needs them for SQL transactions, and if you want to do secondary indexes correctly, you also have to have a transaction layer. So over the last year and a half we worked on the integration, and we had to augment Omid with different features in order to support the different semantic features of Phoenix. This is one of the use cases. We recently did a major release of Omid for this.
Paul Donnelly: Where can folks learn more about Omid or contribute? What's the GitHub identifier?
Ohad Shacham: The GitHub identifier is currently the incubator one, as the GitHub repository is actually a mirror of Apache. Apache has its own repository where all the Apache projects are, and there is a GitHub repository that mirrors it. In terms of contributions, to commit to Omid directly you need to be an Apache committer, but anyone can learn it and write patches. We review them and afterward commit on their behalf. Once someone contributes and shows that they're a real contributor, then we vote.
Paul Donnelly: Awesome. Well Ohad, Eddie, thanks so much for talking with us about this very exciting product.
Eddie Bortnikov: Thanks for having us.
Paul Donnelly: If you enjoyed this episode of Dash Open and would like to learn more about Open Source and other technologies at Verizon Media, please visit developer.yahoo.com. You can also find us on Twitter @YDN.
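
To make Eddie's indexing example concrete (read one object, then update several objects as a single unit), here is a minimal Python sketch of all-or-nothing multi-key commits. It is a toy in-memory model written for illustration only: `ToyStore`, `ToyTransaction`, `ConflictError`, and `index_page` are hypothetical names, this is not Omid's API or implementation, and the conflict rule shown (abort if a written key was committed by someone else after this transaction started) is an assumption chosen to keep the example small.

```python
class ConflictError(Exception):
    """Raised when another transaction committed to one of our keys first."""


class ToyStore:
    """Hypothetical in-memory key-value store; stands in for a NoSQL table."""

    def __init__(self):
        self.data = {}   # key -> (commit_timestamp, value)
        self.clock = 0   # logical clock shared by all transactions

    def next_ts(self):
        self.clock += 1
        return self.clock

    def begin(self):
        return ToyTransaction(self, self.next_ts())


class ToyTransaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}  # buffered writes, invisible to others until commit

    def get(self, key, default=None):
        # Read our own buffered write first, else the latest committed value.
        if key in self.writes:
            return self.writes[key]
        entry = self.store.data.get(key)
        return default if entry is None else entry[1]

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        data = self.store.data
        # Validate every written key before applying anything, so a
        # conflict leaves the store completely untouched.
        for key in self.writes:
            entry = data.get(key)
            if entry is not None and entry[0] > self.start_ts:
                raise ConflictError(key)
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            data[key] = (commit_ts, value)
        return commit_ts


def index_page(store, url, outlinks):
    """Record a page and add it to the inlink list of every page it links to."""
    tx = store.begin()
    tx.put(("page", url), tuple(sorted(outlinks)))
    for target in outlinks:
        inlinks = set(tx.get(("inlinks", target), ()))
        inlinks.add(url)
        tx.put(("inlinks", target), tuple(sorted(inlinks)))
    tx.commit()  # either every update above becomes visible, or none does


store = ToyStore()
index_page(store, "a.html", ["b.html", "c.html"])
print(store.begin().get(("inlinks", "b.html")))
```

In this sketch the application code in `index_page` contains only business logic; the buffering, validation, and all-or-nothing apply step live in the transaction object, which is the division of labor Eddie describes.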
