Yahoo Developer Network


Dash Open 06: Apache Omid - Open Source Transaction Processing Platform for Big Data

Transcript:

Paul Donnelly: Hi everyone and welcome to the Dash Open Podcast. Dash Open is your source for interesting conversations about Open Source and other technologies from the Open Source Program Office at Verizon Media. We're home to many leading brands including Yahoo, Huffington Post, AOL, Tumblr, TechCrunch, and many more. My name is Paul Donnelly and I'm a Principal Engineer at Verizon Media.

Paul Donnelly: Today on the podcast, I'm excited to chat with Ohad, who is a Senior Research Scientist, and Eddie, who is a Senior Director of Research.

Paul Donnelly: Can you talk about your focus at Verizon Media?

Eddie Bortnikov: Our team is called the Scalable Systems Team and has existed for about five years. We focus on the technologies that serve most of our products in the big data and machine learning domains.

Paul Donnelly: Awesome. So we're here to talk about Omid. Can you tell me what Omid means, or how you came up with that name?

Eddie Bortnikov: Actually, we didn't, because the guy who started this project was Iranian by origin. In Persian, Omid means hope. We inherited this project from its very inception and took it to its current production-grade stage. Ohad is the main contributor to this technology and I hope he'll take this conversation from here.

Ohad Shacham: Omid is a transaction layer on top of a NoSQL database. For example, if you get atomicity at the row level, then by using Omid you can get atomicity across many rows, many tables, and so on.

Paul Donnelly: Why was Omid created?

Eddie Bortnikov: Omid was created as a research project around 2010 in the Barcelona lab. It existed as a research prototype for a couple of years, until a need came up for using it in the content management platform for Yahoo. In that context, a need arose to update multiple data objects in the course of a dynamic data pipeline.
So, for example, if you had to index a webpage, and you had to update the pages that this page links to, you might end up reading one object from the NoSQL database and updating multiple objects in the NoSQL database. So in order to get this whole thing correct, you have to guarantee the atomicity of the entire process, and this is exactly what transaction processing technology provides.

Paul Donnelly: Before Omid, what did people use?

Eddie Bortnikov: In the SQL database world, transaction processing theory is well known. The point is that in the NoSQL world, which deals with much bigger amounts of data, there was no technology on par with that. So, in fact, you could deal at the application level with the problems that arise in distributed systems, where things happen in parallel, and then the application developer had to work really hard to work around all the problematic scenarios that come up. Or you could use a useful abstraction called ACID transactions, which is exactly what Omid provides; then your application becomes way simpler, so you can focus on the business logic.

Paul Donnelly: Who is using Omid in the open source community?

Ohad Shacham: We recently integrated Omid into a large project called Apache Phoenix. It's actually a SQL layer that works on top of a NoSQL database. To work correctly, it has to have transactions, both because it needs SQL transactions and because, if you want to do secondary indexes correctly, you have to have a transaction layer. So over the last year and a half we worked on this integration, and we had to augment Omid with different features in order to support the different semantic features of Phoenix. This is one of the use cases. We recently did a major release of Omid for this.

Paul Donnelly: Where can folks learn more about Omid or contribute? What's the GitHub identifier?

Ohad Shacham: The GitHub identifier is currently the incubator only, as it’s actually a mirror of Apache. Apache has its own repository where all the Apache projects are. And there is a GitHub repository, which is a mirror of that.

Ohad Shacham: In terms of contributions, in order to commit to Omid, you need to be an Apache committer, but anyone can learn it and write patches, and we can review them and afterward commit on their behalf. Once someone contributes and shows that they’re a real contributor, then we vote.

Paul Donnelly: Awesome. Well, Ohad, Eddie, thanks so much for talking with us about this very exciting product.

Eddie Bortnikov: Thanks for having us.

Paul Donnelly: If you enjoyed this episode of Dash Open and would like to learn more about Open Source and other technologies at Verizon Media, please visit developer.yahoo.com. You can also find us on Twitter @YDN.
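
Editor's illustration: the web-indexing scenario from the transcript (read one object, update several, and have the whole thing succeed or fail as a unit) can be sketched with a toy in-memory store. This is not Omid's API or its algorithm. The names ToyStore, ToyTransaction, ConflictError, and index_page are hypothetical, and the conflict check shown (abort if another transaction committed a write to one of our keys after we began) is just one simple way to get all-or-nothing multi-key updates. Reads here return the latest committed value rather than a true snapshot.

```python
class ConflictError(Exception):
    """Raised when another transaction committed a write to one of our keys."""


class ToyStore:
    """Hypothetical in-memory key-value store with all-or-nothing commits."""

    def __init__(self):
        self._data = {}    # key -> (commit_version, value)
        self._version = 0  # version of the latest committed transaction

    def begin(self):
        return ToyTransaction(self, self._version)


class ToyTransaction:
    def __init__(self, store, start_version):
        self._store = store
        self._start = start_version
        self._writes = {}  # buffered writes, invisible to others until commit

    def get(self, key, default=None):
        # Our own buffered writes win; otherwise read the latest committed value.
        if key in self._writes:
            return self._writes[key]
        entry = self._store._data.get(key)
        return default if entry is None else entry[1]

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        data = self._store._data
        # Check every written key before applying anything, so a conflict
        # leaves the store untouched (nothing is partially applied).
        for key in self._writes:
            entry = data.get(key)
            if entry is not None and entry[0] > self._start:
                raise ConflictError(key)
        self._store._version += 1
        for key, value in self._writes.items():
            data[key] = (self._store._version, value)


def index_page(store, url, links):
    """Store a page and update the inbound-link list of every page it links to."""
    tx = store.begin()
    tx.put(("page", url), {"links": list(links)})
    for target in links:
        inbound = tx.get(("inbound", target), [])
        tx.put(("inbound", target), inbound + [url])
    tx.commit()
```

Without a transaction layer, a failure halfway through the loop in index_page would leave some inbound lists updated and others not, which is the kind of scenario the application developer would otherwise have to handle by hand.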


Dash Open 07: Oak - Open Source Scalable Concurrent Key-Value Map for Big Data Analytics

In this episode, Paul Donnelly, a Principal Engineer at Verizon Media, interviews Eddie Bortnikov, Senior Director of Research, and Eshcar Hillel, Senior Research Scientist. Eddie and Eshcar share how Druid (an open source data store designed for sub-second queries on real-time and historical data) inspired their team to build Oak, an open source scalable concurrent key-value map for big data analytics, and how companies can use and contribute to Oak. Learn more at https://github.com/yahoo/oak.

April 30, 2019 04 min 37 sec

Dash Open 05: Makeskill Design Kit, the Open Source Multimodal Rapid Prototyping Suite for Alexa

In this episode, Ashley Wolf interviews Lauren Tsung, who was previously a Sr. Designer for Yahoo Mail, and Anna Shainskaya, a Sr. Designer for Yahoo Mail at Verizon Media. Lauren and Anna share their journey from designing chatbots to publishing Makeskill, an open source project for rapidly prototyping Alexa Skills. Connect on LinkedIn with Lauren Tsung (https://www.linkedin.com/in/laurentsung) and Anna Shainskaya (https://www.linkedin.com/in/annashine/).

January 24, 2019 08 min 28 sec

Dash Open 04: Frode Lundgren - Building and Open Sourcing Vespa, the Big Data Serving Engine

In this episode, Amber Wilson interviews Frode Lundgren, Director of Engineering for Vespa at Verizon Media. Frode discusses the inspiration behind building Vespa and shares thoughts on personalized search. Connect with Frode on LinkedIn: https://www.linkedin.com/in/frodelu/.

January 22, 2019 07 min 30 sec
