Transcript:
Gil Yehuda: Hi everyone, and welcome to the Dash Open Podcast. Dash Open is your source for interesting conversations about open source and other technologies from the open source program office at Verizon Media. We're home to many leading brands including Yahoo, AOL, Tumblr, TechCrunch and many more. My name is Gil Yehuda and I'm on the open source team at Verizon Media. Today on the show, I'm excited to chat with Aaron Klish. Aaron is a distinguished architect on the data platforms and systems engineering team at Verizon Media. Welcome to the podcast, Aaron!
Aaron Klish: Thank you. Glad to be here.
Gil Yehuda: Tell us a little bit about yourself. How long have you been on the team and what do you do?
Aaron Klish: I've been working at Yahoo and now Verizon Media for close to 11 years. Over that time I've had an opportunity to work on a lot of different data systems. I work on the data team, so I've worked on everything, starting with our collection platforms. These are very large-scale systems that collect information from tens of thousands of systems in the company and bring it back to our data processing systems. I've worked on the processing systems, the data warehouses, and data marts. More recently I worked on Flurry Analytics, which is a company that was acquired by Yahoo that provides an analytics platform for mobile app developers, and most recently on our ad stack, building out a rules-based targeting platform for our DSP.
Gil Yehuda: DSP?
Aaron Klish: Demand-side platform.
Gil Yehuda: So very data-centric.
Aaron Klish: That's correct. Yes.
Gil Yehuda: Is it easy working with data?
Aaron Klish: Data has its own set of challenges. It's interesting because obviously you have to work with large amounts of data that tend to grow year over year and there's always a need to be able to garner insights from that data. You want a very short response time and interactive query capability associated with that data. So there's always a challenge and over the years I've worked at Yahoo, you continue to see big strides in new architectures for crunching that data in faster and faster response times.
Gil Yehuda: Right. It used to be that, I remember when we were looking at data and the issue was data accuracy, right? Like we really needed to make sure that the data was clean and perfectly accurate because dirty data was a problem. And now it seems, and I’d like to hear you comment on this, if you agree, or if you have insight on this, it seems that it's less about the accuracy of the data, but more about the ability to process the data quickly. Because the data comes in so quickly that it's almost like even if we have a couple of pieces of data that are a little inaccurate, if we can process it quickly, the value of that should override the cost of like being perfect. Would you say that?
Aaron Klish: Yeah, I'd say that's correct. I think that there's, I mean you tend to want most of your data to be accurate, but it's okay to have some of your data not having everything completely cleansed and you can still build meaningful insights even if your data has some inaccuracies inside of it. Like how quickly can we build something that can get us those insights. And so you have to very rapidly string together a bunch of different technologies in order to get the insights to the business that needs them in a short amount of time. So that's kind of the challenge I'd say today with data.
Gil Yehuda: Sure, so we're here to talk about one of the projects that you're working on. As somebody who has spent more than a decade deep in the world of extracting value from a data ecosystem, you have an open source project out there that you wanted to share with our listeners. What's it called?
Aaron Klish: So basically we worked on a project called Elide. Think of data as a full-stack problem, everything from collection to presentation. This is more along the presentation side. So typically you're going to need some kind of visualization or UI to present the data. But you'll also need what we typically call a middle tier that connects the data visualizations and your user interface to the actual underlying data stores that serve the data. And so Elide is basically a Java library that exposes what we call an application data model as a middle-tier web service through JSON API and GraphQL APIs. And JSON API and GraphQL are essentially modern standards for what's called a Graph API.
Aaron Klish: And a Graph API is, you can think of it as kind of an evolutionary improvement over a traditional CRUD API, where CRUD stands for create, read, update and delete. Traditionally, a developer or client would hit a web service and they would want to manipulate or read a single entity of the application domain model in a single request. And so the main improvement with Graph APIs is they let the developer read or manipulate an entire subgraph of the application domain model, which they can construct at query time, in a single request round trip. And so for an application that needs to be responsive, especially something that's mobile, you want to reduce the number of round trips between the UI and the backend and you also want to reduce the amount of data that's in flight over the wire. And so that style of API, the Graph API, lets you do that.
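To make the round-trip point concrete, here is a minimal sketch in Python. It is not Elide and not a real web service: the `CrudClient` and `GraphClient` classes, the author and book data, and the method names are all hypothetical stand-ins, and each method call simply counts as one request.

```python
# Toy in-memory "backend" (hypothetical data): one author with two books.
AUTHORS = {1: {"name": "Ada", "book_ids": [10, 11]}}
BOOKS = {10: {"title": "Graphs"}, 11: {"title": "Trees"}}


class CrudClient:
    """CRUD style: one entity per request, so every call is a round trip."""

    def __init__(self):
        self.round_trips = 0

    def get_author(self, author_id):
        self.round_trips += 1
        return AUTHORS[author_id]

    def get_book(self, book_id):
        self.round_trips += 1
        return BOOKS[book_id]


class GraphClient:
    """Graph style: the caller names the subgraph and fields it wants,
    and gets the whole thing back in a single reply."""

    def __init__(self):
        self.round_trips = 0

    def get_author_with_books(self, author_id, book_fields):
        self.round_trips += 1
        author = AUTHORS[author_id]
        books = [
            {name: BOOKS[book_id][name] for name in book_fields}
            for book_id in author["book_ids"]
        ]
        return {"name": author["name"], "books": books}


# CRUD: fetch the author, then fetch each book separately.
crud = CrudClient()
author = crud.get_author(1)
crud_titles = [crud.get_book(book_id)["title"] for book_id in author["book_ids"]]

# Graph: ask for the author plus the titles of their books at once.
graph = GraphClient()
result = graph.get_author_with_books(1, ["title"])
graph_titles = [book["title"] for book in result["books"]]

print(crud.round_trips, graph.round_trips)  # prints: 3 1
```

Both clients end up with the same titles, but the CRUD path grows by one request per related entity, while the graph path stays at one request and returns only the fields that were asked for.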
Gil Yehuda: So what does Elide provide over GraphQL?
Aaron Klish: GraphQL is more like a specification for building APIs. Elide has a very opinionated instance of GraphQL because it solves some problems, especially like how do you do mutations where I want to be able to manipulate these four things at the same time in a single transactional round trip. So Elide takes an opinionated stance on GraphQL and how you would do that as a developer. But basically Elide is more than that. It's a framework for building these middle tiers very rapidly.
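The idea of manipulating several things in one transactional request can be sketched roughly as follows. This is an assumption-laden illustration in Python, not how Elide implements it: `apply_atomically` and `rename` are hypothetical helpers, and the "database" is a plain dictionary.

```python
import copy


def apply_atomically(store, operations):
    """Apply every operation to a working copy and commit only if all succeed.

    If any operation raises, the original store is left untouched.
    """
    working = copy.deepcopy(store)
    for operation in operations:
        operation(working)  # an exception here aborts the whole batch
    store.clear()
    store.update(working)


def rename(entity_id, new_name):
    """Build an operation that renames one entity (hypothetical mutation)."""

    def operation(data):
        if not new_name:
            raise ValueError("name must not be empty")
        data[entity_id]["name"] = new_name

    return operation


store = {1: {"name": "a"}, 2: {"name": "b"}}

# Both renames succeed, so both are committed together.
apply_atomically(store, [rename(1, "x"), rename(2, "y")])

# The second rename is invalid, so the first one is not committed either.
try:
    apply_atomically(store, [rename(1, "z"), rename(2, "")])
    failed = False
except ValueError:
    failed = True
```

After the failed batch, `store` still holds the names from the first batch, which is the all-or-nothing behavior a client expects from a single transactional request.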
Aaron Klish: So I was getting at the problem that we have, which is that we need to build data applications and provide insights to the business very quickly. Elide is a core component of what we call a low-code application architecture, and the idea is that you can quickly stand up an entire data application in a very short amount of time, in a few weeks for example. In the case of Elide, what a developer would do is define their application data model as a series of entities and relationships between them, what you'd call an entity relationship graph, and then decorate those things with security rules, data validation rules, and business logic that you can tie into the model's fields whenever they're manipulated or read.
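As a rough picture of a model decorated with rules, here is a small Python sketch. Elide itself is a Java library, so the real thing looks different; the `MODEL` layout, the role names, and the `read_entity` and `write_field` helpers are all hypothetical and exist only to show rules being checked whenever a field is read or written.

```python
# Hypothetical model: each field carries a read rule and a validation rule.
MODEL = {
    "invoice": {
        "amount": {
            "read": {"admin", "accountant"},
            "validate": lambda value: value >= 0,
        },
        "memo": {
            "read": {"admin", "accountant", "viewer"},
            "validate": lambda value: len(value) <= 80,
        },
    }
}


def read_entity(entity_type, record, role):
    """Return only the fields that the given role is allowed to read."""
    rules = MODEL[entity_type]
    return {
        name: value
        for name, value in record.items()
        if role in rules[name]["read"]
    }


def write_field(entity_type, record, name, value):
    """Run the field's validation rule before storing the new value."""
    if not MODEL[entity_type][name]["validate"](value):
        raise ValueError(f"invalid value for {entity_type}.{name}")
    record[name] = value


record = {"amount": 100, "memo": "Q3 services"}

viewer_view = read_entity("invoice", record, "viewer")      # memo only
accountant_view = read_entity("invoice", record, "accountant")  # both fields

write_field("invoice", record, "amount", 250)  # passes validation
try:
    write_field("invoice", record, "amount", -5)  # rejected by the rule
    rejected = False
except ValueError:
    rejected = True
```

The point of the design is that the rules live next to the model rather than in each endpoint, so every read and write path goes through the same checks.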
Aaron Klish: And then you can connect that model to a persistent backend, which is very easy to do in Elide, and then take everything and drop it into a container like Jetty or Undertow, and you'd essentially have a fully featured middle tier that can serve your UI. It would support things like rich filtering, sorting, pagination, search, and schema introspection, which are all kind of the toolbox of things you need to build a user interface.
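From the client side, filtering, sorting, and pagination typically show up as query parameters. The sketch below builds such a URL in Python using the parameter families named in the JSON API specification (`include`, `fields[...]`, `sort`, `page[...]`, `filter`). The host, the `book` and `author` types, the field names, and the exact `page[...]` and `filter[...]` keys are assumptions for illustration; a given server, Elide included, defines its own filter and paging syntax.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query(base_url, *, include=(), fields=None, sort=(), page=None, filters=None):
    """Assemble a JSON API style query string from its parts."""
    params = []
    if include:
        # Related resources to return in the same response.
        params.append(("include", ",".join(include)))
    for type_name, names in (fields or {}).items():
        # Sparse fieldsets: only these fields per resource type.
        params.append((f"fields[{type_name}]", ",".join(names)))
    if sort:
        # A leading "-" means descending order.
        params.append(("sort", ",".join(sort)))
    for key, value in (page or {}).items():
        params.append((f"page[{key}]", str(value)))
    for key, value in (filters or {}).items():
        params.append((f"filter[{key}]", str(value)))
    return f"{base_url}?{urlencode(params)}"


url = build_query(
    "https://example.invalid/api/book",  # placeholder host
    include=["authors"],
    fields={"book": ["title"], "author": ["name"]},
    sort=["-publishDate"],
    page={"number": 1, "size": 10},
    filters={"genre": "history"},
)

# Decode the query string again to inspect what would be sent.
parsed = parse_qs(urlsplit(url).query)
print(parsed["sort"], parsed["page[size]"])
```

One request built this way asks for a page of books, their authors, and only the fields the screen needs, which ties back to the goal of fewer round trips and less data over the wire.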
Aaron Klish: So it's the one component of the architecture for what we call rapid data applications. Now there are other pieces. And I think in future podcasts we'll talk about some of those other pieces as well. But it's definitely one of the core pieces of things you have to have when you're presenting data. You need the ability to interact with it to create reports, alerts when data changes, and things like that. You need to interact with the business side of the objects. You need both to create an invoice or short those kinds of things.
Gil Yehuda: So tell me a little more about what kind of developer, what kind of use case or data-heavy industry would use Elide.
Aaron Klish: If you are building a web application or a mobile app and you need to read and write data from a database of some kind, I think Elide would be a good choice for a middle tier to look at to see if it meets the needs of the application that you're building. As far as this low-code application architecture goes, that's a separate thing that's solving a problem for a business that may not have traditional software engineers. Maybe they're not computer science trained but they know how to write some code. And so they can stand up applications in a very short amount of time just writing little snippets of code. That's a different problem. Elide is a piece of that. But I would expect someone to use Elide directly if they're familiar with building applications, web applications, and mobile apps. They could take that framework and get something set up in a few minutes for that application.
Gil Yehuda: Right. So if you're in a business where you're sitting on a ton of data and it may be in one or maybe a couple of different data stores and you need to do something that's interesting with the data, but writing a lot of code isn't interesting.
Aaron Klish: Yes, that's right.
Gil Yehuda: But extracting value from that is interesting. So Elide would be the kind of solution that you might look for.
Aaron Klish: That's absolutely correct.
Gil Yehuda: Okay. Now Elide is open source.
Aaron Klish: That is correct.
Gil Yehuda: Okay. What does that mean?
Aaron Klish: First of all, the source is free to use. It's under our very friendly Apache 2.0 license and so anyone can use it, modify it, make derivative works, they can come and contribute, they can help us build this. We're always looking for developers to come in to help us shape the future of this project. We're very excited about it. We have a lot of internal use at the company, as well as people that have been contributing and using it elsewhere. So we're always looking for new ideas, new developers, people to submit bugs and improve the quality of the software for the community.
Gil Yehuda: That's awesome. So that sounds like a real invitation to anyone out there who is listening, to anyone in the audience. If you're building data-focused applications, take a look at Elide, it's on github.com/yahoo/Elide, well documented and available. You mentioned community and that sort of reminded me to ask you, we're here in Champaign, Illinois, traditionally a Yahoo Office. What's it like working in Champaign, Illinois?
Aaron Klish: Champaign, Illinois is probably most noted for the University of Illinois Champaign. It's a small community of about a hundred thousand people. I like to call it micro-urban, so it's got a lot of the elements of a larger city but it doesn't have a lot of the problems associated with a larger city like congestion and so forth. I have about an eight minute commute to work every day, but I get to work at a great technology company. The culture here is really amazing. It feels like working as part of a smaller company, even though we're very much part of a much bigger company that is doing something very innovative and has lots and lots of resources. So it kind of marries those two things very nicely.
Gil Yehuda: And I understand we're hiring, which means that if you're in the Champaign area or really in the Midwest and you want to work at a large company that feels like a small company, so you have all the resources and exciting problems of a large company, but you have just that intimacy and community feel of a small company then the Champaign office just seems to be a really great place to check out. It's on the University campus. The University is right up the road.
Gil Yehuda: Thank you very much for the podcast and thank you all for listening. If you enjoyed this episode and you wanted to learn more about our open source program at Verizon Media or other technologies that we have available, please visit us at developer.yahoo.com. You can also find us on Twitter at YDN.