Dash Open 04: Frode Lundgren - Building and Open Sourcing Vespa, the Big Data Serving Engine

Transcript:
Amber Wilson: Hi everyone, and welcome to the Dash Open Podcast. Dash Open is your place for interesting conversations about open source and other technologies from the Open Source Program Office at our company. Our company is home to many leading brands including Yahoo, AOL, Tumblr, TechCrunch, and many more.
Amber Wilson: My name is Amber, and for the next segment we have a very special guest with us, Frode Lundgren. Frode is a Director of Engineering for our Vespa Services team. He's based in the city of Trondheim in Norway. As part of the Vespa team, he is developing the Vespa platform with a special focus on running the Vespa service that powers more than 150 of our various applications. Frode has been with the Yahoo team for the past 15 years, and prior to his current role, he spent more than a decade working on applications utilizing the Vespa platform to power, for instance, the Yahoo Search Experience for news and other media-related content. And right now, today, we're going to talk with him about Vespa. Frode, so great to have you on the podcast.
Frode Lundgren: Thank you so much. It's great being here.
Amber Wilson: Just for anyone who may not know what Vespa is, can you first just start off talking about what Vespa is?
Frode Lundgren: Sure. Vespa is the Oath open source software platform for what we call big data serving. It is a platform that allows you to store, search, rank, and organize huge amounts of data, and you can get it back in what we call user serving time. When we say user serving time, we focus on being able to return results in terms of milliseconds or even faster, so there's no noticeable delay for the end user. So it can be used to do really complex computations or model evaluations across huge amounts of data while the user is waiting for results.
Amber Wilson: Nice, and who exactly is leveraging Vespa?
Frode Lundgren: Vespa was initially built for search, for people with vertical data, meaning data with a lot of richness and metadata, where you typically have a lot of context about the user and/or the content you're searching. It later evolved into powering a lot of the personalization we have at Oath, and also similar ad systems. It is about finding the best match between the incoming request, that being a search or the user profile, and the large amounts of data you have stored, and that could be image data, it can be news documents, it can be ads, whatever you need to match it against.
Amber Wilson: Nice, and so how long has it been around, and when did it become open source as well?
Frode Lundgren: Vespa, as a platform, we started to build in Yahoo around 15 years ago, and obviously, it has changed over the years. We did a lot of the development alongside Hadoop, actually, back in the day, and we've been wanting to open source it for many years. Unfortunately, due to some of the complexities of open source and internet intellectual property, it took us some time. So it was open sourced last year, in September of 2017, so about a year ago.
Amber Wilson: That's incredible. So just going back to 15 years ago, why was it first built? What need did you really see that it was going to serve?
Frode Lundgren: Vespa has its original roots way back in a Norwegian internet search company in the late nineties, and we ended up at Yahoo around 2003. And at the time, Yahoo needed someone to build the technology to do search, again in what we call the vertical content, which was Yahoo News, Yahoo Finance, Yahoo Sports, where you had a lot of content with a lot of what we call high quality, a lot of metadata, more information about the content than just the text. And you also had a lot of information about the user. They were in the sports context; they were reading about a game.
So they needed a platform to handle the search experience for those kinds of verticals. And a key thing was to actually build Vespa as a platform, as something you could run that will take care of probability, stability, scalability, and not just be kind of a library that everyone had to integrate and use. So that was the original charter back then.
Amber Wilson: Awesome. And is Vespa still used to solve some of those initial problems, and how has it evolved?
Frode Lundgren: Absolutely. Search is still a big part of what we do. Search has changed a lot, and especially these days we see the combination of traditional search as a matching problem, finding the right content, with adopting the latest machine learning trends in terms of how you calculate the score, how you select what article is the best hit for any given query. In the beginning, this was all handcrafted relevance scoring. These days there are advanced machine learning models that do the scoring. That use case is still very much around. You still see Vespa in action when you search on many of the different properties around Yahoo and Oath. Over the more recent five to ten years, we see more and more of this personalization case, where the problem is not so much a typical user entering a search term, and more about a user coming in with some context.
Frode Lundgren: Meaning who are you, what are your interests, what's your profile. So the query, if you will, the request coming in, is not so much a few terms that someone typed, but more of a profile that comes in, and the goal is to find the most relevant content to serve. That could be news articles on the news front page where no one has typed anything, we just want to serve you personalized news, or it could be ad systems where you're taking into account ads that we think are relevant to you.
And taking all those signals in at query time, meaning serving time, and within milliseconds being able to make the decision and serve you the best possible content.
Amber Wilson: Awesome. Thank you for sharing that. So we have talked a lot about the beginning of Vespa and where it is now, but where would you like to see it go in the future? How would you like to see it evolve?
Frode Lundgren: Well, in the immediate future, I really hope to see a continued increase in the use of Vespa around the world. For people that want to get involved, again, vespa.ai has links to Stack Overflow, where there is a tag under which you can post questions. You can file issues and get help, or reach out on gitter.im to talk in real time with engineers, and we are very happy to receive contributions. I think beyond that, as I mentioned, as we are a big data serving engine, it is exciting to see where the whole machine learning landscape is moving, and I'm eager to see Vespa continue to adapt to the changes that are happening there. We're already in front with our tensor functionality and native support for TensorFlow and ONNX models inside of Vespa. We do want to remain in front of that development going forward.
Amber Wilson: If our listeners are interested in learning about Vespa and becoming more involved, what resources are available out there?
Frode Lundgren: The short answer is go to vespa.ai and start there. That's our main webpage, and you will find links to documentation and to tutorials that will get you up and running very quickly, as well as details on the different features of Vespa, reference documentation, and how to use it. There are also links to our blog and to the Twitter account where we post all the latest updates on the Vespa side.
Amber Wilson: Well, thank you for coming here. We'll have to, next time, go to Norway.
Frode Lundgren: Yes, you're definitely welcome.
Frode Lundgren: Thank you very much.
Amber Wilson: If you enjoyed this episode of Dash Open and would like to learn more about our open source program and other technologies, visit developer.yahoo.com. You can also find us on Twitter at @YDN.
