Dash Open 14: How Verizon Media’s Data Platforms Team Uses and Contributes to Open Source

Rosalie Bartlett: Hi everyone, and welcome to the Dash Open podcast. Dash Open is your source for interesting conversations about open source and other technologies from the Open Source Program Office at Verizon Media, home to many leading brands, including Yahoo, AOL, Tumblr, TechCrunch, and many more. My name is Rosalie, and I'm on the open source team at Verizon Media. Today on the show, I'm so excited to chat with Tom Miller. Tom is a Director of Software Development Engineering on the Data Platforms and Systems Engineering team at Verizon Media. Welcome to the podcast, Tom!

Tom Miller: Thank you.

Rosalie Bartlett: How is your day going so far?

Tom Miller: It's going great. It's a typical day here in Champaign: nice, sunny, and warm outside, so we're doing good.

Rosalie Bartlett: How long have you been on the team here at Yahoo in Champaign?

Tom Miller: I've been in Champaign for a little over five years now, and I've been with Yahoo for thirteen. I came in with the Right Media acquisition way back when, and at that time I managed the team in Tallahassee, Florida, which was also a data team. Then we closed that office, I worked from home for a couple of years, and then I moved here. So I've been working on data systems for a long time.

Rosalie Bartlett: Very cool. So let's talk about that: your focus here at Yahoo. What is your day-to-day like? What are some of the problems that you and your team are focused on?

Tom Miller: I have several teams that I'm managing right now. Their primary focus is reporting: the analytics and key metrics for the company. I own Digits, which covers all the audience metrics, things like how many users we have, how long people spend on the pages, and all that fun stuff. I also own UAD, which is the revenue side of things. So, how much does this page make when people visit it during the day?
And in addition to those two analytics platforms, I also own a couple of more infrastructure-type teams. One of them manages the Druid clusters. Druid is open source and an Apache project.

Tom Miller: My team actually contributes back to the community, and we provide the packages that everyone within Yahoo/Verizon Media uses. The internal version is called Gray Hawk; basically, we take the community version and strap on all the additional stuff that the Paranoids (our security team) require. I also have another team working on a new file format for the Grid that has the potential to greatly reduce both the compute and the storage capacity we need for our pipelines. So, a lot of different things going on.

Rosalie Bartlett: You are a very, very busy guy.

Tom Miller: Yes.

Rosalie Bartlett: You're doing a lot, but out of everything you're doing, what are you personally excited about?

Tom Miller: I'm most excited about how we solve problems using not only our own technologies but also what's already out there. I'm kind of an old-school guy: I don't want to reinvent the wheel if I don't have to, right? So we're constantly looking at the technologies that are available, and we make extensive use of open source. We have all the Grid-based stuff, Hadoop, Oozie, and so on. We do web services, we do UIs, and wherever possible we leverage the open source communities. If something does 90% of what you want, don't reinvent that 90%. Take that 90%, add what you need to it, and then contribute that back to the community.

Tom Miller: So I'm always looking for those opportunities. Our teams make use of everything from Node.js and Ember down to, as I said, the Grid toolsets, and everything in between. And, as I said, we're also doing some innovation work with new Grid file formats and those types of things. I'm really excited about that, because it has a lot of potential.
We're starting to do some testing with a couple of advertising pipelines, and we're seeing something like a 70% reduction in file sizes and a significant reduction in the CPU required as well. So it has the potential to completely overhaul the way we do pipelines.

Rosalie Bartlett: I love that you're innovating and using a lot of open source, but are you also contributing a lot back to open source?

Tom Miller: Yes. My Druid team is very active in the Druid community, and some of what we've contributed back includes the sketch libraries. Take DataSketches: Lee Rhodes, one of our architects in California, came up with this wonderful data structure called a sketch, and it's open source. We make extensive use of it, especially in Digits, our audience platform. Previously we had to build an aggregate for each combination of dimensions, so if you wanted to see the people using phones on a given property, we had to build an aggregate for that set of dimensions. With sketches, we build one base set and then aggregate on the fly. The results are estimates, but estimates with known error bounds, so they're more than good enough for analytics. And we actually wrote the sketch libraries for Druid.

Tom Miller: Just recently we added rolling averages. It's called Moving Averages in the community now, and it's going to be included in Druid 0.15 going forward, but we've been using it internally for a while. So we're constantly tweaking what we need and then giving as much back to the community as we can. Another place where we contribute quite a bit is the UI world. I have UI teams for both UID and Digits, and they work primarily in the Ember community. There are a lot of libraries and components that we've contributed back to the Ember community.
In fact, one of my architects has a clipboard add-on that's one of the most downloaded add-ons in the Ember community. So we contribute quite a bit back there as well.

Tom Miller: Then for web services, we have Fili, which is an API we developed to sit in front of Druid. That's open source now, and we do quite a bit with it as well. We always try to leave the world a better place than we found it. We use open source, but we always try to add back to it and add value to the community, not just take what the community is giving us.

Rosalie Bartlett: It's obvious that open source is very important to Yahoo/Verizon Media. For you personally, though, why is open source important?

Tom Miller: I think it's a major contributor to how software has gotten to where it is. If everybody had to develop everything from scratch, we wouldn't be anywhere close to the functionality we have on the internet and across businesses today. By leveraging each other's strengths, taking what somebody else built, adding to it, and then providing that value back to the community, we all benefit. As I said before, we don't want to reinvent the wheel. If something has 90% of the functionality we need, let's take it, build the extra 10%, and use it internally, but also give that 10% back and let somebody else build on it for what they need. That's kind of our philosophy.

Rosalie Bartlett: What originally inspired you to get into engineering?

Tom Miller: Actually, my background is military. I was in the Navy for a few years as a nuclear engineer, working on submarines, and that gave me the engineering mindset. I've also always been passionate about computers and software. I remember tinkering with a Commodore 64, the TRS-80s, and all those old machines.
When I was getting out and looking at what my career would be after the Navy, I migrated to computers and software.

Rosalie Bartlett: So Tom, you are incredibly well liked here at the company, and your teams are very successful. How do you empower your teams to do such great work?

Tom Miller: The biggest thing is just trusting my team. I have a set of very talented engineers who work for me, and I see my job as making sure my team has the resources they need and helping them remove obstacles. Other than that, I stay out of their way and let them be good engineers, right? I provide guidance and input, but for the most part I just trust them to do the right thing. I'll occasionally give them pointers, like "Have we thought about this?", but mostly I let them run with it and be themselves, and that's proven to be one of the best ways to move forward. That trust goes both ways: I take care of my people, and they make sure our team looks good. That's the best way to work.

Rosalie Bartlett: That's a fantastic approach. Today we are in Champaign, Illinois. What's it like not only to work at Yahoo in Champaign, but also to live in Champaign?

Tom Miller: Champaign is a great community. When I moved here, I was actually given the option of moving to Sunnyvale, California. I liked the Bay Area, but I couldn't live there. I'm more of a country guy than a big-city guy. It's nice here: you can get across town in 15 minutes at five o'clock. I actually live about 20 miles from here, and it takes me 20 to 30 minutes to get to work. I'm officially in a subdivision, but my backyard is a cornfield.

Tom Miller: So I'm not looking at my neighbors, I don't have to hear my neighbors, and I can sit outside on the back porch and enjoy the sunset. But we're within easy distance if you want to go to the city.
If you draw a triangle between Chicago, St. Louis, and Indianapolis, we're right in the middle. Each of them is less than three hours away, so you can get there very easily. It's kind of the best of both worlds.

Rosalie Bartlett: If you had to think about the type of person who would enjoy working here at Yahoo in Champaign, how would you describe them?

Tom Miller: We look for strong engineers with experience across a breadth of technologies. We don't look for people who know just one thing and that's it. We look for people who are willing to try different things and take on different challenges. That's what we look for in the interview process: that the person isn't one-dimensional, that they understand different types of technology, and that they have a good approach to troubleshooting problems and those types of things.

Rosalie Bartlett: Tom, it has been so great to chat with you today. Thank you very much for your time.

Tom Miller: Thank you.

Gil Yehuda: If you enjoyed this episode and want to learn more about our open source program at Verizon Media, or other technologies that we have available, please visit us at, you can also find us on Twitter @YDN.
