Dash Open 13: Using and Contributing to Hadoop at Verizon Media

Rosalie Bartlett: Hi everyone, and welcome to the Dash Open Podcast. Dash Open is your source for interesting conversations about open source and other technologies from the open source program office at Verizon Media, home to many leading brands including Yahoo, AOL, Tumblr, TechCrunch, and many more. My name is Rosalie, and I'm on the open source team at Verizon Media. Today on the show, I'm so excited to chat with Eric Badger. Eric is a Software Development Engineer on the Big Data Platform team at Verizon Media. Welcome to the podcast, Eric!
Eric Badger: Thank you. Happy to be on the podcast and happy to talk about open source and the stuff I'm working on.
Rosalie Bartlett: Awesome. So Eric, how long have you been at Yahoo?
Eric Badger: Early January in 2016, so I guess this would be about three and a half years for me, and I've had really good luck with the team that I've had here and my manager. I've even got a new manager now and he's, you know, he's also fantastic. So yeah.
Rosalie Bartlett: So you are on the Big Data Platform team.
Eric Badger: Yes.
Rosalie Bartlett: Could you maybe tell us a little bit about the types of work that you're currently focused on?
Eric Badger: Sure. I work mainly on Hadoop - so big data, pipeline, Hadoop, also encompassed in that is HDFS, YARN, Tez, MapReduce. And so we have a lot of people that run on what we call the grid. This is just a bunch of different clusters that we have, a bunch of Hadoop clusters, and they span, you know, many hundreds or thousands of nodes. They're able to solve big data problems. So if you have a lot of data, this isn't something that you would want to solve on a single machine, and you probably don't want to solve this on a supercomputer, because that sounds really, really expensive. But you just have a lot of commodity hardware and you want to solve a problem, this is what we are doing.
So we have a lot of different teams that are running their machine learning things, trying to figure out what relates to you and make it so that you have the best user experience possible.
Rosalie Bartlett: What about your current focus is very exciting to you?
Eric Badger: That's a great question. So for me, what I really like about what I do is that my background is really more focused on operating systems and low level architecture. What I do now is I have a job that's based completely in Java, which you would say is not at all low level programming, right? You're in an actual VM when you start up a Java process. But the actual thing is, I really think of myself more as an operating systems engineer, because I'm just a distributed operating system engineer. Hadoop to me, or YARN I guess I would say, is just an operating system abstracted up a level. So instead of it actually being CentOS or RHEL or MacOS or whatever, you're actually above that layer writing an operating system on top of that. And instead of CPU cores, you have actual machines.
Eric Badger: So it's just an abstraction layer above that. I really have a fun time being an operating system engineer, but not actually at an operating system level. And it's kind of doing the same problems but on a bigger scale, because you're not dealing with the single CPU cores, you're dealing with actual entire machines, and so I find that to be really fun. Being able to solve those bigger scale problems where you see things that will not happen nearly as often as if you put them in isolated cases. You know, we have this kind of philosophy that nothing is random at scale. There is no surprise at scale. If it can happen, it will happen if you run it on hundreds of thousands or millions of different tasks trying to do this same thing.
Rosalie Bartlett: You are very involved in open source.
Eric Badger: Yes.
Rosalie Bartlett: Why is open source important to you?
Eric Badger: Open source is important to me because I feel like everybody benefits from it. I think that if you have closed source, I think overall as a community, as a world, technology kind of slows down a little bit. I think that when everybody shares, everyone else is able to increase technology, to learn from everybody else and to not make the same mistakes that everyone else is making. So it really moves the entire world forward.
Eric Badger: I personally want to see society grow. I want to see technology grow, and I want to see the entire world make cool things, because eventually I'm going to end up benefiting from that. If I contribute a feature to Hadoop or to YARN that is really cool and that other people want to use, that's great.
Eric Badger: And then conversely I can do the same thing. If they go out and make a great feature, then we can possibly work with that and we can benefit from that. So everybody is doing something a little bit different, and everyone has their own features that they're doing this purely because this is what my company wants me to do, but that doesn't mean that you can't benefit from the entire open source community as a whole so that everybody reaps the benefits of basically everybody's labor.
Rosalie Bartlett: For folks listening to this podcast who are thinking, "Wow, this open source thing sounds awesome," what is your advice for folks who are not yet involved in open source but want to get involved?
Eric Badger: For open source, I mean, it's pretty much anybody can get involved in this. So you don't need to really have any experience. In my experience, the community is fairly welcoming. So in the Hadoop world, the YARN HDFS worlds, if you just go onto the mailing list and say, "Hey, I'm a new person, I'd like to get involved in this." There's going to be people that are going to respond, say, "Hey, that's awesome. I'd like to ... Here is how you do this.
So I'll get you on the contributors so that you can go assign some JIRAs to yourself."
Eric Badger: JIRAs are basically where we do our tracking of bug fixes and new features and things like that that we want to change in the code base. You can go on there and check out the code. You can look at all of the different bugs that we have. It can be something as simple as unit tests. That's really where I got myself started in learning the code base, was just going and finding people that would post these JIRAs that say, "Hey, this unit test is failing. Why is this unit test failing? Someone must have broken it, or someone wrote a bad test or something," and you go out and you look at that test and by looking at the test you have to learn exactly how what it's testing works. Eventually if you do enough of those, then you kind of learn how the code works.
Eric Badger: So take a subsystem of the entire project, go look at that, and just try and figure out how it works. Once you figure out how it works, then you're going to be able to contribute more features. You're going to be able to go on and review other patches, and then you're just going to be able to become a bigger part of the community as a whole.
Rosalie Bartlett: Really great advice. If folks want to connect with you, Eric, what's the best way for them to do that?
Eric Badger: You can go ahead and add me on LinkedIn. I'm usually there, I might not respond exactly in a day, but go ahead and send me a request or something and if you want to chat about open source or about what it is that I do here at Yahoo/Verizon Media, then feel free to send me a request or send me a message and we can talk it out. I’d be happy to help.
Rosalie Bartlett: So Eric, it has been great to chat with you today. Thank you so much for your time.
Eric Badger: Yeah, I'm happy to be here. Thanks for having me.
Gil Yehuda: If you enjoyed this episode and you wanted to learn more about our open source program at Verizon Media or other technologies that we have available, please visit us at You can also find us on Twitter at @ydn.
