Austin is a Physicist at heart. He brings physics to AI and explains how guided missile theory and passing a soccer ball are related.
Ben: Welcome to the Masters of Data podcast, the podcast that brings a human to data and I'm your host, Ben Newton. One of my favorite things to do in this podcast is to talk to people who do interesting things with data. Austin Basye is definitely one of those people. A data scientist trained in the world of particle physics, Austin is a senior data scientist at Hudl, a really cool company that provides video review and performance analysis tools for sports teams and athletes.
In this episode we're going to talk about, first of all, why physics is awesome, of course, how guided missile theory and passing a soccer ball are related and much, much more. I think you're going to enjoy it. So without any further ado, let's dig in.
Welcome everybody to the Masters of Data podcast, and I have a real treat here today. Somebody from one of my favorite companies, Hudl, and that's Austin Basye who is a senior data scientist at Hudl and we're gonna talk a little bit more about what Hudl does in a minute, but welcome to the podcast, Austin. It's really good to have you here.
Austin: Thanks. I really appreciate it. I've been looking forward to it all week.
Ben: Oh, good. I've been looking forward to having you on here. It actually just occurred to me, isn't there a jazz artist, Count Basie? Is that ... It's spelled differently, right?
Austin: Yes, indeed, but I've listened to his music.
Ben: No relation.
Austin: Yeah. No relation as far as I'm aware, but I'd love to claim him. His music's fantastic.
Ben: Well, it's good to have you here. Well, like we always do on this podcast, I like to humanize the people we bring on. We talk about bringing human to the data and so what's your story? How did you get into data science? What's your origin story?
Austin: Oh man, I have a pretty unique one. I don't know how far we want to go back. I grew up in the Panhandle of West Texas and through some luck and privilege and some really great role models, I ended up in a pretty good undergraduate program for physics and I got to do a lot of research early on as an undergrad and I just fell in love with it.
I think the intellectual freedom you get from really hard problems is just addicting because for not hard problems, there's usually a right way and a crank to turn, but when you have like a truly hard problem that nobody knows how to solve, there's just this kind of intellectual freedom that lets you be curious and creative and I totally got addicted to that atmosphere.
I graduated with an undergraduate degree in engineering physics and then I went to graduate school at the University of Illinois at Urbana–Champaign to study experimental particle physics.
Ben: That sounds easy.
Austin: Yeah. Well, it was fun. Just the size of the datasets we had to work with. I naturally had to develop a programmatic data analysis skill set. That was a lot of fun. I love learning all of that. I was fortunate enough to actually, in the middle of that work, go to CERN in Geneva, Switzerland. I lived there for about three years with my wife, wrapping up the rest of my dissertation work there.
Ben: Oh, it's pretty amazing.
Austin: Yeah, it was a lot of fun. I was fortunate enough to be there in 2012 when the Higgs was discovered. Technically I'm an author on the Higgs discovery paper, but in these high energy physics collaborations everybody is an author-
Ben: You and 300 other people?
Austin: Well, 4,000 other people actually.
Ben: Oh, really?
Austin: Yeah. Yeah. And that's just one experiment. I was with the ATLAS experiment. The CMS experiment had another 4,000 and then there's another 3,000 or so folks involved with the actual Large Hadron Collider, the machinery and equipment to do that, that also deserve a ton of credit. So it was a really big endeavor and that was a lot of fun.
Interestingly enough though, and hopefully this'll be a relevant segue to my actual work, I knew pretty much early on when I got there that these are not my people. I love everybody that I got to work with at CERN. They're just extremely diverse, extremely intelligent folks, but I learned that they have this love of physics and there's absolutely nothing wrong with that. Physics is beautiful, there's so much to love with that, but it contrasted to some extent with my motivations and my desires.
I wasn't so much interested in the physics, as I learned, as much as I was interested in the fact that it was hard and I could be creative. And unfortunately to some extent, particle physics is a fairly mature field and so there's a lot of "right ways" to do things. And I'm just not a big fan of coloring inside the lines, so I was ready to get my PhD and then go find something else to go learn. That ultimately is what brought me to Hudl.
Ben: And by the way, for those listening, if you haven't heard of CERN, that's basically the premier particle research facility in the world right at this point, right?
Austin: Yeah. Yeah. Currently.
Ben: It's pretty amazing. Is it near Zurich? Am I remembering right?
Ben: Geneva. Okay.
Austin: Yeah. We were too poor to actually live in Geneva, so we lived just over the border in France in a small French town called Thoiry.
Ben: For whatever reason, wherever I go, my best memories are of food. And I actually told somebody this the other day and he tried it out, like that little ... It's a little town on Lake Geneva. And I had a pizza with an egg dropped on it when it was hot out of the oven. I was like, "I'm in heaven."
Austin: That's the one thing we miss. There's a lot of great things about Texas. I'm a born and bred Texan and I love it. But man, I would kill for some gruyère cheese more often, and it's like $8 a block here. It's insane.
Ben: Oh, yeah, no, I understand what you're saying. Well, the other thing that came to mind, relating back to some other discussions we had on the podcast, and also for people to know, is that I interviewed Richard Muller out of Berkeley. He's a physicist emeritus now at this point. He's been a professor of physics there for a long time and he got into climate science and all these other things, but he started out in particle physics, and he said that that is a really good start for understanding data.
[inaudible 00:06:21] think you have to hear what I mean. He'd been doing it for, I'm assuming he was in the field for like 50 years or something like that. So he'd been around for a while. But physics is a great background for that. Like you and I were talking about before, I started out in physics in grad school and I think it's a great background, like you said, for solving hard problems, having to deal with data.
I know when I was a physics grad student, the vendors would hate us when we called them because we were the ones that tore their stuff apart and [inaudible 00:06:45], and we're like, "Well, when I took your device apart and I looked at the way that you had connected it, like it didn't work." And they're like, "Sir, you just broke the warranty." "Well, that's not what I'm talking about. I want to talk about your design of the device."
Austin: Yeah. Far be it from me to disagree, but I tend to come at it from the same angle. I think a physics education is really fungible. I think there's a lot of ways you can reapply it to other places. Obviously, it is absolutely not a replacement for domain knowledge. And I don't want to suggest that it is, but I think the first principles approach that you learn in a canonical physics education is extremely transferable.
Ben: Yeah, exactly. Exactly. Back to the story at hand. You left CERN, you put physics behind you, and you ended up at Hudl right away?
Austin: Yeah. I was writing my dissertation. I got to a point where a friend of mine is basically a world expert and he graduated a year before me and I sent him an email and said, "Hey, can you look over this paragraph and make sure I don't make an idiot of myself," anticipating people would actually read my dissertation. But he checked it out and sent me back some comments. And then he also said, "Oh, by the way, I got a job with this really great sports company. You should send me your resume 'cause we're hiring data scientists."
I said, "Okay, yeah, I'll send it." And at the time I was like, "Ah, sports companies, I'm not sure if that's something I'd be interested in." I love sports. I grew up a big Dallas Cowboys fan. I'm in Texas. And this friend of mine was actually a really big Patriots fan. And so we would work at CERN together and then my wife would cook really great food and he brought the laptop with the really expensive NFL Game Pass and we would watch videos together.
We were really close friends and so I sent my resume and I went and interviewed and boy, I fell in love with the company immediately. They're based out of Lincoln, Nebraska. And as a result they have that Midwestern extreme niceness and humility that was just refreshing. I had started the job search into data science before a little bit and that was just the really unique thing about this company, was just how just nice and humble and just how pleasant and comfortable the culture and work environment was. So I was like, "Yeah, let's do this. Let's see where this goes."
Ben: That's great. And I'm a big fan of what you guys do. I've been familiar with the company for a few years now, to be able to see you grow. And so tell us, for those people who don't know, what does Hudl actually do? What's your business?
Austin: Hudl started in 2006. The general idea is, Hudl's a tech company that handles the capture, storage, distribution and access of sport video. And so it started out targeting the elite market where I define elite as like Division I NCAA and then the professional leagues, the NFL, NBA, and it wasn't until a handful of years later that it was realized that the actual really profitable market is the competitive amateur market where there's just a ton of high school teams that this solution is actually extremely beneficial for.
A lot of our customers are high school athletes and high school coaches. We still have products in the elite space and we're actually growing in that elite space as well. But our bread and butter at this point is the competitive amateur market. And so coaches will take video of practices, of games. In the olden days you would meet with another coach in the kind of diner at some point midway between high schools to exchange film on common opponents.
Now you just send an email with a link or you don't even have to do that. You can just share within the Hudl application itself. That really solidified that presence in the market. Just making video really easy to access, store and distribute. Now, our big overarching vision is to capture and bring value to every moment in sports. And so that's where my role is baked into the mission of Hudl, is to find ways to bring value to every moment in those sports.
Ostensibly, we're a video company, but the reason that video is valuable is that it encodes competitively valuable information for the coaches that can extract it. That's why they want us to store their data, because that data gives them an edge. And so what I would like to do is figure out ways that we can, to some extent, automate that process. Can we pull insights out of that video to give coaches and athletes that competitively valuable information to help them improve, to help them become more competitive, up their game basically?
Ben: Yeah, that makes a lot of sense. And how specifically do coaches use the data? 'Cause I know having known a few people from Hudl for a while, I've heard some of the stories, but there's multiple different people using the video 'cause you have the athletes themselves who are trying to get recruited and the coaches are using it to understand how to train their players.
What are the places where the big use cases that you guys see for the actual video?
Austin: Definitely those kinds of personas if you will, that you mentioned are crucial. And we do our best to serve them through various kinds of features in our applications. The one I'm probably most involved with in my current role is we're targeting the elite level European football market. So soccer for Americans. And that's actually a really interesting scenario because right now we have like two types of data if you will, that the video generates.
We have event data. That's like a clip of video when a pass occurs. And so that would be an event. And then we have like tons and tons of these structured data, these structured events for video of when things are happening. That is actually really, really common. We have a professional elite level targeted product called Sportscode that allows teams to curate their own events and import events from other kinds of third party sources.
It's very technical and so we have these folks that spend a lot of time to become experts in knowing exactly what their coach, what their manager likes to see in their personal definitions for things. And they code the video up that way. On the other hand, we have this tracking data and the tracking data is like this 25 hertz, so 25 times a second we have high accuracy position data for all the players. Using physics you can take derivatives and get the velocities and accelerations of that.
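The step from 25 Hz positions to velocities and accelerations can be sketched with finite differences. This is only an illustration under stated assumptions: the array layout (frames by x, y in metres), the function name `kinematics`, and the synthetic track are hypothetical and not taken from Hudl's pipeline, and real tracking data would normally be smoothed before differentiating.

```python
import numpy as np

FRAME_RATE_HZ = 25.0       # sampling rate mentioned in the conversation
DT = 1.0 / FRAME_RATE_HZ   # seconds between consecutive samples


def kinematics(positions: np.ndarray, dt: float = DT):
    """Return (velocity, acceleration, speed) for one player's track.

    positions: array of shape (n_frames, 2) with x, y in metres
               (an assumed layout, chosen for this sketch).
    """
    # Central differences in the interior, one-sided at the two ends.
    velocity = np.gradient(positions, dt, axis=0)       # m/s
    acceleration = np.gradient(velocity, dt, axis=0)    # m/s^2
    speed = np.linalg.norm(velocity, axis=1)            # scalar speed per frame
    return velocity, acceleration, speed


# Hypothetical track: a player running along x at a steady 5 m/s for 2 seconds.
t = np.arange(0.0, 2.0, DT)
track = np.column_stack([5.0 * t, np.zeros_like(t)])
vel, acc, spd = kinematics(track)
```

Differentiating amplifies measurement noise, so the acceleration from raw positions is usually much noisier than the speed; that is a general property of finite differences rather than anything specific to this data.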
That tracking data actually is lagging in terms of its uptake in the elite market compared to the kind of event level data. The event level data is really consistent with the more statistical approaches. We have a bunch of these discrete events that you can build statistical models around. And so that's something we saw and so the tracking data was something that we thought we could actually look into a little bit closer with our physics backgrounds.
Again, I was working with this friend of mine at the time who was also a physicist. And so the coaches were using the event data fairly reasonably, but the only kind of users on the teams of the tracking data at the time were the sports scientists, the guys that manage recoveries. Their job is to make sure that each player is at their physical peak for every game. They look at how fast the players run, what their acceleration moments look like, so they can tailor recovery and tailor training the next week.
We thought, "Man, there's this huge opportunity for actually pulling tactical and strategic information out of that instead of only just this really important more physical data." And so that was one of our goals was how do we prove to a coach or a manager somebody who's making tactical and strategic level decisions, how do we provide insight to that level based on just this tracking data? Because we saw a lot of value in that.
So we built a model that kind of leveraged that to identify when players are open. When they make big runs. The benefit of this analysis was twofold. One, it's a building block analysis in that if you can identify when players get open or position themselves optimally, then you can layer on top of that more high level insight analysis, but in addition, that meshed really well with one of our products, which was to identify these moments and add event tags.
For that, a person has to watch it. But if you can teach a computer to know when a player goes on a run and is really open, then we can automatically tag those sorts of situations. That ultimately led to a paper that we published at the MIT Sloan Sports Analytics Conference on the physics based modeling of pass probabilities. That was a whole lot of fun work that we got to do with that.
Ben: I was looking at that and that's ... Basically, if I understood it right, you guys were able to predict how people would pass and the likelihood that there would be a pass, whether the pass was successful, am I remembering right?
Austin: Yep. Yep. That was it. I helped William, my friend. He did a lot of the analysis and modeling to get that to work. It started out with a missile guidance algorithm.
Ben: Of course.
Austin: We realized that a player at any given point in time can only influence a small portion of the field based on where he's positioned, and so the question is then if he wants to influence some other part of the field, he has to get there and he's got to take an optimal trajectory to get there in the least amount of time. And so he is this missile, it's not a very fancy one. It's the constant bearing, decreasing range algorithm.
We take the player's current position and velocity and we try and determine what's the minimum amount of time that he could get to anywhere else on the pitch. And we repeat that analysis for every other player. And then based on that, that gives us an idea of who could ever be involved with any sort of pass based on the ball's position at each kind of time step. And so based on these really fundamental first principle physical quantities, then we can start layering on top of that, this model of how long does it take to control the ball, how long does it take to intercept the ball?
And then based on that you can start fitting that model against data that we had. A lot of fun to actually produce that.
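The interception idea can be illustrated with the classic constant-speed intercept calculation: a pursuer who runs straight at the meeting point keeps a constant bearing to the target while the range shrinks. This is a simplified sketch, not the published model. It assumes the ball moves in a straight line at constant velocity and the player runs at a fixed top speed from a standing start, ignoring the current velocity and acceleration limits that the model described above does account for. The function name and all numbers are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def intercept_time(player_pos: Vec, ball_pos: Vec, ball_vel: Vec,
                   player_speed: float) -> Optional[float]:
    """Earliest t >= 0 at which the player can meet the ball, or None.

    Solves |ball_pos + ball_vel * t - player_pos| = player_speed * t,
    which expands to the quadratic a*t^2 + b*t + c = 0 below.
    """
    dx = ball_pos[0] - player_pos[0]
    dy = ball_pos[1] - player_pos[1]
    vx, vy = ball_vel

    a = vx * vx + vy * vy - player_speed ** 2
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy

    if c == 0.0:
        return 0.0                      # already at the ball
    if abs(a) < 1e-12:                  # equal speeds: equation is linear
        return -c / b if b < 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                     # the ball can never be caught
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    valid = [t for t in candidates if t >= 0]
    return min(valid) if valid else None


# Hypothetical example: ball 20 m away, rolling sideways at 3 m/s,
# player able to run at 5 m/s. The meeting point is (20, 15), 25 m away.
t_hit = intercept_time((0.0, 0.0), (20.0, 0.0), (0.0, 3.0), 5.0)
```

Repeating this for every player and comparing the resulting times gives a rough picture of who could plausibly reach a pass first, which is the kind of quantity a pass-probability model could then be fitted on.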
Ben: That's a super cool use case. And so even stepping back to where you started, you talked about creating event data. I think if I remember right from some of the conversations we had before about what you guys do, a lot of that's because we've got this massive amount of video, these games are longer games and if you're a coach or a player or whoever else and you're trying to get the particular moments, you need the events to be able to navigate the video. Is that the end goal?
Austin: Yeah, exactly. That's exactly it. And actually the thing that I've probably been most passionate about since I started at Hudl is that they have all of this unstructured data, like tons and tons of video. Right now that becomes super valuable when coaches or experts go in and add that structured data to it, those events, that metadata, and right now a human does it in real time. One minute of video, it takes them one minute to add this structure in the best case.
A lot of times it takes a really long time and you need a lot of people to add all of the structure you need. And so the thing I'm really passionate about is can we find ways to take this unstructured data and automatically add structure to it with deep learning approaches or with advanced heuristics? I'm just really interested in that hierarchical adding of layer upon layer of structure without having to involve a human 'cause that allows us to scale a whole lot obviously, but also it saves our users valuable time so they can spend time on more important things.
Ben: For whatever reason, this reminds me of ... 'Cause I'm a musician and a geek. I love to try out new different software around audio and a lot of the advances that have been made about mapping out audio, detecting the notes, detecting the chord structure and all these kinds of things. We're at a point now where I have a free app on my iPhone that I can record something and it tells me what key it was in. It tells me what the chords were.
The point is it's not always 100% correct, but it doesn't matter because it gives me enough where I can go in and adjust it. I can say, "Okay, well, it wasn't that, it was this. It wasn't that chord I was playing or it's not the right chord there," but it gives me enough where I can do things I couldn't even imagine 20 years ago when I was in college playing around with a band or something.
I'm thinking that it's the same thing here. You give these customers enough where they're annotating instead of creating from scratch and now all these new opportunities arise that wouldn't even have been imaginable before they had that. Does that sound right to you?
Austin: Yeah, yeah, absolutely. That was something that we learned, is that this kind of high level analysis that we did, like this paper really needs really good data to come into it and there's a lot of really fantastic analysis we can do if the data is there. And so the question is how do we get that data in their hands or in our hands as well so we can actually participate in the process of pulling the insights out of that.
Your music example I think is really appropriate because there's a lot of additional higher levels of extraction of creativity that can happen when your primitives are different and so we're very excited about hopefully, creating some automatically generated primitives for people to use to explore and be creative with.
Ben: Well, and I would assume too, with the audio realm or the ASCII data realm or whatever you want to call it, to a large degree, it seems like those have been around the longest because of the nature of that data. But I would guess that with video there's a lot more that needs to be done, if you need to go to missile trajectory mapping to get your inspiration. I mean what's the state of the market around this? I'm assuming it's not very far along.
Austin: Well, it's fairly robust to some extent. We have a product out that already does optical player tracking [inaudible 00:21:55] monocular. It's from one perspective of video. We actually have a couple of cameras that we stitched together effectively, but we're using deep learning pipelines to track players and then we sell that tracking data back to our customers that have installed our cameras.
That's a pretty mature process right now. For the most part, there's definitely a lot of room for improvement. We do have humans that are in the product loop there that correct the output of our algorithms and obviously one aspect of our work is to decrease the amount of time that it takes to correct our data. So basically improve the accuracy there.
From that is it possible ... Definition of maturity, yeah I think that's mature. But I do think there's a ton of opportunities for improvement because object detection in video is square one. Object tracking is square two, so there's really a lot of room to grow into event detection, identity tracking. If I just see a bounding box, I know who this is or if I see a sequence of bounding boxes, I can tell that this is a shot or a pass or that person's jogging or it's an aerial duel or something like that.
There's a lot of room, I think to grow that still exists. There's a lot of information encoded in those pixels that we haven't been able to teach a computer to reliably extract. And so I think there's a really high ceiling yet for things to be done in this space.
Ben: Exactly. I think that's what's cool about what you guys are doing and it reminds me, we were talking a little bit about this before, but I've seen this come up in a couple of different ways, but effectively part of what you guys are getting at is that in this day and age where so much data is being produced and particularly unstructured, like you're saying, this data is just raw, it's not processed.
I've always said that the companies who win are the ones who are able to extract value from that and actually figure out insights where they can provide more value to their customers, but I think it's even cooler what you guys are doing in particular, what you're talking about. You're almost sending a two meta-layers there.
You're providing more value to your customers, these teams, and then those teams actually become more competitive because the more that they can extract from this data, then basically what you're ... The value proposition you're putting in front of them is like, "Look guys, when you're at an elite level like that, the level of competitiveness, I know, is probably pretty close. So if you can use this data and start to ..."
Like what happened with baseball. What can you do to extract more value to make rational decisions about how to compete.
Austin: Yeah. And that's really changing the NBA right now with the total disappearance of the mid range shot for instance. And I do think soccer, especially European soccer, is probably going to be the last domino to fall I think in that. I don't think that it's the last domino to fall because the insights are there. I think there are absolutely there and there's some really great things and models and insight that can be developed with the data.
I think there's some cultural pushback. I think there's still this highly romantic perception of the football manager where they're a maestro or they're artists, they're not a CEO. And so I think there's some cultural barriers to that adoption. But I absolutely believe that these insights are real. And so they absolutely present a competitive advantage to those teams that use them.
Culturally, the hesitation, however strong that is, I think it's going to be hard to argue with the results that just come from being able to leverage the insight and the data that you can leverage.
Ben: Yeah, that makes sense. With most things in sports, there's the reality of what goes on behind the scenes and there's the pageantry that you put on TV, so you can maintain the latter without advancing on the former. Well, to put a bow on this, what's some of the cool things you're thinking about going forward that you can actually share? What do you think are some of the things where you think there's advances and some things going on that not everybody knows about that you're really excited about?
Austin: Yeah, honestly the things that I'm really excited about are entirely novel. There's an enclave of Twitter handles that if you follow all of them, you'll see where all of this is heading. But I'm really interested in the two opposite ends of the spectrum of our data analysis, in soccer in particular. So at one end you have all of this tracking data, you have everything and it comes to analyzing it, and just the fact that you have these coupled multi agent systems of players interacting with one another, that's just like a fascinating system to analyze.
I think there's a lot of things, once you have some rich tracking data, you can start to do some really interesting recurrent models with I think ... There was a really interesting paper that came out a few years back called ghosting where you basically can anticipate a given player's movement given the state of where everybody is and what their goals are, and you can learn that recursively with a bunch of tracking data.
I think that's just one application and there's some graph neural networks that are coming out that I think could be leveraged to pull a lot of this kind of interaction type information out of the player positions. And that seems really, really cool to me. But obviously before you can get to that you need this reliable high-frequency tracking data. And so my other interest is on the opposite end of that spectrum, the data acquisition side.
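To make the graph idea concrete, here is a minimal sketch of one round of message passing over a graph built from player positions, written in plain NumPy. Everything specific here is an assumption for illustration: players are linked when they are within a hypothetical 10 m radius, node features are position plus velocity, and the weights are random and untrained. It is a generic mean-aggregation layer, not a model from the conversation or from any particular paper.

```python
import numpy as np


def proximity_adjacency(positions: np.ndarray, radius: float) -> np.ndarray:
    """Link every pair of players closer than `radius` metres (no self-loops)."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    adj = (dists <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def message_passing_step(features: np.ndarray, adj: np.ndarray,
                         w_self: np.ndarray, w_neigh: np.ndarray) -> np.ndarray:
    """One layer: combine each node's features with the mean of its neighbours'."""
    degree = adj.sum(axis=1, keepdims=True)
    neigh_mean = adj @ features / np.maximum(degree, 1.0)  # isolated nodes get zeros
    return np.maximum(features @ w_self + neigh_mean @ w_neigh, 0.0)  # ReLU


# Hypothetical frame: four players, two close pairs far apart from each other.
positions = np.array([[0.0, 0.0], [5.0, 0.0], [50.0, 30.0], [52.0, 30.0]])
velocities = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0], [0.0, 0.0]])
features = np.hstack([positions, velocities])            # shape (4, 4)

rng = np.random.default_rng(0)
w_self = rng.normal(size=(4, 8))                          # untrained weights
w_neigh = rng.normal(size=(4, 8))

adj = proximity_adjacency(positions, radius=10.0)
embeddings = message_passing_step(features, adj, w_self, w_neigh)
```

Stacking several such layers lets information travel between players who are not directly linked, which is what makes this family of models a natural fit for interaction-type questions.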
And it's like, so tracking data is really cool, can we get that tracking data from optical sensors of video, with the least amount of effort possible? And then what's the next primitive? What's after tracking data? Is it high frequency pose information? So we know where their arms and legs are at 25 hertz and figure out insight from there? Can we start to interpret body language to some extent and then what all can we pull out from just these pixels because we know coaches leverage those pixels extremely well and so they have some wet neural network that passes those pixels in to give them some really great insights.
How much of that can we codify and give the analysis end of the spectrum just more primitives to work with and build analyses on? So I like that full stack of things that you have to think about. And those are just two off the top of my head. I don't know if they're particularly well articulated, but-
Ben: You have me at graph neural network.
Austin: Oh, check those out. Those are awesome.
Ben: Now, the thing with soccer, can you build a predictor to know when those guys drop on the ground and roll around moaning? Are they actually injured or playing? You got, like, a fake-or-not algorithm?
Austin: That's actually a very interesting problem, I think. From a game theory perspective, it's clear that you don't get calls as often if you play through those types of contact and so if you sell the contact, there is an advantage to be gained. How do you fix that? Is there a rules change? Do the refs need to be a little bit more aggressive with their carding of flops?
Certainly not at the same level but you kind of have that same no huddle, "I have an injury now to let the defense substitute" situations and so yeah, I actually find that fascinating to be honest. 'Cause there's definitely a lot of gamesmanship in that, as frustrating as it makes the game to watch.
Ben: Okay. Well, we're going to bring you back on the show when you figured that out.
Austin: Okay. Well, yeah that'll be a while.
Ben: Austin, this has been a pleasure. I think what you guys are doing at Hudl and what you're doing personally is fascinating and I really enjoyed the discussion. Thank you for coming on.
Austin: It's my pleasure. I enjoyed it. Thank you for your time.
Ben: Thanks everybody for listening. If you rate us on your favorite podcast app, that will help other people find us and check out the next episode in your feed. Thanks everybody for listening.
Speaker 3: Masters of Data is brought to you by Sumo Logic. Sumo Logic is a cloud native machine data analytics platform, delivering real time continuous intelligence as a service to build, run, and secure modern applications. Sumo Logic empowers the people who power modern business. For more information, go to sumologic.com.
For more on Masters of Data, go to mastersofdata.com and subscribe, and spread the word by rating us on iTunes or your favorite podcast app.