MOD: Masters of Data

Bringing the human to the data

Joshua Bloom: Industrial-Grade Machine Learning

VP, Data and Analytics, GE Digital; Professor of Astrophysics, UC Berkeley

March 25, 2019

36:38

[We] make predictions about failure modes that have never been realized in the world

Joshua Bloom is bringing the best of machine learning to GE's customers across lots of industries, as well as bringing it back to academia.

Show notes

Ben: Welcome to the Masters of Data podcast, the podcast that brings the human to the data, and I'm your host, Ben Newton. Well, guess what? I am interviewing another physicist today. Like I always say, physics is a great education, and our guest today shows that in spades. Joshua Bloom is Vice President of Data and Analytics at GE Digital, where he serves as a technology and research lead, bringing machine learning applications to market within the GE ecosystem. Previously, Joshua was co-founder and CTO of Wise.io, which was acquired by GE Digital in 2016. And, not only that, since 2005 he's also been an astronomy professor at the University of California, Berkeley, where he teaches astrophysics and Python for data science. So, let's just say that Joshua Bloom is a very interesting guy. So, without any further ado, let's dig in.

Welcome, everybody, to the Masters of Data podcast. I am very excited about my guest today. I'm actually meeting him in his office space in San Francisco. I'm talking to Josh Bloom. Welcome to the show, Josh.

Josh: Thanks for having me.

Ben: It's good to have you here. Josh is the Vice President of Data and Analytics at GE Digital. And, we're gonna talk a little bit more about that, because I'm really excited to see where you're coming from on that piece. But, you're also a professor of astronomy at UC Berkeley. I know when we were talking to each other, that was not the combination I was expecting. So, as we always do, I'd love to get your story. We love to find out more about you, and why you ended up where you are. So, I know a little bit of the details, with Harvard and Cambridge and Caltech, but take us one level deeper: why did you end up where you are?

Josh: So, I'm sure like a lot of your guests, it sounds in the end, linear, but it was-

Ben: It never is.

Josh: Yeah, it never is. And, it was certainly a winding path. I was trained in physics and astrophysics, started at Harvard as an undergrad, and then went to Cambridge University to do a masters called an MPhil, and then went to Caltech for a PhD. And, all of that was astronomy. So, I did my PhD studying a domain called gamma-ray bursts. And, what I wound up doing is going back to Harvard to do a postdoc. I started getting interested, not just in the objects that we were studying, but in the ways in which we were studying them. And so, from an astronomer's perspective, it's all about what kind of data you can acquire and how that data can be used to do inference. It turns out the data that I was acquiring wasn't just the traditional astronomy data, like images and spectra, but time series, and I wound up getting pretty interested in what it meant to do inference on time series.

I started my faculty job at Berkeley in 2005, and while I was still heavily involved in the scientific domains, got more and more interested in what, I think, then was called informatics. And, as we wound up getting more data, and I started thinking about what was going to happen in the astronomy world when we had 10x, 100x, 1,000x more data, I got pretty scared at some level. But, I also got pretty excited, because it felt like there was some intellectual white space that me and my colleagues were not yet grabbing onto. And, I think like all good astronomers throughout history, we look around for the tools that other people are using. The famous example, of course, is Galileo: instead of taking a telescope and pointing it at the horizon, because it had been invented for military purposes, to look for ships coming over the horizon, he just said, what if I just pointed it up?

Ben: I didn't know that.

Josh: Yeah. And so, major discoveries happen when people figure out ways to cross over domains. And so, astronomers are pretty good at pulling in tools and toolkits and approaches from other places. And, I started around 2006, 2007, thinking about how machine learning could be useful in my own world, and in particular how to scale to meet some of the problems that I saw coming down the pike. And, that was really the genesis of how I got into data science more broadly, how we wound up starting the company that I started, and how that company ultimately got acquired by GE. It was all about the sort of intellectual curiosity of what are the other tools out there that I can use. And, that took me to worlds unknown, and took me to some of the fun places we've been in now.

Ben: It's interesting hearing you describe your story that way, because, I'd have to go look back, but it's gotta be at least four or five people I've interviewed, one of them I haven't published yet, who are all from a physics background but ended up in data science. I even think of your colleague, Dr. Muller, and there's a couple of other people I've talked to. Physics seems like actually a very good background for thinking about how to deal with these things, particularly if you're kind of on the experimental side, because you have to learn data gathering. I mean, I didn't tell you this before, but I actually did graduate work in physics too, and it's actually what got me into computer science-

Josh: There you go.

Ben: ... because I ended up really enjoying that. But, it's interesting. Does that make sense to you? Because it does seem like that approach, what you're saying is that the work you did in astronomy and the way you had to think about data and the way you had to think about data gathering and working with that data, actually was a really good preparation for the other work you were going to do.

Josh: Yeah, I mean certainly, from a training perspective, having a physics background, or I'd say more broadly any sort of training in the physical sciences, helps you think about the ways in which you tackle problems from a first-principles perspective. So, what are the irreducible components of this problem that I'm looking at? And, how do I attack them individually? Can they be attacked individually, or is it such a complex issue that you have to attack them holistically? So, having some of the, I guess, sharp elbows, you'd call it, and recognizing that there are problems that just seem too complex, that are probably too hard to tackle with what we know how to do today, and other problems that I can break into various different components, discretize them, and then chain the results back together, and you get something that in the aggregate looks like it could actually be useful and interesting.

I'd say that's kind of the training side of it. But, the other side of it, when you think about data science, and you think about machine learning in the world that we're in, in the industrial internet of things, if you want to call it that: the data coming off of those machines, while not a perfect exemplar of the physical object, is a pretty good proxy for what's happening with that physical object. And so, in the end, when we're trying to do inference on data streams coming off of wind farms, those things are still physical beings. Right? And, if we had perfect knowledge of the physics of those objects, we'd be able to model them in traditional ways, finite element analysis, going back to first principles. But, the idea that you can figure out when something's going to fail in the future, with your knowledge of physics, yes, but also by just following where the data takes you with these very exciting, sophisticated approaches, that's very deeply satisfying.

Not to say anything negative about those that are building models of people on the consumer side of things. But, if I think about the complexity of what it takes to make a decision, right? Just a recommendation engine in Amazon, for instance, you know that you're right because you've got a metric of did people click or not, and did you make money. But, you don't know that you're right the way you would if you got into the neurons of the person and actually understood what was going on in their head. I love the idea that there's a whole realm of data inference that we can be doing that attaches back to physical objects that we can touch and we can open up and interrogate.

Ben: So yes, let's take a step back to that. So, you were studying astronomy, and I guess you were at Berkeley when you co-founded Wise.io. So, tell us a little bit more about that. Why did you do it? What problem were you trying to solve?

Josh: I'd like to say that we had big problems that we were trying to solve, but the honest truth is that what I saw was the raw materials, and I saw sort of an opening in time. What we had done, if we fast forward from the sort of 2007, 2008 time, when I started thinking about machine learning, to when we founded the company in 2012, is I got a number of grants from national funding bodies to hire people who knew very little about astronomy, but knew a lot about statistics and machine learning and software engineering. And, together we built this team that wound up doing essentially real-time inference on data coming off of telescopes, and were able to enable a whole series of discoveries that wouldn't have been possible otherwise. Not just because we were digging in the noise, and it's hard for people to do that, but because the scale at which we wound up having to operate was so much bigger than what a pool of grad students could handle; no person could look at that data as fast as a machine could.

So, we were looking for places where there are experts in a real-time data loop making decisions, and around 2012 it became clear that that sort of notion didn't just exist in astronomy, that there were people elsewhere who could be aided in their decision making with machine learning. So, we started a company with a, hey, we've got a really interesting team of people. Essentially my whole research team quit to start the company, and we didn't quite know where we were heading. At the time, we thought that what we should be building were tools and toolkits for people like us: those sort of data-savvy, maybe even physics-savvy, folks who needed better tools. At the time, the world of Spark didn't exist, TensorFlow wasn't a real word, and deep learning actually had not yet come back into favor. So, we were using a bunch of different algorithms, thinking about ways to innovate on those algorithms, and started a company kind of with an open idea about where we would go.

So, what we recognized as a team is that we had kind of all the raw materials. We had the people who worked across different disciplines. We knew how to build things together. We knew how to put machine learning into production, which seems obvious and it seems like it's kind of table stakes, but at the time, and I'd argue still, even today, there is a pretty wide gulf between people working at the algorithmic level in the academic world and those who are actually putting things into practice. And, at the time, we were seeing people making new algorithms and then trying them on old data sets, and saying my scaling curve on this algorithm is epsilon better than your scaling curve. I'll write a paper and maybe get a PhD. I think that's obviously all important work. But, going from that to building a robust, secure system that could be used by tens, hundreds, thousands, millions of users-

Ben: Yes, absolutely.

Josh: ... is an incredible jump, and it requires a whole bunch of knowledge of software engineering; bolting on the algorithms, in some sense, is beside the point. What we wound up coming to realize is that to succeed in this world, to bring machine learning in some sense to the masses, somehow we needed to pick a vertical. We needed to pick a specific set of use cases with a targeted set of users and a very clear buyer in mind that would benefit from the AI that we had inside.

Ben: Sounds like something from the startup handbook or something.

Josh: It is, and that's why I catch myself and sometimes that sounds like-

Ben: No, it was the right thing to do.

Josh: ... I'm the VC on the other side of the table now. It's like your parents were actually right. So, what we wound up realizing, again, something that I think, at least to us, is obvious in retrospect, is that if you're leading with AI, if you're leading with ML, in an application or a solution, you've already lost. Because, A, that's not why people buy things and use things, and B, even worse, that sounds subversive to the people that you're actually trying to aid.

So, what we wound up doing in this company that we started, called Wise.io, was build a set of applications on top of Salesforce and Zendesk to help with customer support. And, what we wound up seeing there is a little bit like what we were seeing on the astronomy side, where you had deep domain experts who knew, in this case, a product line incredibly well, who were the front lines of a company succeeding or failing based on their interactions with customers, and who were being inundated with more and more data that they didn't know how to prioritize. And, what we wound up doing is building an AI system under the hood that would read all their past interactions and, in a new interaction, essentially make suggestions about what steps they could take. Or, if the machine was confident enough in the types of answers, it would actually automatically answer those interactions.

And, we did this all in a text-based way, and that turned out to gain quite a lot of traction. And, we started working with some very large web-scale companies, where we started exercising this set of muscles, which felt very new to us: how do you actually build something that scales across not just multiple customers for a single client of ours, but multiple clients in different industries? In the gaming industry, people use three words, essentially: where my coins? That's the customer support kind of interaction. And then, in other industries, people write long paragraphs. So, being able to build an adaptive system that could automatically learn as new data comes in, and work across all of these different types of customer support worlds, was a fantastic challenge.

Where I think GE got excited about us was because, at the time that we had met, a large number of companies that had been VC funded like us were really still talking about tools and toolkits. Essentially, here's an enablement platform for somebody else on your team to go and do something. Those are obviously completely needed and completely important, and you can't live in a data science world now without using tools and toolkits, but we found the value of going vertical and solving a real problem was something that GE got excited about. Because they said, wait, if you can do this in this world, and you're all astronomers and physicists by training, maybe you can start attaching onto some of the problems that we have. And, we got extremely excited about the challenges ahead.

What we wound up seeing, sort of cutting a little bit to the chase after the acquisition, is that it turns out the workflows that we get to work with are not so different from the customer support workflows. There's incoming information. A decision needs to be made with some level of decision support that is well tested and well verified, and then a person generally will wind up taking action on the set of recommendations. That workflow, I'd say, is almost universal when it comes to humans interfacing with machines. Machines are spinning off lots of data. There are potential problems with those machines. You want to do preventative maintenance. The machine is giving you signs that it needs to be looked at by a real person. There is a real cost of being wrong in both directions. If you don't catch those warning signals early, it costs more money later on. If you catch too many warnings that turn out not to be real, you're spending a lot of people resources pulling machines offline.

So, what was exciting to us, and really came to fruition just over the last couple of years, is this notion that what we actually wound up building, a framework for ourselves and a sort of templatized model of how we build out new applications, carried over. It turns out those applications that we were building initially as an independent company are actually very, very similar to the types of ones that we build out inside of GE.

Ben: It's interesting the way you describe it, because I've used this analogy in other places, and you can tell me how this makes sense to you. It comes from a couple of different books I've read. Have you ever read the Foundation series by Asimov?

Josh: I've read one, I think. I'm not a sci-fi fan, I have to admit.

Ben: That's okay. It's okay. I'll give you the gist, but the thing that comes out of that, that I thought was interesting in what you said, is there's always this tension, and I think it was kind of an Asimovian thing too, if that's a word, between AI being independent and doing its own thing versus assisting humans. And so, it sounds like a lot of what you're talking about is less like, I don't know, HAL 9000: I'll shut you out if you piss me off. And, more like the Iron Man suit: I'm going to take the operators of this equipment, I'm going to take the people interacting with it, and I'm going to make them more effective and help their decision making. Am I understanding that right?

Josh: Yeah, you got it. And, it's not just helping them make decisions that they were already making, a little bit faster and at scale. It's making better decisions, because now the machine is able to look at more data than a human ever could in the time that they have to make that decision. And then, interestingly, also start building in decision support that's, I'd say, defensible and auditable. And so, not that all industries are this way, but one of the great promises of machine learning in general, I think, is that it produces a decision that you can go back and understand where it came from.

Unlike a person who says, I did this thing because I felt that way at that moment, and if I saw the same data again, I'd do something else. The repeatability of it, the fact that you can actually go back and see why you made this decision, that interpretability is huge, and I don't think that's been fully appreciated yet, not just within the industrial internet of things, but even in the consumer world. Being able to go back and understand those decisions is crucial.

Now, one of the things that has come up in the news recently, obviously, is that because we have that power to go back and understand why, we're also starting to uncover the fact that there are deep biases that wind up getting built into some of these models.

Ben: I was going to ask you that. You did a perfect transition.

Josh: Okay, good. Well, I could read it on your face, actually. You were going to ask me about bias. Now, bias obviously is extremely dangerous in the consumer world and the person-facing world. Essentially, when you model data from the past, you're building in all of the decisions that were made, either consciously or unconsciously. And, that's a whole world that I think needs to get looked at. Not just from a regulatory perspective, not just from a data science ethics perspective of how we teach students, how we train our teams, what the cultures of teams are.

But, it has to get looked at algorithmically. And, I think one of the exciting frontiers of machine learning these days is understanding, at a deep theoretical level, how we can understand bias, quantify bias, and ultimately protect against it. Now, in the industrial internet of things, the bias is not really about a decision made about a person that affects them because of who that person is, what their race or gender is, et cetera. Now, we're talking about a different type of bias, where the data that's come in is only a small, or not fully representative, subsample of all the types of data and interactions that could have happened with that machine in the past, right?

We may not have captured data when a machine was in a bad state, because we turned off all the sensors while we were fixing it. We may not have seen all the failure modes of the machines. And so, when we try to make a preventative maintenance decision about whether a machine needs to come offline and get fixed, we're basically doing that somewhat blinded by the past, because we've only seen some of the ways in which machines have failed. And so, there's a different kind of bias, where the data is only giving us a limited view on what these machines actually could do. And, machines are sort of like the beginning of Anna Karenina: when they're working, they're all happy and they're all good. When they're bad, they go bad in different ways, right? And, if you only learn from the past on that, it becomes a little bit dangerous.

Now, this gets back to the earlier conversation about the importance of physics. Where I think some of the great intellectual white space is, and what we're working on within GE, is bridging the gap between the fully data-driven model, which by definition can at best only learn from all of the data you've ever collected, and the physics-driven model, which says: we're the people that built these machines, we're the people that have the blueprints on paper, and we can turn that into a physical model and make predictions about failure modes that have never been realized in the world. We know that if we shake it this way in a modeling or simulation system, it's going to look like this in the end.

So, the marriage of the physics-driven models, which feel somewhat antiquated but are deeply ingrained in the way the industrial world actually works and thinks about its machines, and the data-driven model world, which is essentially entirely new, or just starting to come up. Bringing those two together, in principle, you could leverage the best of both of them. And so, it's in that interface that I have sort of the biggest excitement going forward.

Ben: That's really interesting. One thing that you said, maybe to dig into a little bit, particularly since I'm not as familiar with the area, and I'm sure many people listening are not: talk to me a little bit more about the business, how you guys actually do this. So, if I understand right, an industrial customer buys a GE device. Probably, that's not the right word for it. A plane engine, a jet engine, or something like that.

Josh: That's a good [inaudible 00:22:00]

Ben: So, basically what you guys are doing is providing insight for that customer that's using the device. It's like, oh, well, we saw that if you don't do X, Y, and Z maintenance, this could happen, or you need to do this preventative maintenance. Is it that kind of thing? I mean, in practical terms, what is it actually doing?

Josh: Yeah, that's exactly right. And, the way I think about GE from a hardware perspective is, if it spins and costs a lot of money and your life depends upon it at some point in your day, GE probably made it. So, it's jet engines, it's entire power systems. Obviously GE is deeply embedded in healthcare, so think of MRI machines, et cetera. So, indeed, these machines are highly complex. When GE sells a jet engine to a company, we're also selling some level of assurance about the quality of that object going forward over decades' time. This is not an iPhone, where two years from now you throw the thing out and you buy a new one, right? These have to live for a substantial amount of time, and they always need maintenance.

What GE has is a number of monitoring services that look at the health and quality of something like a jet engine. And, we'll give feedback to the parent companies, like Delta or United or Southwest, saying, here are our insights about the engines that are flying. And, ultimately, it's up to them to make decisions about what they want to do from a maintenance perspective. But, that's exactly what we do. And so, if you go back to the customer support case and you think about what it means to be a customer support agent, now change that person to somebody who's a deep expert in understanding the health of, say, jet engines: they are looking at incoming data and making decisions about what to do next with an engine.

The great thing about GE jet engines is that there are very, very few actual big failures. Most of the time they just need some tuning. So, the failure modes, going back to some of the other parts of the conversation, don't often show up. We're able to learn on the ones that have, but most of the time people are saying it looks good, it looks good, looks good, looks good. And, our job, as we've wound up building an application internally within GE and with their aviation partners, is to build a system that allows the experts to focus on the hard problems. So, something that obviously needs to be looked at? Go for it, right? We'll just help them automate that, or we'll help them at least make a decision more quickly.

If something is doing just fine and it's an alert that just pops up, not quite like a check engine light, but something that just pops up and goes away, our machine will have learned that in those two extreme examples, you're fine with either putting it away, or you're fine with basically saying we need to work on that. But, it's that kind of gray area in between where we need to produce better decision support. Not just for people looking at jet engines, but for people monitoring wind farms. Helping them make decisions, and making their jobs better and faster, is really the goal of what we do.

Now, the critical part of it, which we haven't talked about yet, and which makes this whole problem really interesting from a systems engineering perspective, is that it's not just a set of decisions or recommended actions that we wind up presenting. We also wind up gathering feedback on what was actually done. So, did somebody take the actions that we recommended, and what was the ultimate result for the object that we were opining on? And, it's that feedback loop that becomes really important for machine learning, because otherwise, if you build a model, deploy it to the world, and you're not constantly retraining, you're not really keeping up with our current, modern understanding of this set of systems.

So, what we've had to do over time is learn how to build feedback mechanisms into the systems that we wind up deploying. And, that again sounds sort of obvious and easy, but it has some really interesting implications for the way in which our system winds up interacting with people. The human-computer interface, when it comes to providing machine learning feedback, is also something that is still, I'd say, a hot topic. It's still being worked on, right?

You see this happening even within Google's Gmail all the time. You are able to give subtle feedback to the Gmail system writ large by saying, no, that actually wasn't spam, or, that's actually interesting to me. Or, now, when they're making recommendations about the first line that you can write: if you actually accept that suggestion of how you're going to write back to somebody, that's sort of a thumbs up internally for the model. If you choose something completely different, that's also training data for the model. And, any good modern software company, and GE happens to be a hardware company as well, needs to come up with mechanisms to capture that sort of feedback data. And, it's not a happy face, neutral face, unhappy face at the very end of an interaction saying, how was my interaction? No one wants to fill out those surveys. Right? It's more the implicit feedback that you need to figure out ways to capture.

Ben: So, I'm thinking, with consumers, a lot of times you're kind of having to observe their behavior, and like you said, nobody wants to fill out ... I literally was just on a website yesterday where it's like, do you want to fill out a survey? I'm like, no.

Josh: No, I do not.

Ben: But, then thinking in these industrial situations, where you've got these highly qualified engineers, or people doing service on the jet engine or whatever it is: is this more about taking what they would already do, what they might already put down into some sort of system, I did X, Y, and Z and then this was the end result, and you have to translate that? Or, is this actually about having to get them to interact with your system, because they might not have done that otherwise? Like, I fixed it, check, it was fine, we moved on. You see what I'm saying? What's the difficulty there? Because this is their job. It's not like consumers that don't really want to do it anyway. So, what's the human element there?

Josh: I mean, yes, you can always deploy a new system and tell people, here's your new workflow, right? We will train you for the next X days or weeks, and you will do this new workflow. Nobody wants to do that. And, that's a prime example of how an AI system that's deployed, that checks all the boxes for safety, reliability, robustness, go down the list, will get rejected. Right? Because, ultimately, those are the people who are on the front lines, making important decisions. That's how they are paid. That's what they're trained to do. If you come in and change their workflow, it won't work. Right? And, what we wound up learning is that you have to get to the point of becoming a trusted assistant, where if we go down for whatever reason, or we turn ourselves off, maybe in an A/B testing type of environment, they're upset, because they no longer have the new information that they need to make their lives better and to do better work.

So, that's really one of the big tricks from a deployment perspective: building applications and building UI and UX that stay out of the way of the existing workflow, but live alongside it and eventually become sort of a trusted companion to it. Which is why you can't deploy an AI system into production and have it work in the final state that you want it to work in. You need to think about some sort of dialing up of that kind of interaction. So, at first, it could be working completely passively in the background, just effectively observing what people are doing, making no obvious changes to their world. Eventually, it shows up with a little banner, and I'm not saying it's like Clippy, but a little bit like, hey, I see you're doing this, but you may want to do that. And, if it's done in a nonintrusive way, it becomes accepted and becomes sort of a partner in the decision-making process.

Ben: [inaudible 00:30:01] like Iron Man, because of this Jarvis thing. You know, I think you should do this.

Josh: There you go. But again, it's a process. It is not something that can just be deployed and then you walk away from it. It has to be very closely handheld, and that ramp up from zero to 60, it could take three seconds, but more likely it'll take three months to three years, depending upon the complexity of the problem and the reality of the costs of the problem, right? If you're making really big, life-important decisions a couple of times a day, the onus for being wrong on that, from an AI perspective, is extremely high. If you're making hundreds of thousands of little decisions every day as a person, by all means take off 99 percent of that and I'll be happy.

Ben: Yeah, it is really interesting because having worked, a lot of my life, on the kind of software side, it's very ... because a mistake is like, oh no, they didn't get to buy their new pair of pants. The world will go on. But, if it's like the plane could crash, or millions and millions of dollars of investment in some sort of system could be wasted, it's a completely different playing field.

Josh: Yeah. I mean, if you take a power system offline, tens of thousands of people lose power for hours. That's millions of dollars of lost revenue. That's a very bad mistake if it turns out you didn't need to do it. The other side of that coin is the other very bad mistake: not applying a maintenance job to something that winds up leading to some catastrophic error. That's also pretty bad. And, again, it comes back to the quality of your modeling. The data that you have is the input to generate those models. And, then there's whatever sort of boundary conditions you can put on what the models wind up eventually emitting, based on heuristics like, you shouldn't say that this is going to happen because it's never happened before. Or, more specifically, producing boundary conditions that are constrained by physics, right? The probability that a jet engine is going to go at 45 times the speed of sound is zero. And, there are boundary conditions that we know because we have these a priori understandings of these physical systems.
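The idea of bounding model outputs with a priori physical limits can be sketched in a few lines. This is a hypothetical illustration, not GE's actual system; the function names and the Mach-number limits are invented for the example.

```python
def apply_physical_bounds(prediction: float, lower: float, upper: float) -> float:
    """Clamp a raw model output to a priori physical limits."""
    return max(lower, min(upper, prediction))

# A model should never emit a jet-engine speed of Mach 45: physics says
# that's impossible, so the boundary condition rejects it. The 1.2 upper
# bound here is an illustrative placeholder, not a real engine spec.
raw_prediction_mach = 45.0
bounded = apply_physical_bounds(raw_prediction_mach, lower=0.0, upper=1.2)
```

The same pattern applies to any physically constrained quantity (temperatures, pressures, rotation rates): the learned model proposes, and the physics-derived bounds dispose.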

Ben: And, you've created this system so you have that deep insight. This is fascinating. I don't know about anybody else listening, but I'm definitely learning something, so it's good. So, kind of wrapping a bit of a bow on this. So, you clearly picked some hard problems, and there's a lot of really interesting stuff to be done here. So, one thing I usually ask most of my guests is, well, you're working on this right now. What are you thinking about working on, or your team working on, kind of over the next year to three years, that you don't think other people are thinking about? What are you thinking about that is not really rising to the surface for other people to think about?

Josh: Well, I'd say one of the things that I'm extremely excited about is this idea of doing machine learning on private data. And, I know some of your listeners in the past have heard from your guests about data privacy and data ethics. I'm thinking about a specific use case where we've got customer A and customer B. They're both using a similar product suite, but they're competitors. And yet, we need to build machine learning models that wind up improving the experience of those two customers through these products. Again, in the consumer world, if you're Amazon, you just build a model with everything because you own all the data. The world that we work in, I really think of as industrial machine learning, as opposed to consumer-based machine learning, where there are sort of different rules around the data.

We work in a world where it may be that we don't actually get to own the data that our machines produce. Maybe we get a look at it, but we can't just continually build models off of that. Moreover, no one wants to share data with their competitors. So, how can we build machine learning models where both sides, or multiple parties, can wind up benefiting, but where there is no data leakage? We can provably show that no data was transferred from customer A to customer B, but we potentially can't even see the data that we build models on, or at least can't see the unencrypted version of it.

Ben: Kind of have a blindfolded algorithm in some sense.

Josh: Exactly. And, there's some great research happening in places like Berkeley, where they're starting to think about how you do machine learning on private data and produce privacy guarantees around that data. Now, there are great implications on the consumer side, but when it comes to industrial machine learning, not having all the data in one place becomes a big challenge. Producing a system that can be sort of a trusted component of the machines that we wind up deploying into the wild, one that our customers can believe in and understand deeply that there's no way the data is going to leak out, and yet we can make good use of it for the benefit of not just them but everyone else, that's a fantastic problem. And, we certainly don't know how to solve it yet. We have some use cases in mind, but we also need buy-in from a lot of different constituencies to actually start trying to release this into the wild.
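One well-known approach to the multi-party problem Josh describes is federated learning: each customer computes a model update on its own private data, and only parameter vectors (never the raw data) leave the site. The toy sketch below is purely illustrative; the function names, the one-parameter model, and the data are invented, and a real deployment would add secure aggregation or differential privacy on top.

```python
from typing import List

def local_update(weights: List[float], private_data: List[float],
                 lr: float = 0.1) -> List[float]:
    # Toy local training step: nudge each weight toward the mean of this
    # site's private data. The data itself never leaves the site.
    mean = sum(private_data) / len(private_data)
    return [w + lr * (mean - w) for w in weights]

def federated_average(updates: List[List[float]]) -> List[float]:
    # The coordinating server sees only weight vectors, never raw data.
    n = len(updates)
    return [sum(ws) / n for ws in zip(*updates)]

# Customer A and customer B are competitors: they exchange no data,
# only locally computed model updates.
global_weights = [0.0]
update_a = local_update(global_weights, [10.0, 12.0])  # A's private data
update_b = local_update(global_weights, [20.0, 22.0])  # B's private data
global_weights = federated_average([update_a, update_b])
```

Both parties benefit from the averaged model, while the server, and each competitor, is "blindfolded" with respect to the other's underlying data, in the sense Ben suggests above.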

Ben: That sounds pretty exciting. I'm excited to see what you guys come up with. Well, I mean with that, Josh, thank you so much for coming on the podcast. I know I learned a lot. I'm really excited to see what you guys do, and it's really cool to see how you're applying to such a different world with GE. So, thanks for taking the time.

Josh: Thanks for having me.

Ben: And, thanks everybody for listening to the Masters of Data podcast. Check us out on iTunes. Rate us so other people can find this. Find us on your favorite podcast platform, and thank you for listening.

Speaker 3: Masters of Data is brought to you by Sumo Logic. Sumo Logic is a cloud native machine data analytics platform, delivering real time continuous intelligence as a service to build, run, and secure modern applications. Sumo Logic empowers the people who power modern business. For more information, go to sumologic.com. For more on Masters of Data, go to mastersofdata.com and subscribe, and spread the word by rating us on iTunes or your favorite podcast app.

The guy behind the mic

Ben Newton


Ben is a veteran of the IT Operations market, with a two decade career across large and small companies like Loudcloud, BladeLogic, Northrop Grumman, EDS, and BMC. Ben got to do DevOps before DevOps was cool, working with government agencies and major commercial brands to be more agile and move faster. More recently, Ben spent 5 years in product management at Sumo Logic, and is now running product marketing for Operations Analytics at Sumo Logic. His latest project, Masters of Data, has let him combine his love of podcasts and music with his love of good conversations.

More posts by Ben Newton.

Listen anytime, anywhere

Available to stream or download via these and other podcast apps.