2018 Financial Markets Conference—Policy Session 1: How Do Machines Learn Finance?
Is machine learning (ML) a fundamentally new set of tools? Or does it represent an accelerated innovation of tools already long in use? What are the major strengths, weaknesses, and limitations of ML? This session examines how human experts and decision makers interact with ML.
Raphael Bostic: I wanted to really make sure that today we have some really interesting conversation. In my experience, the best conferences are ones where the participants are actively participating, so I really do encourage you all to step up and raise your hands and raise questions. I know from yesterday's discussion at our table, we got into it pretty good. We went to places, and I learned some things that were really quite interesting. So I hope we continue with that through the rest of today.
Now the theme, as you know, is "Machines Learning Finance," and the overarching question is, how will this technology change the game? What's going to be different, and how are we going to have to reposition our activities and our thinking to make sure that we're best positioned to take advantage of the changes that are to come? There are lots of questions embedded in this—you heard a bunch of them yesterday. But ultimately, what we're hoping for is that this is the beginning of a conversation that you have with your teams and your colleagues over the next months and years, because this stuff is going to be here for a while.
Now, we're going to try to walk you through the course of the next two days to make sure that everyone gets to a place where they feel that they're conversant. So the very first panel is going to focus on the basic question, what is machine learning? Let's just set a baseline understanding so that, as we talk about it, we understand its origins and its foundations. I have some ideas, but I don't know it as robustly as I probably should, so I'm really looking forward to that.
After a break, we're going to then turn to the regulatory environment, and really have a conversation about what does machine learning mean for regulation, for compliance, for risk management, and all the things that we worry about at the Fed from a bank supervision perspective? Then we'll have lunch, and after that, we will listen to people talk about artificial intelligence and the modern productivity paradox, and the tension between expectations and statistics. I will confess, I have no idea what that means, so I'm really looking forward to that.
Then we'll close today with a policy session called "Learning about a Machine Learning-Driven Economy." We talked a little bit yesterday about the implications of machine learning for economic performance. This will round out that conversation, and it will be, I think, instructive for those of us trying to do policy to have an idea about what it means for monetary policy.
Tonight, we'll have a reception and then a keynote address from John Havens. John is the executive director of the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. Embedded in a lot of the questions yesterday was the notion of the ethics of technology, the ethics of machine learning and AI. This is something that we need to be talking about, because the decisions we make in the ethics space will have real implications for the approaches we take to programming machines and for setting the boundaries of what is acceptable. This is going to be a great day, and I'm really looking forward to it. I, again, encourage all of you to engage and be active. Now it's time for the show.
But before we get to that, I wanted to call—Brian, are you coming up? Because we want to make sure you are well-versed in Pigeonhole, which is a technology that we use to aggregate questions and start the conversation once the presentation is done. So, Brian, it's all yours. Thank you.
Brian Robertson: One of the hallmarks of the Financial Markets Conference is the conversations that we have over the course of the next few days, not just discussions that we have on stage. So for this year, we're inviting everybody to join that conversation online with #FedFMC. Between today and tomorrow, we'll be posing questions, sharing our insights, and providing some of the thoughts that our speakers are going to discuss today. However, for the discussions that are happening here, today we'll be using Pigeonhole Live for Q&A sessions. Instructions are in your conference folder. However, I'll go over briefly the basic steps. How do I do these?
Step one: connect to the WiFi with your smartphone, tablet, or laptop computer. You'll then want to do one of two things: either launch your internet browser, go to Pigeonhole.at, and enter your event passcode, FedFMC, or, if you've downloaded our app, access Pigeonhole directly through there. We strongly encourage you to use our app. In addition to giving you access to Pigeonhole Live, it has the program agenda along with the presentations and other conference material, as well as speaker biographies and the attendee list. Once you're done, click Go, and you're free to fire away all of your burning questions, within 200 characters, while also voting for any question that interests you. Voting may get the moderator's attention; however, please note the moderator has the discretion to select audience questions of his or her choosing, or to pose one themselves. We'll also have polls during the breaks to quickly gauge your reactions and thoughts after a session.
Lastly, the Atlanta Fed will be taking photos throughout the conference for purposes of public relations. Please smile. If anybody's having any problems connecting to any of our apps or devices, we've shipped in a team of teched-up millennials. Please grab one of us. We're out there in attendance. We'll be more than happy to get you guys squared away.
Without further ado, we'll proceed to our first policy session. Frank, you have the helm.
Frank Diebold: Great. Well, welcome everybody. Great to see everyone out bright and early in the morning. This, of course, is the session on how machines learn. I'm not sure that we will resolve that, but at least we'll frame a lot of questions and bring things into sharper focus. We've got a great panel, by the way. I may not be great, but at any rate, I am Frank Diebold, an economist and statistician at the University of Pennsylvania. Our panelists are Ryan Adams, a computer scientist from Princeton; John Cunningham, a statistician from Columbia; and Gideon Mann, head of data science at Bloomberg. We will be having them speak in that order, which is not the order listed in the program. We have meticulously optimized the ordering to make it perfect. Believe it or not, we did that without machine learning—we did it the old-fashioned way in the back of the room about 10 minutes ago.
So without further ado... And, of course, nice, extensive biographies of each of the panelists are in the program. I'm not going to read them, so that we have maximal time for our discussions. Each panelist will go for about a half an hour, which will leave us about a half an hour for Q&A. After they go, I might tee off with a few questions, or I might not. We'll see what's appropriate. But at any rate, without further ado, I give you Ryan Adams.
Ryan Adams: Thanks, Frank. What I'm going to do to get things started is talk about machine learning in very broad terms. I'm going to give you a few examples. I'm going to try to take you on a little bit of a tour of the way we think about it as researchers and practitioners. I'm not going to really talk about finance. I'm going to try to give a high-level, intuitive view of the different ways that we think about these kinds of problems and try to set expectations a little bit around what's going on right now, why people are excited, how ML relates to AI, and so on.
Let me get started here with this quote that's 20 years old but that I think summarizes machine learning quite well in a simple way, which is from Tom Mitchell, who's a professor at CMU [Carnegie Mellon University]. He wrote one of the original books on machine learning. Put simply, "Machine learning is the study of computer programs that will improve automatically with experience."
And "experience" here we take to be something that is very broadly defined. It might be experience within an interactive environment, but it might also be experience in the form of a large amount of data that's been gathered in some other way. There's also some subtlety here in talking about what it means to improve on a particular problem. One thing we see is that framing a machine-learning problem is as much of the job as anything else.
I like to break things down broadly into three kinds of things that we do with machine learning, three kinds of tasks that we're trying to solve. The classic kind, the thing we typically think of as supervised learning, is the task of prediction. OK? So we have some situation: maybe you're going onto Netflix, and the system there would like to make a prediction about what movie you're going to like. Or maybe you want to evaluate some new drug candidate, and you'd like to make a prediction about how well that drug is going to achieve the function you intend it to achieve. Or maybe there's some particular macroeconomic quantity that you want to make a forward projection for.
Can we close the dialogue box, please?
That's the kind of thing we call supervised learning. Unsupervised learning is where we don't have access to some notion of ground truth. We generally think of this broadly as trying to find structure in data. Maybe that means a density model; maybe that means understanding clusters. The kinds of questions we often ask there are things like, "What genes are relevant to some disease that I care about?" Maybe it means finding some low-dimensional representation of your data, so we would view principal component analysis as an example of unsupervised learning. Or, if you think about building density models, there's also this area we call generative modeling, where, given a big data set, we synthesize new data that somehow share the statistics of the data set that was provided.
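To make the principal-component-analysis example concrete, here is a minimal PCA sketch with numpy (my own illustration, not from the talk; the data and dimensions are invented). We generate points that secretly live near a two-dimensional subspace of a five-dimensional space and recover that low-dimensional structure with no labels at all:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 points in 5 dimensions that actually live
# near a 2-dimensional subspace, plus a little noise.
latent = rng.normal(size=(500, 2))           # hidden 2-D coordinates
mixing = rng.normal(size=(2, 5))             # map from 2-D to 5-D
X = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

# PCA: center the data, then take the top right singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]                          # the 2 principal directions

# Project down to 2-D and reconstruct; the reconstruction error is
# tiny because the data really is (almost) 2-dimensional.
Z = Xc @ components.T
X_hat = Z @ components
rel_error = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
```

Nothing here uses an output label; the "structure" recovered is just the subspace in which the data happens to concentrate.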
Then in the third category, there's this general notion of trying to make sequential decisions over time. There's some environment. You're going to interact with it. Maybe you get some kind of reward from it for doing the right thing, and you're trying to learn to plan and make decisions. Broadly, I'm going to refer to this as reinforcement learning. Here we can think of different kinds of decisions: maybe if you're a bank, should I offer credit to this person? Or if you're a medical professional, what treatment, or what sequence of treatments, should I decide to provide to this person? Or if you're a self-driving car, maybe you're trying to make a decision about whether or not to stop at a particular intersection. These might involve prediction, might involve identifying structure in data, but at the end of the day, it boils down to making some kind of particular decision.
So I think, at a high level, the way to think about machine learning in broad computer science terms is this: traditionally, when we wanted to write a program with a particular function, you would sit down, look at some inputs, look at some outputs, think really hard about what properties of those inputs might lead to those outputs, maybe come up with some rules, and codify those rules. In machine learning, the idea is to provide examples in the form of input/output pairs and then automatically produce a program that produces those outputs given those inputs.
So let me start off by talking about that particular view on things, and supervised learning, and give you a few different ways that we can view this automatically learned input/output mapping as a learning machine. At the end of the day, and this is a somewhat controversial but, ultimately, I think, fundamentally correct statement: supervised machine learning is really just about function approximation. OK? Function approximation that is not so different from linear regression, logistic regression, the kinds of things you might learn in Stats 101. I have some data points. Maybe there's some ground truth, maybe there's not. But I'm going to choose some basis functions, and then I'm going to perform regression. All of machine learning is some variation on that problem. I should say, all of supervised learning is some variation on that problem. It's not a very glamorous or very sexy thing, but that's really what it is, folks.
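As a concrete sketch of the "choose basis functions, then regress" recipe (my own toy example, not the speaker's code), assuming a polynomial basis and ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy supervised data: noisy samples of an unknown function.
x = rng.uniform(-1, 1, size=200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=200)

# Choose basis functions: here, polynomials up to degree 5.
def design(x, degree=5):
    return np.stack([x**d for d in range(degree + 1)], axis=1)

# "Learning" is just least squares on the basis expansion.
Phi = design(x)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Predict at new inputs with the fitted weights.
x_test = np.linspace(-1, 1, 50)
y_pred = design(x_test) @ w
rmse = np.sqrt(np.mean((y_pred - np.sin(3 * x_test)) ** 2))
```

The "learning" here is nothing more than solving a least-squares problem over a basis expansion, which is exactly the point being made: the creativity lives in what you call an input and which basis you choose.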
The question really is, what are we willing to tackle? What are we willing to call the inputs to our regression function? That's where I think machine learning really differentiates itself in some ways from statistical regression-type approaches: it's the same tools, but machine learning researchers and practitioners tend to think more broadly about what an input to a function can be. If you wanted to learn a function that's capable of taking an image and identifying some semantic information, say an image of a street sign from which you're trying to identify a house number, then a good function for your regressor is going to be one that, when there's actually an "8" in there, produces the number 8 or the label 8, right?
Really, at the end of the day, this is just a function that's taking a vectorial representation of this image and building a regressor for it, given a lot of previous examples.
The thing is, this kind of function-approximation view, if we think creatively about both what the input is and what the output is, leads to a lot of different kinds of structures that you can invent and problems that you can solve using interpolation. Here's a classic example where everybody in this room has a phone on them. That phone does some kind of face recognition in the camera. Simple face recognition is a supervised-learning-type problem, where you take an image and you're trying to both identify and localize the faces in the image.
Another example: this was a video before, but you can imagine a self-driving car that's going to need to identify which parts of what it can see are road, which parts are sky, and trees, and cars, so that it stays on the road and doesn't hit pedestrians and so on. This we can view as a supervised learning problem. It's a function that takes in an image, like on the bottom, and produces a label for every pixel, like on the top. So it's still a big function approximation; if we can gather and label a lot of data like this, we can build a function approximator.
This is another example that was formerly a video, but it's a nice example of thinking creatively about what we can do with function approximation. On the left is just a static, two-dimensional image. What these researchers did is go out with a laser range finder and gather a large amount of ground truth: they took an image, found the depth of every structure in the image, and then trained a big neural network to predict the depth of the pixels, given the two-dimensional image. Then you can give it some two-dimensional image, and it will turn it into a little 3D model that lets you navigate, in a modest way, to get a sense of how far away things are. It has applications relevant to things like robot navigation. It's an example, again, of just framing a problem in terms of learning a function.
On the natural-language side of things, we can also think of these as supervised learning type problems. Maybe we have some text, and as part of the natural language processing system, we would like to know what the parts of speech are for every token in that sentence. So we could build a system that goes from the input, which is just a raw sentence, to identifying which things are nouns and adjectives and verbs and so on—again, a thing we can frame in terms of just building and learning a function. We can combine things like images and texts. Recently there's been interest in, for example, building automatic captioning systems. So if someone is sight impaired, you might like to be able to give them a text representation of an image as they're interacting with a website. There's active research in trying to go from images to coherent textual descriptions. Again, thinking of this as a function that you learn from data.
You go to Netflix, you go to Amazon: everyone's interacted with recommendation systems and ad systems. Again, this is a problem we like to think of as learning a function, learning a map from, say, the tuple of you and your tastes and some particular movie, to what you might rate that movie after you've watched it. A few years ago, Netflix, for example, ran a really big contest that spurred interest in recommendation systems within the machine learning community, in which they offered a million-dollar prize for improving their algorithms for predicting their users' tastes.
To give you a couple of examples from my own research that don't feel anything like these classic computer vision problems, I worked with some folks at UCLA who were in biomedical engineering. Their objective was to try to come up with noninvasive cancer screening tools that could use microfluidics and computer vision in microscopes to rapidly identify which cells are cancerous and which ones aren't. Again, same kind of thing—we have some fancy data coming from biologists and coming from different kinds of sophisticated equipment, but, ultimately, we can turn this into a function approximation where we have a bunch of examples of cancerous cells, noncancerous cells, and we're trying to rapidly—we're trying to build a function approximator that gives us that kind of label, given the input.
Similarly, we worked with some folks at Harvard Medical School and at MIT, where we have a large amount of data essentially of biophysical recordings of people in the intensive care unit, and we're trying to help understand the structure of those time series data to inform potential interventions. We could view, for example, the machine learning prediction task of—given possibly complicated time series, can we predict whether someone in the ICU is going to die in the next 24 hours? Again, it's a possibly complicated input, but we're trying to turn this into a function approximation problem.
The main takeaway here is that although there is a lot of excitement, and a lot of these problems maybe feel like AI (we're producing text to describe an image; does it understand what's going on?), at the end of the day, what we're really doing is taking a bunch of tuples of inputs, possibly complicated, and outputs, also possibly complicated, and trying to figure out a way to parameterize a mapping between them, such that, given all those examples, we can find a classifier or a regressor. I'm hoping that with these examples, even though I haven't talked technically about the way these things work, I can pull back the curtain a little bit: all of these, no matter how complicated they might feel, boil down to interpolation, boil down to trying to learn a particular kind of regressor, and the creativity is in coming up with that structure.
I'm going to take a brief digression, though, and talk about deep learning. Who's heard of deep learning? It's like people have been caring a lot about this. It's been in the news. Acquisitions for hundreds of millions of dollars and so on. I'm here to tell you, that no matter what Elon Musk tells you, deep learning is adaptive basis function regression. OK? It's also just regression. If you're worried about regression, then by all means, worry about deep learning. I mean regression, it's a great idea, right? It does work.
Let me tell you about deep learning in just a couple of slides; this is a little bit targeted to people who are used to thinking about more traditional statistical and econometric models. I want to try to convince you, really briefly, that deep learning is just regression. Does anyone know what a generalized linear model [GLM] is? Yeah, all right. This is a good audience. Ask that to undergrads and they're definitely not going to know what a GLM is.
A GLM is just a really basic model for mapping from a set of covariates, with a linear function, to some kind of observation. It might be a binary observation or [inaudible] observations or different kinds of things. It's a way to describe a likelihood, a particular way to map from inputs to outputs, and it's very commonly used in all kinds of simple and complicated models. At the end of the day, deep learning and regression and these kinds of things are just simple linear inner products to produce those outputs. When you've done linear regression... Who's sometimes added maybe a basis function or two, maybe some polynomials, to that regression? Maybe a few fewer. But sometimes things are not linear: you need a quadratic term or something like that. That's not a controversial thing to do from a statistical point of view, right? Sometimes we need a little bit of nonlinearity, and I'm going to add some [inaudible] here to describe those nonlinearities. Now what I'm doing is weighting those basis functions in my regression.
Again, on totally solid statistical ground here, you need to be judicious when you choose those basis functions, but, ultimately, this is just a way to introduce some nonlinearity into my linear regression.
So it's also totally reasonable to say, "Well, I don't necessarily know exactly what polynomials I should include or what my basis should actually be when I do this regression, so I'm going to add a parametric family of basis functions. I'm going to add some parameters, not just to my linear regression but also to my basis functions, and I'm going to fit those basis functions along with my regression parameters. I'm going to do that with maximum likelihood or maximum a posteriori estimation," or least squares, or whatever your favorite inductive principle is. You're going to use that not just for the linear regression weights but also for the parameters sitting inside the basis functions. That still sounds like a totally coherent statistical thing to do, one that would be noncontroversial in any applied statistics or econometrics journal. That's a neural network. That's a neural network.
If you think about it: that second layer of circles, if we called those neurons, that would be a neural network. They're literally just smooth basis functions that take inputs and adapt their parameters in response to data, typically using maximum likelihood.
Deep neural network, you just compose maybe one or two, three, more layers, maybe 100 more layers. But the point is, my basis functions now have basis functions. That's what deep is. The appeal of that potentially is that if you want to discover high order structure, then you need to compose together a lot of simple basis functions and adapt those to the actual data at hand.
There's a lot of hype, a lot of mystery, a lot of worry around this structure, but this really is the whole story. It might be hard to optimize. It might be hard to decide what that structure should be, that directed acyclic graph that goes into this composition. But this really is what this is all about. There are reasons, which I'll get to at the end, why this has become exciting now, but this is the main thing to realize. Part of the trick is that these things get trained using a tool called back propagation, which feels like a somewhat magical procedure in which errors at the output propagate back to the parameters deep in this big neural network.
Back propagation is an example of something called algorithmic differentiation, and I just want to take a brief aside here to demystify it. So there's the mystery of deep neural networks, which I want to convince you are just adaptive basis function regression, and then the mystery of back propagation, if you've heard of that. I want to convince you briefly that this is just the chain rule, the high school calculus chain rule. OK, so broadly speaking, we have some function that maybe takes in a cat, or takes in an image, and tries to produce a label of what's semantically in it. It's a richly parameterized function that's the composition of many pieces. If we think about trying to find the gradient of our overall loss function with respect to the guts of this thing (by the way, that's what we're going to use to optimize and fit the parameters), then that gradient is going to decompose into a product of [inaudible].
If you were doing high school calculus, you would take these and go left to right, because that's the way we like to write things down, and you would multiply these matrices. When you do (and I apologize the AV isn't working for the animation here), when you multiply those matrices left to right, you get a bunch of matrix-matrix multiplies, which are cubic in their cost.
Back propagation, and the idea of reverse-mode automatic differentiation that informs all these tools like TensorFlow and PyTorch that you may have heard of in the popular press, is just applying the chain rule right to left instead. This is a good idea because now it's matrix-vector multiplies, so it's quadratic instead of cubic. Maybe that's a lot of jargon, but the takeaway is this: if you remember the chain rule from high school, back propagation and the way these things are trained really is just a particular instance of that, encoded in software.
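The cost argument is easy to check numerically. Both orders of applying the chain rule give the same gradient; the right-to-left (reverse-mode) order just never forms a matrix-matrix product. A small sketch with invented Jacobian sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Jacobians of three composed layers, f3(f2(f1(x))), each 100-D,
# with a scalar loss on top.
J1 = rng.normal(size=(100, 100))   # d f1 / d x
J2 = rng.normal(size=(100, 100))   # d f2 / d f1
J3 = rng.normal(size=(100, 100))   # d f3 / d f2
v = rng.normal(size=(1, 100))      # d loss / d f3 (a row vector)

# "High school" order, left to right: matrix-matrix products,
# cubic in the layer width.
grad_left_to_right = v @ (J3 @ (J2 @ J1))

# Reverse mode / back propagation: right to left from the loss, so
# every step is a cheap vector-matrix product, quadratic in the width.
grad_right_to_left = ((v @ J3) @ J2) @ J1
```

The two results agree to floating-point precision; only the amount of arithmetic differs, which is the entire trick behind reverse-mode tools.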
All right. That's supervised learning in the special case of deep learning. On this whirlwind tour, I want to take you a little bit now through unsupervised learning.
So unsupervised learning is where we have inputs but no outputs. We don't have any natural notion of ground truth. We have to come up with some idea of what kind of structure we're looking for in the data. I want to give you a couple of examples to give you the flavor of what I mean when I say "finding structure in data." Here again, I'm going to show you a couple of things that were intended to be videos but are just going to be static images instead.
One example that I like of what it means to find structure in data: these researchers at Cornell went on Flickr and downloaded a very large number of images from particular geographic locations. What people tend to do when they visit hot tourist spots, right, is take a bunch of pictures and post them on a social media site. They're all pictures of the same thing, right? Many, many people taking pictures of Notre Dame or whatever. What the researchers observed is that they could find correspondences between the pictures, and that they could take these pictures, taken from many, many different viewpoints, and use them to construct a three-dimensional model.
So this is an example of what I mean when I talk about finding structure in data. You have a giant bag of images. We know that there's some underlying three-dimensional ground truth. You can come up with clever algorithms to try to recover that structure. These actually are cool fly-through videos. If during the break you want to come by my laptop, I'll show you these neat fly-throughs. But this is just meant to be an example of what I mean when I talk about finding structure.
Another example: so Microsoft runs its Xbox system, which has many millions of online gamers. They want those gamers to have a really great time when they're playing. One of the ways you have a great time playing an online game is when your teammates and your opponents have a skill level similar to yours. But to find that matching, Microsoft needs to be able to estimate the skill levels of the people who are playing. Think about the chess Elo-type system that tries to identify rankings based on who you've beaten and who you've lost to. Imagine generalizing that to team games across many millions of players and building a probabilistic model to support it. That's what Microsoft did. I view this as an example of unsupervised probabilistic modeling, where they're looking at the millions of games that have been played and trying to recover the skill of every single player in their system, so they can find the best Halo game or whatever to insert them into.
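Microsoft's actual system (TrueSkill) is a full Bayesian model over teams, but the flavor of the idea shows up already in the simpler Elo-style update mentioned here. A generic Elo sketch (not Microsoft's algorithm; the k-factor and starting ratings are conventional choices, and the data is invented):

```python
def elo_update(r_winner, r_loser, k=32):
    """One Elo update: move both ratings toward the observed result.

    The predicted win probability is a logistic function of the rating
    gap; the more surprising the result, the larger the adjustment.
    """
    expected_win = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# A stronger (but initially unknown) player repeatedly beats a weaker
# one: the ratings separate, recovering the hidden skill ordering
# purely from game outcomes.
a, b = 1500.0, 1500.0
for _ in range(20):
    a, b = elo_update(a, b)
```

The "structure" being recovered, a latent skill per player, is never observed directly; it is inferred entirely from who beats whom, which is what makes this an unsupervised-flavored modeling problem.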
Other kinds of examples that you might've encountered: social networks or something, and biological networks, and there's lots of different structure out there that can be represented as graphs. So another kind of unsupervised learning is to try to identify, say, community structure in these graphs. I was at Twitter for a while. Twitter cares a lot about trying to guide its users to particular kinds of interest groups and social groups to try to get them the content that they might be most excited about. That means identifying coherent, possibly overlapping, communities in the data. Again, this is an example of unsupervised learning and where you're trying to find some notion of structure within the data.
Another example is what we call, in machine learning, topic modeling. If you have a large corpus of documents and you'd like to build some kind of understanding of what those documents are talking about, then you might try to boil all of this text down into some relatively small set of topics that correspond to different subsets of the vocabulary. The idea is that we would like to discover those topics from scratch, given the corpus, and then be able to take individual documents within it and frame them in terms of the underlying content. This visualization shows a couple of example topics along the top, which are just lists of words, and labels of those words within the abstract of a particular document. In my own research, we've applied unsupervised learning ideas to things like sports analytics. Here's an example of finding low-dimensional structure in shot charts for NBA players. There's a bunch of really amazing tracking data for the NBA, and you can imagine trying to identify the relatively small set of basis functions that describe the majority of the shooting patterns of different players.
You can see here corner threes, mid-range jumpers, and lots of different canonical types of shots that people take, and then you can take a player and map them into a space like, what are their loadings against these different bases?
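Returning to the topic-modeling idea for a moment: full topic models are usually fit with methods like latent Dirichlet allocation, but the "discover subsets of the vocabulary from scratch" intuition can be sketched with plain latent semantic analysis, an SVD of a term-document count matrix. A toy illustration with an invented four-document corpus:

```python
import numpy as np

# Tiny invented corpus: two "topics" (finance and sports) that never
# name themselves; we try to recover them from co-occurrence alone.
docs = [
    "stocks bonds markets trading stocks",
    "bonds rates markets credit trading",
    "game team score player game",
    "team player coach score win team",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix: one row per document.
counts = np.zeros((len(docs), len(vocab)))
for i, d in enumerate(docs):
    for w in d.split():
        counts[i, index[w]] += 1

# LSA: SVD of the count matrix. Each right singular vector is a
# "topic" (a weighting over the vocabulary), and each document gets
# coordinates in topic space.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
doc_coords = U[:, :2] * S[:2]    # documents in 2-D topic space

def cos(a, b):
    """Cosine similarity between two document coordinate vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

Documents about the same subject land near each other in topic space, even though no document was ever labeled "finance" or "sports."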
We can also think about spatial-temporal structure as an unsupervised learning-type problem. Here we took a spatial-temporal time series of different violent events taking place in the city of Chicago and tried to understand how that might relate to gang violence, both clustering it spatially and trying to find causal effects over time.
Shifting gears slightly, another example of unsupervised learning that I think is quite hot right now is to think about how to synthesize data, given examples. So far I've talked about trying to find structure, but here the idea is not necessarily to find structure but to somehow learn a model that's so good that it's actually able to generate new data given a bunch of examples. You could imagine taking a very large set of faces and building a model that's capable of synthesizing completely new faces, such as the ones here on the right. And you can tell some of them look pretty realistic, but some of them are clearly synthetic because they're creepy or strange or deformed.
On the left—for some reason, people did this also with a set of pictures of bedrooms. I'm not really sure why that data set exists. Maybe it was from IKEA or something. But here, these are synthetic bedrooms, and they look coherent when you squint your eyes. But then if you look closely, you see that they are missing fine, high-resolution structure.
And then another version of this is covariate-driven: rather than synthesizing a completely new face, take a face, change some property of it such as its age, and synthesize a new picture that's the same person but at a different age. This is something that I think is simultaneously exciting for building all kinds of interesting digital content, but it's also something that, in a larger ethics and regulatory conversation, I think is really important for us to get out ahead of, because it's rapidly becoming possible to create digital media, to create videos of anyone saying anything that you want, and I think that has very serious implications for our democracy.
On that lovely note, I'm going to switch gears again and talk about the third leg of this little stool, which is reinforcement learning: making decisions over time, playing a game with an environment, which you can often think of in relatively traditional expectimax-type terms. Supervised learning is very, very successful. Unsupervised learning is somewhat successful; it has some interesting stories, like the Microsoft one. Reinforcement learning, broadly speaking, is not successful. It has one big success that I think we can call a reinforcement learning problem, which is this AlphaGo thing that we talked about a little bit last night.
Go is much harder than chess because it has a much larger branching factor. It has always been viewed as a holy grail for machine-learning algorithms, and DeepMind surprised everyone by building AlphaGo, which was capable of doing very well. I think it's easy to overstate the general, AI-like algorithmic aspects of this relative to the fact that they threw a huge amount of engineering at it; often, if you build a Manhattan Project to solve a particular problem, you can make interesting progress on that problem. We certainly learned some things, but none of the techniques that went into this represented giant breakthroughs so much as a very well-considered but focused effort to solve this particular problem, using ideas like Monte Carlo tree search, which was not invented for Go but turns out to be very useful for it.
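At the heart of Monte Carlo tree search is a selection rule that trades off exploiting moves with good empirical value against exploring rarely tried ones, commonly the UCB1 rule. A minimal sketch of just that rule, with made-up win/visit statistics (this illustrates the idea only, not DeepMind's implementation):

```python
import math

def ucb1_select(wins, visits, total_visits, c=math.sqrt(2)):
    """Pick the child index maximizing empirical mean plus exploration bonus.

    wins[i]/visits[i] is the observed value of child i; the bonus term grows
    for rarely visited children, forcing the search to keep exploring.
    """
    def score(i):
        if visits[i] == 0:
            return float("inf")  # always try unvisited moves first
        return wins[i] / visits[i] + c * math.sqrt(math.log(total_visits) / visits[i])
    return max(range(len(visits)), key=score)

# Three candidate moves: one strong, one weak, one never tried.
wins   = [8, 1, 0]
visits = [10, 10, 0]
print(ucb1_select(wins, visits, sum(visits)))  # 2: the unvisited move is chosen
```

Once every move has been visited, the same rule starts favoring the high-win-rate move while still occasionally revisiting the weaker ones.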
Another related thing that DeepMind has also been at the forefront of has been the idea of trying to get good at playing old-school Atari games. This sounds like a really strange thing to study. It's within this expectimax framework. The intellectual agenda here, broadly, is the idea that there are many different Atari games, and humans seem to be able to play some of them and then be almost instantaneously good at novel ones. If you want to figure something out about AI, then you need to figure out how to build agents capable of that kind of transfer learning. That's the idea.
It has, I think, been less interesting intellectually, ultimately, than the people who initiated this program hoped—in particular, it might take hundreds of millions of hours of game time for the agent to get good. This is deeply disappointing since humans can be good at this in a couple of hours. Also, the Atari reinforcement learning agenda has brought to light a variety of issues about how challenging it is to do reproducible science in a space like this, where you need a simulator and you need to be able to set up conditions very similar to the way other people have done it in the past.
I think, even though it's cute and has been interesting for a small number of papers, there's been a proliferation of work in this area, and it is unclear to me how it's going to generalize.
Diebold: Ryan, you should wrap it up.
Adams: Yep. Almost done. I won't go into a lot of details here, but robotics is also a classic example of where people like to think about decisions over time. Again, reinforcement learning has been interesting, and I think it's accelerated in large part due to the wide availability of new simulators for tackling these kinds of problems, but ultimately, for low-level tasks, it's unclear whether reinforcement learning has had much to say relative to classic control-theory methods.
One area where I think reinforcement learning and sequential decision making will be more valuable going forward is thinking about how to automate the process of design. I won't go into detail here, but we've had some success in, for example, trying to identify new molecules for, say, organic LEDs or photovoltaics, where you can frame the process of deciding which new molecule to build next as a sequential decision-making task.
To wrap up here, I want to talk just a little bit about the hype and try to ground things a little. It feels like the things I've talked about are probably trajectories over decades that look a lot like statistics and function approximation, and maybe that's been a downer. So what's going on right now that gets all these people so wound up? Why are people investing in this so much? I think one thing to recognize is that many tech companies, companies like Google or Facebook, already had a lot of their revenue built around machine learning technology. When you interact through clicks online, you're already interacting with ML systems all the time.
I think companies have realized that there's lots more potential, many more places where that technology could be used, and have doubled down on it. But also, computers get faster over time. The difference between deep learning now and deep learning in 1992: there are a few technical advances, but at the end of the day, we have much, much faster computers and much more data than we had before, as well as better tooling for automatic differentiation, which I mentioned before.
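One way to see what "tooling for automatic differentiation" means is forward-mode autodiff via dual numbers: carry a value and its derivative together through ordinary arithmetic, so derivatives come out exactly, not by finite differences. A toy sketch of the idea (real libraries like TensorFlow and PyTorch use reverse mode and are far more general):

```python
class Dual:
    """A number carrying a value and its derivative; arithmetic propagates both."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f at x with derivative seed 1.0 and read off df/dx."""
    return f(Dual(x, 1.0)).dot

# d/dx (x^2 + 3x) at x = 2 is 2*2 + 3 = 7
print(derivative(lambda x: x * x + 3 * x, 2.0))  # 7.0
```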
To wrap up, I want to define, broadly, two terms that are important for conversations around artificial intelligence. When people talk about worries about artificial intelligence, a lot of the time they're talking about what we might think of as strong AI. Strong AI is some ambiguous notion of machine intelligence in which the machine is able to do whatever a human can do. It's defined in this very anthropocentric way. This is not on the horizon, I think, in any real sense. It requires technical advances that... We don't even know what we don't know about how to build something like this.
Deep learning is exciting, but it does not uniquely enable this in any interesting way, in my opinion. What is happening is that we're getting better at building interesting weak AIs. Weak AIs are systems that are good at tackling particular problems. You have some particular task: you want to play Go, you want to drive a car, you want to recognize cats, you want to predict who's going to click on this ad. These kinds of systems can exceed your performance, given a lot of data. In particular, if you sink months or years of engineering time into solving these problems, then they can add a huge amount of value. But they are not general. AlphaGo is not going to do your laundry any time soon.
These are economically and socially important systems that are going to impact our lives in lots and lots of different ways that we need to think hard about, but the fact that that's true does not imply that strong AI is on the horizon. In my view, it would be helpful to frame our ethics discussions not in terms of Skynet-type ideas, but in terms of the very real impacts that automated systems have on everyday interactions. I'll stop there since I think I'm well out of time.
Diebold: Thank you very much, Ryan. Some interesting questions are coming in. Don't be shy if you have a question but haven't posted it yet. We'll do all the questions at the end to make sure that we have enough time for a vibrant Q&A. I know we were talking about this, and if it works and you're OK with 25 minutes or even 20, we can do that to make sure we have time left at the end.
Next, then, we've got John Cunningham from Columbia.
John Cunningham: Could you full-screen that please? Excellent.
Thank you very much. Good morning. I quite liked how Ryan took you through a lot of the technical content and, if you like, demystified the hype around AI versus the real opportunity around these targeted technologies. My job now, for the next 20 minutes, is to play a similar demystification role and talk about how that is impacting the enterprise. I'll do that in a couple of steps.
First, I want to contextualize what I mean by AI machine learning in the enterprise. I then want to go in and talk about AI from a technical landscape, a technical ecosystem perspective, so we can see where the technology is and where it isn't. Then I want to talk about how it creates enterprise value and its impact on the cost structure of companies.
First, I am a professor at Columbia. I work in statistics and machine learning. From an outsider's perspective, my work on machine learning algorithms looks very similar to the work that Ryan does. I also spend a lot of time on applications, particularly in the biosciences and neurosciences, so I think a lot about the interplay of artificial and biological intelligence and where that's going.
I also spend a great deal of time as an adviser to companies who are trying to frame their business around AI or trying to, in a more traditional setting, exploit an opportunity for AI in their core value chain. I also work with a number of investors as well, and I have some industry experience in that.
Why do I mention this? I mention it to clarify my orientation and the orientation of this talk, which will be very similar to the pragmatic engineering focus that Ryan had in his previous talk, which is this: I view AI and machine learning, currently, as a shovel-ready, targeted technology that will grow over time and have tremendous impact on the economy in a number of sectors. That's very different from discussions of the singularity or AI futurism, which is an important conversation to have in its own right, but I'm going to focus on the mechanics.
With that, I mentioned before that I'm particularly interested in the impact that AI can have on enterprise value creation. That's a little bit small. There's a lot of hype around AI right now, right? This is just a mention of AI and machine learning in earnings calls over years, and you can see that that's exploded in the last few years. You can also see that there's been a flurry of investment and M&A [mergers and acquisitions] activity. This is something that we've all seen, and that leads us to pretty speculative notions about where this is going.
Again, to defuse that hype a little bit, I want to clarify that even the most conservative views of the economic impact that AI and machine learning can have put it at hundreds of billions to trillions of dollars over the course of the coming decade. There are a number of sources you can read about that.
Now, what I want to try to do is talk about the technological cycle that we're going through that is driven in large part by AI and clarify that this really smells a lot like the '90s, the last tech cycle we went through, both in terms of the fundamental value and the ecosystem it will create, but also in terms of the ambitious forecasts centered around it. Those interested in labor productivity can find a lot of reading about that: 50 to 150 basis points expected in the US over the coming five years, the creation of two to three million jobs, the destruction of two million jobs, enterprise value creation in the trillions, revenue increases per sector of $50 billion, cost reductions of a similar scale.
This is often focused around retail. I think retail is a particularly good example where we've seen this in spades already, [and] finance which is ahead of the curve, [as well as] obviously technology, energy, agriculture, the resource and mining space, etc.
Now, obviously, there are serious costs, ethical issues, and regulatory issues that we need to consider as this technological cycle evolves, whether that is monopolization and consolidation of particular industries, displacement of the workforce, or privacy concerns, something that's come into particular focus in the last few months.
What I want to talk about, where I want to orient the goal for today, is understanding the AI ecosystem: how it exists today in industry, and how enterprises can capture this. AI is a technology and an ecosystem. We've heard a lot about thinking about machine learning and AI as fundamentally a set of regression tools and unsupervised learning tools. How does that bubble up to the level of a corporation and available technology?
First of all, in terms of the ecosystem and the technical landscape right now, there's quite a bit of consolidation, right? We hear a lot about the FANG and the BAT—the AI majors. And I do really want to emphasize the extent to which a lot of current AI talent, from an HR perspective, and also algorithmic and technological expertise, is housed within these AI majors. That has led to a surprisingly early consolidation in the market.
There is a very strong AI entrepreneurship landscape. Most of that, though, has led to acqui-hire-type exits. You hear claims that acqui-hires are now going for five to ten million dollars per AI PhD, and that earnings profiles for many of these startups are nonexistent. That's not entirely the case, but it's generally the case. It's really much more about these AI majors picking up talent and continuing to consolidate that technology.
What that means is that hiring AI talent has been priced out for most industrial sectors. Furthermore, the question often comes up: will we see major consultancies and major IT service organizations picking up and delivering machine learning and AI as a service? That has been, I think, unclear to date. There have been some notable successes, but there have been some notable misses as well from some of the most prominent large-scale consultancies.
Against all of that is a backdrop of technical commoditization driven by open-source software. Here, I have some of the core tools of large-scale machine learning deployment, whether it's a programming language like Python; machine- and deep-learning libraries like TensorFlow and PyTorch; containerization and orchestration engines for delivering those products at scale, things like Docker and Kubernetes; or data infrastructure like Spark.
A lot of this is now very well understood, entirely free, available to small teams and large teams alike. What that means is, as a technology landscape, this is very different than the previous technology cycle that we saw 20 years ago. We've evolved into this lumpy AI ecosystem. The question that then comes up is, is AI going to evolve into a software-as-a-service-like model, as happened with the previous technical cycle?
Is there such a thing as AI as a service? Can we see that coming currently? It's not there yet, so what we see a lot of now is platform-as-a-service-type models. I particularly like this figure. I hope you can see that. What you see is that there are certain APIs available for particular platform technologies. What I mean by that is that this is not AI as a service that a nontechnical, non-data-science team can use. This is something for which you need internal engineering resources, internal data science and software development efforts, to really plug into these technologies. But from folks like Amazon Web Services and Google Cloud Platform, we're starting to have computer-vision APIs, natural-language-processing APIs, speech APIs, things like this.
I think that we can expect that to grow, but we don't currently have a good model for what AI as a service will really look like because it's not clear that the technology is mature enough at that level.
We will also see some specific AI-as-a-service-type plays, whether that's from the likes of Salesforce in CRM or HR-type applications. What a lot of people are hoping for, and what I want to at least put a question mark around, is this notion of vertical AI as a service. People keep saying, "Oh, we'll get great APIs for financial AI or health care AI," and I think that is currently unclear. Right now, a lot of companies are realizing that in this commoditized AI landscape, their proprietary data are really central to their value, and they're able to capture that. Why would they want to take that insight and put it out into an API? Now, I think there are some exceptions to that, so possibly the large data platforms like Bloomberg or IHS Markit may find great value in that, but in the near term, I think, we currently don't have a clear path to these core vertical AI-as-a-service plays.
Great. What all that results in, as I mentioned before, is what I think is a very lumpy AI ecosystem. It's very promising, but it's different from what we're accustomed to. First, AI adoption is really quite low. This is a chart from a recent McKinsey report, where they surveyed companies who have been thinking about AI, who are AI-relevant, and they find that only 20 percent of those companies have actually adopted even one such technology, while 40 percent haven't touched it at all, and another 40 percent are uncertain about what role it might have to play.
Now, I think a big part of that is that the path to building up AI capability is confused. That's what I was mentioning before about the early part of this technology cycle. That said, the opportunity remains massive. When I speak with companies about this, and when colleagues and I help companies through this, what you find is that the companies that succeed are willing to explore a new model: they focus on leveraging their proprietary data, which is the defensible advantage in this space, and they're willing to pull in targeted scientific and technical advisers, machine-learning experts, to help them think through the implementation. Most importantly, I think, as with any new technology, this is something that needs a lot of senior leadership focus.
What people realize when they do this is that they have a lot of software engineering resources in house, and they can upskill those engineers using a lot of this open-source software and available technology, taking this commoditized technical stack and bringing it into their company using existing resources.
There've been a lot of great successes in terms of that, as a part of large enterprise digitization strategies, and I think we'll see more of those in the coming years.
With that picture of the ecosystem, I hope that's demystified some of what companies are actually able to get as technology today, and what I want to talk about now is what I think are the four ways in which people extract enterprise value. If you'd like, this is my own version of demystifying the magic and hype around AI and machine learning towards saying what are the actual mechanical ways companies can add this to their business and deliver value.
The first two I put up here are the most obvious, I think, and the ones that we've all seen and think about already. Strategic intelligence, right? Whether that's marketing and ad technologies, personalization like the Netflix example that Ryan gave before, or, something that's very relevant and that we'll hear more about tomorrow, the quant finance space: trading and decision making, exploiting more data to deliver strategic intelligence.
The next is capital efficiency. When we start thinking about the ethical implications of AI and about job displacement, it inevitably comes back to capital efficiencies. How will this displace retail employees? How will this displace last-mile logistics carriers? How will this replace even higher-skill functions like diagnostic radiology?
Less obvious—and I want to put my own personal editorial comment, that I think these are much more interesting and will be delivering considerable value in the coming years—are the two that are called risk mitigation and optimization of value-centric processes. Risk mitigation: I know that doesn't come as a surprise for this audience. This is something that, from a regulatory perspective, a lot of people in this audience think about a great deal already, whether that's insurance, credit, surveillance, fraud, cybersecurity. Really thinking about not just predicting performance or cutting costs, but thinking about how to control risks is something that machine learning and AI have a lot to say about.
The last, and this is my favorite, is optimization of value-centric processes. What you find is that core machinery, skill enhancement, precision control—I'll get into some details about this—are big opportunities for the future. The key theme across all of this is that the core opportunity is in proprietary data. Furthermore, when I said I'm trying to demystify the hype, what I'm trying to point out is that in all four of these buckets, bringing AI into an enterprise requires rationalizing the business problem as a quantitative objective. Once you have cast a particular problem in the enterprise as a regression problem, AI and machine learning can help in the same way that applied statistics can help.
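To make "cast the problem as a regression problem" concrete, here is a toy ordinary-least-squares fit. The data are entirely invented (a hypothetical marketing-spend-versus-revenue-lift question); the point is only that once a business question is posed as predicting y from x, standard statistical machinery applies:

```python
# Invented data: marketing spend (x) vs. observed revenue lift (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form simple linear regression: slope = cov(x, y) / var(x).
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

def predict(spend):
    """The fitted model: expected lift for a given spend."""
    return intercept + slope * spend

print(round(slope, 2))  # 1.99: each unit of spend buys roughly 2 units of lift
```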
Strategic intelligence: this is an obvious one, so in the interest of time I'll go through it a little more quickly. Data can upvalue customers or improve core decision making. Retail and ad tech: we've already seen this a tremendous amount in the e-commerce space. Health care is something that makes a great deal of intuitive sense and that we hear a lot about in the media; you hear forecasts ranging anywhere from 2 to 10 trillion dollars of impact in the health care sector over the next 10 to 20 years. We expect this to be a massive trend going forward.
As a parenthetical—obviously, this is fundamental to how financial firms have operated for decades: exploiting data to understand their core decision-making processes. Really nothing new here, but it's interesting to see, particularly in the hedge fund space, how AI and machine learning are being included as one more arrow in the quiver of technical tools that people have to deploy in the markets.
Another example that we're starting to see: there's obviously a lot of noise about fintech, but I think this is a nice chart that shows just how little data retail and commercial banking currently exploits from the ecosystem of data that will be, and in some cases already is, available to these institutions. This is something that there's a lot of attention on and that will certainly grow over time.
There's a tremendous opportunity there, both in upvaluing customers, understanding them better, and delivering better products, and in managing some of the costs of poorly understanding a customer, whether from a credit perspective or similar.
Capital efficiencies: we all know what technology does. Technology tends to make certain administrative and low-skill work functions more efficient. We know about transportation networks in factories, automated trucking, self-driving vehicles, and so on; we already see examples of this happening.
Back-office administration will be massive. One interesting question I've seen a lot is this: particularly in the last 10 years, we've seen substantial increases in regulatory costs borne by a number of institutions that are represented here today, and I think a lot of people are starting to wonder how natural-language processing can help manage compliance costs and that administrative growth.
Just to give one particular example: this is from an analysis that I like on health care information technicians, some of the low-end data processing functions that are quite vulnerable in the health care space. Even that very simple function alone is estimated to be nearly 30 billion dollars annually across the globe.
History has taught us that new functions will replace a lot of these low-skill administrative work functions, but others will remain protected. I think this is one of the most important societal questions that we have right now during this technology cycle. Rather than getting into that in any detail, I'd point some folks who are interested in reading more about this to this paper: "The Future of Employment: How Susceptible Are Jobs to Computerization?", which I think does a nice job, both in terms of contextualizing AI and machine learning in terms of the history of technological cycles, but also does some actual analysis of labor functions which might be susceptible to obviation by AI and which are protected.
The last point I want to make (excuse me, not the last point, the second-to-last point) is risk mitigation. When we talk about AI and machine learning, we typically talk about enhancing performance: can it predict a particular value better? What I want to clarify is that there has also been a tremendous amount of work in AI, which you don't see as much because it's a little more statistical in nature, on understanding variability and understanding tail events, such as large-loss insurance claims. Understanding these better has a great impact on how insurance companies and providers behave. Another example is consumer credit and charge-offs, which I think are currently around $60 billion. Can we do a better job accessing data, exploiting it in a strategic-intelligence way, to reduce that? I've seen estimates ranging from a 10 to 25 percent reduction in credit charge-offs, which would be absolutely massive, I imagine, including for some people very specifically in this room.
Finally: optimization of value-centric processes. If we understand the core value-delivering processes within an enterprise as a rational function that should be optimized, we start to see some real opportunities for AI and machine learning. What do I mean by this? First, complex interacting systems are the norm across many sectors. This is your factory line. This is your processing facility. Often these are not big-data settings. We talk a lot about big data, and big data and machine learning seem to go hand in hand, but I want to clarify that there's a lot of work trying very hard to access the small-data regime. Why is this not big data? Well, certainly there's a tremendous amount of data; there are a lot of sensor readings and the like within a factory. But the point is that you have a large parameter space to reason about, a tremendous number of dials on the various instruments throughout your process, while the number of days of collected data is actually relatively small. Even if you've been operating for 10 years, you really only have several thousand data points.
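One common way to work in this small-data, many-dials regime is model-based optimization: fit a cheap surrogate model to the handful of runs you have and let it propose the next setting to try. A toy sketch with a single dial and a quadratic surrogate (the process, its six historical runs, and its optimum are all invented for illustration):

```python
import numpy as np

# Hypothetical process: yield as a function of one control dial, with the
# true (unknown to the optimizer) optimum at dial = 3. Only six runs exist.
dials  = np.array([0.0, 1.0, 2.0, 4.0, 5.0, 6.0])
yields = 10.0 - (dials - 3.0) ** 2   # in practice these would be measurements

# Surrogate model: fit a quadratic to the few points we have...
a, b, c = np.polyfit(dials, yields, 2)

# ...and propose the dial setting that maximizes it (vertex of the parabola).
proposal = -b / (2 * a)
print(round(proposal, 3))  # 3.0: the optimum, recovered from six runs
```

Real versions of this idea (Gaussian-process surrogates, Bayesian optimization) also account for uncertainty, but the loop is the same: model the few runs you have, optimize the model, run the proposed setting, repeat.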
One way to think about optimization of value-centric processes is through classic, tailored scientific management. What we saw over the course of the 20th century was compartmentalization of function. Why was that a really good idea? Because humans are very good at reasoning about two or three things at the same time, so we confine a particular human to a particular function. That drove a lot of the great advances in efficiency in the industrial and manufacturing sectors. What it results in, however, is local optimality but global suboptimality, because you have people very focused on a particular item without thinking about the global picture. What currently available machine learning and AI are really able to do is look across those functions and extract incremental value. I think two of the best examples of that are in agriculture and in mining and resources.
In agriculture, what you see here is forecasts predicting nearly a trillion dollars in increased global crop yields through a variety of different steps in the value chain, whether that's fertilizer, compaction, irrigation, weather mapping, and so on. The point is that all of these systems interact, and so optimizing each one individually turns out to be suboptimal. It's something that a lot of people in agribusiness have focused on for a long time. If you have supervisory systems sitting on top of that, collecting that data and making optimal decisions, you can extract a tremendous amount of value.
The second example is in the mining and resource sector. This is a schematic of a platinum mine. In the end, what is a mine? It's a thing that turns rocks into something valuable. Doing that is a complex, multistage process that grinds, crushes, does chemical processing, does separation, flotation, and so on. Each of those systems has been individually optimized to get the most platinum out of a rock, and you have line operators working on each of them, but they're not globally optimized, and conditions change every single day. A lot of macroeconomic data has very similar features. So how do you design a supervisory machine-learning or AI system that can sit on top of that and make decisions across all of those systems to achieve incremental performance? These are things we're seeing happen in the IT sector, and we're starting to see them in some other complex process spaces. As my own editorial comment: this is, in some ways, the least flashy version of machine learning and AI, but something that I anticipate will be tremendously impactful in the coming years.
So with that, let me summarize. First, AI and machine learning are fabulously important technologies that will continue to drive major changes throughout the world economy. Enterprises can access this value in four ways. Capital efficiency and strategic intelligence are things that we intuitively know about, we've talked a lot about. Two others that I think are equally important are risk mitigation and optimization of core processes. How do you build capabilities in this setting? Well, it's not trivial. The ecosystem isn't quite there yet to make this happen in a standard way. But if teams are willing to be nimble and explore some alternative methods, there's really outsized ROI for getting into this space early.
So with that, I'll say thank you.
Diebold: Thanks so much, Johnny. Beautiful. And last, but of course not least, we have Gideon Mann from Bloomberg.
Gideon Mann: Good morning. I got my slides? There we go. OK.
So, you've heard two people trying to damp down the hype, and now you've got me. I'm sorry. But actually, what I want to tell you is: we don't have Skynet, but what we do have is really amazing. So I'm going to walk through some examples of work we're doing at Bloomberg that touches on some things in the world of finance. Of course, there are many, many more applications in the world of finance.
So let me just spend a moment on Bloomberg. When I talk to nonfinance audiences, they usually think of Bloomberg as a media company. Of course, that's only part of us. Really, we're a software company. We have about 5,000 programmers, and we produce this product called the Terminal. The work that I'm going to talk about today is really all about the Terminal and software. When I think about what we do inside the company in terms of machine learning, there are really two ways we use it: one is to improve our internal operations, and the other is to deliver new products. I'm going to spend a little more time on products and a little less on internal operations, but it's hard for me to say which is actually more important. And it turns out that new products usually come through a progression: first we do something internally, improving some piece of our business process; then we deploy it a little more broadly; and then we deliver it to clients.
So let me come to the first example. One of the core values that we provide is that we make financial data accessible to our clients. And where does this data come from? A lot of different places. Of course, it comes from the exchanges directly; it comes from pricing data; it comes from our clients, in some cases. But in other cases, it comes from documents. In the 1980s, the only way to get this information onto the Terminal, into our databases, was to have some person, some poor person, sitting in front of a computer typing it in. As soon as the quarterly report or the annual report came out, they would load it up on their screen and type it into the database. Over the years, different pieces of this process have gotten automated. One piece is that instead of companies sending out letters or printing up documents, they put their documents in a rigidly structured XML format, XBRL, and that data can be directly injected into databases.
Another thing that has changed is that, over time, people built manual rules to look for particular regular expressions in tables or in little pieces of these documents; they noticed some value, some net profit or gross revenue, and they picked that out and put it into the database. But in some jurisdictions, reporting isn't mandated to be standardized, and people still put out PDFs, and the PDFs are somewhat general and somewhat custom. And so over the last maybe four or five years, we've been deploying a fairly heavy-duty technology—neural networks—to look at the image plane itself of these documents. PDF is actually a programming language in and of itself, so it's hard to manipulate the PDF directly. But you can look at the output of the PDF, which is the image plane, the exact bits that are laid down on the screen, and use that.
And so what we've built is a system to outline the tables, to segment them, to understand the structure, the rows and the columns, and then to take all the values out of each table and put them into the databases. Now, as natural objects go, tables are just about the most boring thing you can imagine. They're fairly static. They're flat. But it turns out that there's a huge amount of value in automating the process of getting the data out of those tables and into a database. Instead of a person, I have a piece of software. Now, in order to get to that software, as Ryan and John have talked about, you have to do the supervised learning process. And so the way that we built it is that we observed people, over many decades, finding these tables, circling these tables, and picking that data out. Once we had observed that, then we could automate.
So let me move quickly. We have a very large news branch, and the reason that we have spent so much time on news, at least partly, is because news really does move markets. Some of that news is pricing information, some of it is expected earnings, and some of it is events that are unexpected or less expected. This is a case of Muddy Waters, who announced publicly that he put a short on a particular stock. And as soon as he announced this short, that stock dove. Now, is this news? Well, I guess it's someone announcing their position, so it's pragmatics of some sort. This is one of the ways that the market is reactive to new information coming out. You can see on this graph, it's actually quite quick, and I'll go into, in another few slides, just how fast news is absorbed.
Now the way that we've been kind of responding to this is we built a system that looks for headlines which are likely to move the market. So it looks at this headline and says, "Oh, I think something's going to happen here. I don't know whether it's positive or negative, necessarily, but I think it's of note and so I'm going to surface this to clients." So this is a screen—you'll notice the beautiful amber on black of Bloomberg. So each of these rows is a particular company and name and then a headline associated with that name that we believe, or I should say the algorithm believes, is going to move the market. And this is something that we hope our clients might have up on the screen all the time, but we also deliver it in other ways. We deliver it as a data feed. So you can buy a data feed from us, which includes our news headlines. It also includes a label next to the headline that says, "This headline we believe is likely to move markets."
Now, why would you want to do such a thing? Why would you want to spend money on this? Oh, here we go. OK. The previous slides have been looking at the minute-by-minute time granularity, and this takes you a bit lower. This is at the millisecond granularity. Now, of course, this isn't where the fastest of our clients in finance might operate. This isn't in the order book; this isn't at the microsecond level. This is the millisecond level. What this graph is trying to show is the buy-and-sell movements on a particular name over a few milliseconds. And so prior to this event, some news event has happened. If you'll forgive me, I forget what exactly happened here. The first line, that gorgeous bold red line, is the first market transaction after this news event. The next line is the—let's see, I think this is the yellow line. It's a little hard to see.
There's a yellow line. This is the electronic headline that went out on the wire. The next line, this blue line, is where our annotation—"this is market-moving news"—came out on that headline. And so you can see fairly rapidly we're able to take that headline in, give this annotation, and then send it out to clients. Then there are a few other things. The entire story came out—not just a headline, but a story explaining it. And then finally, annotations on the headline. You can see, over the course of maybe one or two milliseconds, there's a fairly large market reaction. I took this graph from our marketing materials. They like this graph because it shows that our signal, this market-moving-news annotation, comes fairly early in the process. And so if your system is smart enough to know that something is happening—"I need to increase my spread or adjust my volatility expectations on this particular name"—you can adjust pretty rapidly to the change in the market.
How do you build this? Well, the secret of AI, at least supervised machine learning, is really that AI has people inside of it. Just as in the table example, where you have a person annotating tables and all these different diagrams, here, too, the way that we built this system was as a fairly conventional supervised machine learning problem. We looked at market events where there was a big change in performance for a particular name, and then we looked to see whether there was a news headline that could support that market event—to say, "OK, this stock rose 10 percent? OK, this is the news event that caused that." And we actually spent a lot of time getting our annotation standards—that's the standard that you'd give to someone who is going to make the adjudication—precise and correct, and also building up the system that proposed those events to a person in the first place.
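As a rough sketch of that pipeline, the candidate-generation step might look something like this in Python. Everything here—the function, the thresholds, the toy data—is invented for illustration and is not Bloomberg's actual system:

```python
# Hypothetical sketch: flag large price moves, then pair each move with a
# qualifying headline so a human annotator can accept or reject it as a
# supervised training example. Thresholds and data are made up.

def candidate_events(prices, headlines, move_threshold=0.05, window=60):
    """prices: list of (timestamp, price); headlines: list of (timestamp, text).
    Returns (timestamp, pct_move, headline) candidates for human annotation."""
    candidates = []
    for (t0, p0), (t1, p1) in zip(prices, prices[1:]):
        move = (p1 - p0) / p0
        if abs(move) >= move_threshold:
            # headlines published within `window` seconds before the move began
            prior = [(t, h) for t, h in headlines if t0 - window <= t <= t1]
            if prior:
                t_h, text = max(prior)  # the latest qualifying headline
                candidates.append((t1, move, text))
    return candidates

prices = [(0, 100.0), (60, 100.2), (120, 90.0)]  # a 10 percent drop at t=120
headlines = [(110, "Short seller announces position")]
print(candidate_events(prices, headlines))
```

A human annotator would then go through these candidates and keep only the headlines that plausibly caused the move, turning them into labeled training data.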
And so this, as Ryan outlined: machine learning is programming with data, and getting the data right in a lot of cases means getting the annotation standards right and teaching people how to give data in the right form. It turns out that there's a lot of value here, because you can't just look at any market event and say, "Oh, I see a market move. This is definitely a response to some news event." To pick one example, at the top left there's a—I guess it's a 1½ or 1¾ percent change in the stock, and the headline says, "a notice of occurrence of mandatory." Well, OK, maybe there's some relationship there, but it's completely obscure. You don't really want to build any kind of notification on top of that particular event. And you can imagine there are many of these headlines that maybe caused a reaction, maybe didn't, but that you don't want to use as training examples. And so, really, the process of a human going through and saying, "No, this isn't relevant" is very critical.
So here's another example. Is this a market-moving news event? The Japan Post president says the company will tone down on mergers and acquisitions—so that's the headline that was announced. After this came out, Nomura had a big drop—about a 6½ percent drop. Now, of course, Nomura isn't mentioned at all in this headline. It's a headline about Japan Post, not about Nomura. So why the drop? Well, it's actually a few minutes later that the story came out that Nomura Real Estate was reported to have been a Japan Post takeover target. And so there was an expectation in the market that some event was happening or might happen, and when this news event came out, the market assimilated it and understood that context. Now, I'm showing you this not to say that we can do this, or that the market-moving-news indicator can do this, but really to illustrate the boundary of what the current technologies can reach.
I'll give you one more example. This is from, I guess, just a few days ago. Well, I should start the other way. Match.com and its parent, IAC, suffered a huge decline—all in all, I think, about a 17 to 20 percent decline. And there was no news about Match directly that day or at that time. But this was the day of the Facebook F8 conference, and Facebook had announced that they were deploying some dating-app features that would be in direct competition with Match. So again, I can construct a story, and I could imagine building the kind of machine-learning system that would link together all of these pieces and say, "Oh, OK, Facebook is announcing some news that they're going to be a competitor to Match. This is an announcement of a new competitor. This is the reaction that you might expect, or you might expect something to happen."
We didn't do that. This is really the boundary of what we can do. So, simple things: Muddy Waters announcing a short against Man Wah, that we can recognize. The announcement of an SEC investigation, the cessation of an investigation, the announcement of an earnings report—fairly straightforward, concrete facts we can recognize. And those are having a tremendous effect. But this level of inference, this level of disambiguation, of thinking ahead: not there yet.
So let me switch gears to kind of one other example. This is news sentiment. So in the same way that we have market-moving news and we have a monitoring system on this market-moving news, we also have a monitoring system for sentiment. Now, sentiment is kind of... In our case, we define it in a very particular way, which is for someone who has a long position in a particular equity—do they see this piece of news as positive or negative? So, for example, Muddy Waters announcing a short, that would be a negative sentiment. An announcement of wonderful sales, that might be a positive sentiment. And so over our entire system, we scrape through all of the news reports and we tally for each company the aggregate amount of positive and negative sentiment for that company over a particular timescale. I think this might be maybe a daily timescale. And so we have a set of all of the companies, and these are the ones that have the most strongly associated sentiment scores for that particular day.
And this, again, is a monitor that you might have up on your screen, and it's also available as a data feed. For the same reason that you might buy a market-moving-news data feed, you might also buy a sentiment data feed, because you might want to take a particular action when you see a very positive or very negative piece of news. And now exactly what that action might be, we don't really have a sense of. I think there is a kind of question about to what degree will vendors seek to be actually market participants? And I can say... Of course, I can't speak for the company, but I can say the problems of actually providing data and providing analytics are very different from the problems of engaging in the marketplace. And I'll go into a little more detail on that.
So how do we build this sentiment analytic? Again, in the same way. AI is people, and so at the heart of it we had annotators who went through stories and marked them up. They said, "OK, is this a positive story? If I were a long investor in this security, would I read this positively or negatively?" And they went through and annotated all of these stories—I think a few thousand—and we have a continuing process where we re-annotate, reexamine, and rebuild our models. Sentiment was actually the first piece of machine learning we did at the company; it started maybe 2007, 2008, before I got there. The reason that it sparked a lot more work is that prior to the machine-learning approach, we had been doing a rules-based approach. People had been sitting down with regular-expression patterns and saying, "OK, when people are talking this way about a company, they mean something positive, and when they're talking that way, they mean something negative." And these approaches were generally very brittle. Once we switched to a machine-learning approach, we got much better coverage and much better performance.
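To make the annotate-then-learn loop concrete, here is a toy bag-of-words version in Python. The labeled examples and the per-word scoring are invented for illustration and are far simpler than any production model:

```python
from collections import Counter

# Toy sketch of the loop: annotators label stories positive or negative from
# a long holder's perspective, and a crude bag-of-words model accumulates a
# per-word sentiment weight from those labels. Labels and text are invented.

labeled = [
    ("record quarterly sales beat estimates", +1),
    ("profit surges on strong demand", +1),
    ("short seller targets the company", -1),
    ("regulator opens fraud investigation", -1),
]

weights = Counter()
for text, label in labeled:
    for word in text.split():
        weights[word] += label  # each labeled story votes on its words

def sentiment(text):
    score = sum(weights[w] for w in text.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("sales beat demand"))
print(sentiment("fraud investigation opens"))
```

The point of the sketch is the division of labor: humans supply the judgments, and the model generalizes those judgments to new stories containing words it has seen in labeled ones.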
So in the next few slides I'm going to take you down to how we talk to our clients about this sentiment signal—not from a monitoring perspective, but from a data-feed perspective. Let me skip this. Our quant team went through and looked at this data signal and said, "OK, can we construct a trading strategy on the basis of this signal? In particular, what we're going to do is look at a particular point in time, look at the sentiment associated with each company over the past 24 hours at that point in time, and then look at the next daily return for that set of companies. And we're going to look at the performance of companies by quintile on the sentiment scores."
Let me reexplain.
For every day, you look at the companies' sentiment scores and you rank them: "OK, this is the most highly rated company by sentiment; this is the most lowly rated company by sentiment." And you break them down by quintile—these are the top 20 percent by sentiment, these are the bottom 20 percent by sentiment. Then you look at the daily returns for the subsequent day on the basis of that trading signal. You can see—thankfully, because if this weren't true, I probably wouldn't be here talking about this slide—that the performance of very highly rated companies is positive. There's about, I guess, 40 basis points of performance for highly rated sentiment companies, and the performance for lowly rated sentiment companies is about negative 40 basis points—maybe a little less, a little more. That's for the case of news. I should say we have two different analyses: one for news headlines, and the other for social media on Twitter.
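That quintile analysis can be sketched in a few lines of Python. The tickers, sentiment scores, and returns below are fabricated purely to illustrate the ranking-and-bucketing mechanics:

```python
# Rank names by sentiment, split into quintiles, and average the subsequent
# day's return within each quintile. All data here are fabricated, with a
# monotone score-return relation built in by construction.

def quintile_returns(scores, next_returns):
    """scores, next_returns: dicts keyed by ticker. Returns five average
    next-day returns, quintile 0 = lowest sentiment, quintile 4 = highest."""
    ranked = sorted(scores, key=scores.get)          # ascending by sentiment
    n = len(ranked)
    buckets = [ranked[i * n // 5:(i + 1) * n // 5] for i in range(5)]
    return [sum(next_returns[t] for t in b) / len(b) for b in buckets]

scores = {t: s for t, s in zip("ABCDEFGHIJ", [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5])}
rets = {t: scores[t] * 0.0008 for t in scores}       # fabricated relation
print(quintile_returns(scores, rets))
```

In the fabricated data the relationship is monotone by construction; the empirical claim on the slide is that the real top and bottom sentiment quintiles showed roughly plus and minus 40 basis points of next-day return.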
So this is just one piece of understanding sentiment, because you really want to look not just at the daily returns; you really want to look at a portfolio, which takes into account a lot of different aspects. This graph is about one of those aspects: including transaction costs or not. Of course, getting from just a signal to actually constructing a trading strategy involves many pieces. One piece is how expensive it is to actually acquire or get rid of stocks on a day-to-day basis. And you can see that if you don't take into account your trading costs, you actually have very high returns from a Twitter and news sentiment score. But once you account for trading costs, you have to do a lot more work to get performance out of it. So this is just one example of why, even when the signal is fairly strong—and I think we believe it's a fairly strong signal—it often isn't straightforward to incorporate into an overall trading platform, and our clients do a tremendous amount of work in taking all of these natural language signals, or one of them, and combining them into one algorithm or a set of algorithms. There's a big piece here.
I talked about three different pieces. I talked about understanding natural documents and doing table recognition to pick data out of those tables. I talked about how news moves markets and a little bit of the work that we've done on building a market-moving-news predictor. And then, finally, I talked a little bit about sentiment and how you might use the sentiment signal in actually constructing a trading strategy.
And now for something completely different. I wasn't going to talk about these two initiatives, but there's been a lot of discussion so far at the conference, and I expect you're going to be talking about it for a while longer today and after. So I want to talk about two initiatives we've done that speak to data science and machine learning more broadly. The first is an annual conference we run at Bloomberg. It's in September every year, timed to the UN General Assembly. It's called the Data for Good Exchange. It is a showcase for all of the work that people do in applying machine learning and data science to problems of social good. Usually we have a lot about government innovation, service delivery, health care.
Over the past year we also had another event, in San Francisco, on this thing called CPEDS, the Community Principles on Ethical Data Sharing, which is our way of trying to kick-start a conversation about what a Hippocratic oath for data scientists might look like. How do you, from the ground up, think about changing data scientists' behavior to be more aware of ethical considerations? That's one piece.
And then the other initiative I want to talk about, just for a moment, is the Shift Commission on Work, Workers, and Technology. It's come up a bunch of times already: what is going to be the impact on labor and on the labor market from technology? And certainly, there are economic pessimists and economic optimists. It's funny—this is the one area where I feel like economists are often optimistic about effects on the labor market, whereas computer scientists are often more pessimistic. Usually, it's a little bit reversed. The Shift Commission took a scenario-planning perspective and looked at what the labor market might look like 10 or 15 years out as these technologies make progress. There's a report that came out—the report of the Shift Commission on Work, Workers, and Technology—that lays out a bit of a framework.
And also, one of our partners in this effort, the New America Foundation, has been working on the Shift Commission labs, where they're going to take this scenario-planning exercise and bring it to a lot more communities. So I encourage you to look at both of these. We actually have an open call for papers for the Data for Good Exchange now. So, thank you very much.
Diebold: Thank you so much, Gideon.
Mann: Thank you.
Diebold: Great. Thank you for bringing us back to right on time. So we have about a half an hour for Q&A. I'm going to select various ones. Lots of good things have come in. If I don't happen to call you, don't be insulted. There are far more than we can do. The panelists, however, I'm sure will be happy to hang out and talk with you and maybe expand on earlier answers. Maybe the way I'll do it is I'll take a question, and maybe I'll just tee it off with a quick answer. Maybe if all the panelists could try to be a bit quick as well so that we can cycle through a variety of questions rather than spending 15 minutes on one.
Probably the winner by popular vote is—should be, in my opinion. Most of these are anonymous. Some people, by the way, submitted their names. If that happens, I'll mention the person's name so he or she could stand up here. Maybe you'd want to talk to that person some more afterward as well.
Here's a question. I might not read every question exactly—I might embellish it or shorten it a bit. Tail events are rare, and appreciation of history is important. ML, machine learning, may be susceptible to recency bias. Can machine learning really help with large loss mitigation?
I'll just quickly tee off a response of my own. I think it's a real issue, but it's nothing unique to ML. It's true in any predictive modeling or regression problem, and ML, as was emphasized, is ultimately just function approximation—it's regression. So you've got that. On the other hand, a successful element of many, many predictive modeling strategies is shrinkage. Anything Bayesian, for example, pulls things in the direction of a prior. All sorts of famous approaches going back half a century, like ridge regression, shrink things. One can use informative priors to effectively build in the knowledge, or the worry, that even back in the 19th century we saw crises. They might not be in our data set at the moment, but do we think they can't happen again? Of course not. I think Bayesian informative priors might be a useful tool here.
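As a concrete illustration of the shrinkage point, here is ridge regression in a few lines of numpy. The data are synthetic, and ridge stands in for the broader family of Bayesian and penalized estimators mentioned:

```python
import numpy as np

# Ridge regression: the penalty `lam` pulls coefficients toward zero, which
# is equivalent to imposing a zero-mean Gaussian prior. Informative priors
# generalize this by pulling toward beliefs not present in the sample.

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                        # short, noisy sample
beta_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=30)

def ridge(X, y, lam):
    """Closed form: (X'X + lam*I)^{-1} X'y; lam = 0 recovers OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)
b_shrunk = ridge(X, y, 10.0)
print(np.linalg.norm(b_shrunk) < np.linalg.norm(b_ols))  # True: coefficients shrink
```

The same mechanics carry over to an informative prior: replace the zero target with a prior mean encoding, say, the possibility of crises that lie outside the estimation sample.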
Adams: I'd like to add to that because, first of all, I entirely agree with what you're saying and what the question is asking. Since I was the one who claimed that perhaps machine learning can help with large loss claims, I think one way it can help, which is orthogonal to the point just made, is adding features. This is one of the big things that machine learning has brought: we now have the ability to get many more heterogeneous data types—things like natural language, images, and other types of less structured records—into a particular prediction. While it's absolutely true that recency biases and other statistical issues will still exist, I think there is still gain to be made by grabbing all that extra richness of features that is available.
Mann: I think one of the interesting things about finance as an application domain for machine learning is that it really, at least in the market, is a zero-sum kind of situation. Usually when you think about machine learning, you don't think about playing against an opponent that is as smart and as dedicated and as hardworking as you are. One of the consequences is that the game is ever-changing and always getting more complicated. So even if machine-learning approaches are going to have an impact, there is still a significant need for some other kind of meta system to be looking introspectively, to ask: does this make sense? Is there some kind of regime-change event going on here?
Diebold: Anyone else?
Adams: Yeah, I agree with everything that's been said. I think the shrinkage point is a really strong one. I mean, the kind of approach you expect to work in this kind of situation is one where you can build analogies across many different kinds of things. We would view that as a hierarchical Bayesian model, for example.
Diebold: Great. Let's move on. Here's a question. Let me highlight it. There we go. This person, anonymous again, says that ML discussion often focuses on prediction and clustering tasks. By that I take them to mean noncausal tasks—exploiting correlations for predictive gain without claiming that things are causal. The person then goes on to ask: but will we eventually be able to use these techniques to discover and quantify causal relationships? Gee, this is great. I get to go first. You guys spoke already, so I'll make it really brief, though.
My understanding is yes. It's already happening. Lots of companies are effectively running experiments online, on the web, all the time. On Google, I might move the location of where I place an ad to see if it increases click-throughs. I might change prices around a little bit every day to try to trace out my demand curve, and so on. People like Susan Athey at Stanford and many others are working on things like that. So I think it's happening, but there are lots of aspects and issues.
Adams: Yeah, absolutely. Then causal inference is its own big topic in both ML and statistics.
Cunningham: And growth.
Diebold: Anything else?
Mann: I agree.
Diebold: OK. It's funny—we have so many questions that looking through them gets confusing. Here's the one I wanted. This is a fascinating question. It just got another vote—it was 9, now it's 10—from Pat Parker, who signed his name. Pat, are you around? Stand up. There's Pat. This way you know where to find him—everybody knows Pat anyway—if you want to talk to him some more. Pat writes—fascinating—one major concern is how to ensure fairness and equity in ML algorithms. ML is agnostic. How do you ensure practices like redlining of minority neighborhoods do not occur if decisions are made by ML?
Mann: I'll step into that.
Diebold: Build in a constraint, like don't redline. It's not so easy, is it?
Mann: No. You know, there was this—I don't know if you saw it—there was this work a year or two ago on a system called COMPAS, which was a sentencing-assistance tool for judges. What happened is that reporters—I think it was ProPublica, one of those outfits—looked at the judgments coming out of COMPAS and said, "Oh, wait a minute. COMPAS is recommending harsher sentences for minorities, for people who are black." The reporting was, clearly, this is a racist algorithm. The company behind COMPAS came back and said, "No, no, no. We're not racist. This is just the data that we've gotten."
In fact, if you look through the data they had, they were training off of judges' decisions, and the judges' decisions were harsher on people who were black. OK. Where's the problem here? Is the problem the algorithm? Is the problem with the judges in the first place? I think, unfortunately, one of the things that happens is that people assume algorithms are going to greenwash, or algorithmically wash, the set of decisions, and they really don't. They just make you examine more carefully your assumptions and your biases in the first place. Unfortunately, making sure that bias is eliminated from all of these places is certainly not any easier.
In some cases, it might be more difficult, because instead of having a conversation with a person and saying, "You judged this person more harshly," how do you have that conversation with an algorithm? That's the bad news. The good news is that there is a tremendous amount of academic work and attention on this issue. There's a conference called Fairness, Accountability, and Transparency in Machine Learning—FATML. I don't know why they chose that particular name. And they have a huge amount of work on understanding how to detect those biases and also on trying to mitigate them explicitly when you can.
Adams: I would actually take a more optimistic view on one of the things you said, though, which is that this question of I can have a conversation with a judge, I can't have a conversation with an algorithm. Actually, I think it's good the other way around. The algorithm won't misrepresent its own beliefs.
The judge will attempt to not appear racist and might change their behavior in response to the questioning, in ways that are independent of how they actually perform the sentencing, whereas the algorithm is what it is, and we have tools for interrogating that algorithm and asking it exactly how it would sentence hypothetical people and so on. It will give the same answer that it would give in the courtroom. So there's a sense in which—although the interpretability of these algorithms is its own issue—there's an opportunity for transparency, simply because the rules are codified and reflect whatever biases the society had when it generated those data.
Cunningham: I just wanted to add one point. I agree with everything that's been said. I just wanted to highlight, too, that in addition to there being conferences specifically focused on this, some of the core AI and machine-learning conferences have annual workshops on exactly this question. While we don't have the correct framework to think about it now, this is something that a lot of people at all levels of the community are really focused on because it is so important a question.
Diebold: The only thing I would say—and I don't work in the area, so it might be naïve—is that, again, these are regression problems at the end of the day. You're minimizing some loss function, typically quadratic. One can impose constraints. Of course, it might be very difficult to mathematically quantify a no-redlining constraint. Maybe the constraint set isn't convex, so there could be multiple optima—all sorts of issues. Not to say that it would be easy. But it would seem to me that imposing the relevant constraints, even imperfectly, would often get you moving in the right direction.
Adams: It's harder than that. If you know what all the constraints are a priori, then great. The problem is there's lots of notions of fairness that you might not realize need to be encoded until the system exists.
Diebold: Yeah, but at least the first step would start the conversation. You need a model. It takes a model to beat a model. If I write down something pretty naïve, you can take it and say, "No, Frank. You need to think harder about the subtlety of this constraint." And we would pass it back and forth.
Adams: Part of the issue as well is that in real data there is the constraint that you would like to write down, which might be as simple as "race should not be a part of this decision," but there are many things that correlate with that label. You have to make a careful decision about somehow removing the possibly very complicated, nonlinear ways that all of these other covariates correlate with the thing that you'd like not to be biased around.
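A small synthetic example of that proxy problem—everything here is invented, with a fabricated "zipcode" feature standing in for a correlated covariate:

```python
import numpy as np

# Toy illustration of the proxy problem: even after excluding the protected
# attribute from the regression, a correlated covariate lets the fitted
# model reproduce much of the group disparity. Entirely synthetic data.

rng = np.random.default_rng(2)
n = 5000
group = rng.integers(0, 2, n)                  # protected attribute
zipcode = group + 0.3 * rng.normal(size=n)     # proxy correlated with group
income = rng.normal(size=n)
y = income + 0.8 * group + rng.normal(scale=0.1, size=n)  # biased outcome

# regress y on (intercept, income, zipcode) -- protected attribute excluded
X = np.column_stack([np.ones(n), income, zipcode])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
pred = X @ coef
gap = pred[group == 1].mean() - pred[group == 0].mean()
print(round(gap, 2))   # disparity survives through the proxy
```

Dropping the protected attribute from the regressors is not enough: the fitted model recovers much of the group disparity through the correlated proxy, which is why the careful removal described here is hard.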
Mann: Yeah. Just one more point about that. I guess what it really comes down to is that once you start talking about things algorithmically, it forces you to be very explicit about exactly what you mean and exactly what you want to happen, which is often uncomfortable.
Diebold: OK. Let's move on. Here's an interesting one. Let me highlight it. Here we go. Machine learning is a great concept. Can you speak about data quality and data sufficiency and their potential impacts on ML outcomes? No need for me to go first, actually.
Cunningham: Sure, I'll jump in. Absolutely. One thing that I think a lot of the hype around machine learning and AI has swept under the rug is that garbage in, garbage out is as true now as it ever was of quantitative methods. A lot of what's happening right now is that people are realizing that the data—the large, unstructured data that they have in their enterprise—is a mess. There is often a huge IT services and engineering project up front just to manage this data and put it into a digestible format. That is absolutely, I'd say, a big obstacle to ML outcomes in the near term.
Adams: One thing that's worth pointing out, too, is that a lot of the successes we've been seeing recently with deep learning and other kinds of machine learning have been with a kind of data that isn't actually particularly noisy. It's highly structured, and it's ambiguous because it's very high dimensional relative to whatever semantic concept you'd like to extract. You'd like to answer the question, is there a cat in this image? That's a hard question, but it's not hard because the image is noisy. There are lots of things that have that flavor. So I think, in some ways, the current kinds of successes we're seeing in ML aren't necessarily great at handling the really ugly kind of data that you see in practice.
Mann: There's a conference we hold in machine learning and finance. We held it at Columbia maybe a few weeks ago, maybe a month ago. One of the talks, from Bloomberg, was someone speaking specifically about noise reduction and noise compensation. It's a big issue. There are always a few ways to skin the cat: one way is you throw more machine learning at it, and the other is you get better data. It's just another lever.
Diebold: I'm generally an optimist on all these things, and I'm an optimist here as well, but let me just admit that it often occurs to me that, at least in certain important respects, big data is just a big hassle. People sometimes talk about digital exhaust: a lot of the data that's collected is like exhaust coming out of a car. You choke on it, and in many cases it's hard to cut through to the relevant thing. Of course, I work in economics and finance, where the data are typically noisy.
Another aspect of that is dimensionality reduction. I see a lot of cases where I might have 10,000 potential covariates, but if you think about the problem, the relevant space might be spanned by just six or seven of them, or by something like principal components, or some other way of distilling them. It's a big deal. More data is not necessarily always better; you have to think about how to cut through the noise. Of course, I agree with everything you said. That's what people are trying to do.
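[Editor's note: Diebold's dimensionality-reduction point can be sketched in a few lines of illustrative NumPy. The numbers here are hypothetical, a thousand noisy covariates standing in for his 10,000, whose signal actually lives in just six directions recovered by principal components.]

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 500, 1000, 6          # observations, covariates, true dimension

latent = rng.normal(size=(n, k))             # the six "real" factors
loadings = rng.normal(size=(k, p))
X = latent @ loadings + 0.1 * rng.normal(size=(n, p))  # noisy high-dim data

# Principal components via SVD of the centered data matrix
Xc = X - X.mean(axis=0)
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = (s**2) / (s**2).sum()

# Nearly all the variance is captured by the first k components,
# even though there are a thousand raw covariates.
print(round(explained[:k].sum(), 3))
```

The point is not the specific numbers but the shape of the problem: the relevant space is tiny relative to the raw covariate count, and distilling it is most of the work.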
OK. Let's see. Here's one. If structured ML—I think that might mean supervised ML—is just complicated regression analysis, how do we guard against the usual problems such as degrees of freedom, overfitting, spurious regression, predictive breakdown, etc.?
Adams: Those are exactly the problems that exist, and that is what ML research is about: solving issues exactly like this one. How do I balance the fact that my neural network has millions of parameters and I only have, say, hundreds of thousands of data points? These questions are exactly what the day-to-day work of machine learning and other kinds of regression is about.
Cunningham: I also just want to point out, when Ryan was saying that a lot of careful engineering has made these AI successes possible: the marketing engine around AI will often get ahead of the fact that what that careful engineering effort really is, is this.
Adams: Good priors.
Cunningham: Yeah, good priors, thinking about overfitting, training procedures, how to implement that computationally. Not specific to this question, but generally. This question is really quite astute, because this is exactly what the work is about.
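[Editor's note: the overfitting concern in the audience question can be made concrete with a classic toy example, entirely illustrative and not drawn from the panel. A polynomial with nearly as many parameters as data points threads the noise and predicts poorly, while a modestly flexible model generalizes.]

```python
import numpy as np

rng = np.random.default_rng(1)

# 12 noisy training samples from a smooth underlying function
x_tr = np.linspace(0, 1, 12)
y_tr = np.sin(2 * np.pi * x_tr) + 0.2 * rng.normal(size=12)

# A dense, noise-free test grid from the same function
x_te = np.linspace(0, 1, 200)
y_te = np.sin(2 * np.pi * x_te)

def poly_mse(deg):
    """Fit a degree-`deg` polynomial to the training data and
    return its mean squared error on the test grid."""
    coeffs = np.polyfit(x_tr, y_tr, deg)
    return np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)

# Degree 1 underfits, degree 11 interpolates the noise (overfits),
# and degree 3 sits in between and generalizes best of the three.
print(poly_mse(1), poly_mse(3), poly_mse(11))
```

Guarding against this, through regularization, priors, and validation procedures, is precisely the "careful engineering" the panelists describe.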
Mann: I used to work for Fred Jelinek, who is, I guess, one of the people largely responsible for statistical speech recognition. We used to say that the problem of science is basically the problem of data sparsity, which I think is what the slide is asking about. That is what machine learning is.
Diebold: I'm going to interject with a question of my own, since I didn't have the opportunity to write them in. When you just mentioned neural networks, it reminded me of your great talk and how you emphasized them. I have a question. It's kind of a tacky question, but I think it's totally relevant. Again, these are all just function approximators. They're just regressions. You know, there are some celebrated theorems for neural networks that, under conditions, as the number of neurons grows, you can consistently approximate any relationship, and all that kind of stuff.
That's been true in statistics for kernels and nearest neighbors and splines and trees and five other nonparametric estimators for decades. So it brings them up to that level, but does it improve on that level? What is it about neural networks per se as opposed to trees or nearest neighbors or whatever that's so intriguing to the ML community?
Adams: The reason people are excited about them is that there is a class of problems people care about that they're way better at than any of the previous techniques. I don't think it's a controversial fact that a convolutional neural network can perform visual object recognition in a way that was totally infeasible 10 years ago. The same is true of speech recognition: it has undergone a total revolution in the last 10 years, and it's entirely because of convolutional neural networks, recurrent neural networks, and things like that. In my group, we essentially set the state of the art for predicting the properties of small organic molecules with a model built around a neural network architecture.
My point is that it's not magic. The reason these things are successful is that you look at the problem hard, and rather than using a generic nonparametric estimator, you use a nonparametric estimator that reflects some invariance you think exists in the data. Like convolutional neural networks: they're called convolutions because essentially they're parametrized in such a way that if I shifted the image around a little bit, it would still give me the same answer.
It's very, very hard to come up with a spline model or a traditional kernel density estimator that has that property. At the end of the day, I view it this way: this community has come up with a really good set of priors for particular kinds of problems. Then you can stick a Gaussian process regressor on top of that if you want. It's just a different class of functions, and it happens to be very well chosen for an important set of problems.
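[Editor's note: the shift property Adams describes can be verified in a few lines of illustrative NumPy. This toy uses a one-dimensional circular convolution, so shifts wrap cleanly; real convolutional layers handle boundaries differently, but the invariance idea is the same.]

```python
import numpy as np

rng = np.random.default_rng(2)
signal = rng.normal(size=32)
kernel = np.array([0.25, 0.5, 0.25])   # one small filter, shared everywhere

def conv(x, k):
    """Circular 'same'-style convolution of signal x with kernel k."""
    n = len(x)
    return np.array([sum(k[j] * x[(i + j - 1) % n] for j in range(len(k)))
                     for i in range(n)])

shifted = np.roll(signal, 5)

# Convolving the shifted input equals shifting the convolved output:
# the filter gives the "same answer" regardless of where the pattern sits.
lhs = conv(shifted, kernel)
rhs = np.roll(conv(signal, kernel), 5)
print(np.allclose(lhs, rhs))
```

A spline or kernel density estimator fit to raw coordinates has no such built-in property; the convolutional parametrization is exactly the "good prior" Adams is pointing at.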
Mann: I'm clearly not as deeply versed in the theory or even in the research, but from what I see, it's also the combination of the feature-extraction piece and the modeling: in a neural network, they're almost combined. In a traditional machine-learning approach, of course, you'd spend all this time mapping out your features and crafting them very carefully. Once you have a very deep network, you don't have to do that, or at least not as much. One of the consequences is that it has really reduced developer time. A less experienced developer can get good results with a neural network faster than they would with traditional machine learning.
Cunningham: I want to add to that. I think both of those points are correct. A different line on why neural networks seem to have won against spline models and various kernel models is that the way they're set up has united so well with the computational technology and skill sets we have available. That's a very mundane reason, perhaps, but there is this magical correspondence between how you need to compute things in a neural network and graphics processing units. That's such a simple thing, but it's tremendously powerful.
Another part of that is that, because a neural network is in some ways a very simple object to understand, it agrees perfectly with the computer scientist's intuitive understanding of abstraction and modularity. You have these neural network layers. You add them on top of each other. You can play around with each of them individually without having to re-derive anything. That works really, really nicely, so in a sense it has taken function approximation, which was a mathematics, statistics, and optimization question, and turned it into a problem of software engineering. There are just a lot more software engineers out there.
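[Editor's note: a minimal NumPy sketch of the modularity Cunningham describes. The `Dense` and `ReLU` classes here are hypothetical illustrations, not any particular framework's API; the point is that layers compose by plain function application, so swapping one out touches nothing else.]

```python
import numpy as np

class Dense:
    """A fully connected layer: y = xW + b."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b

class ReLU:
    """An elementwise nonlinearity with no parameters."""
    def __call__(self, x):
        return np.maximum(x, 0)

# Stacking layers is just listing them.
layers = [Dense(8, 16, seed=1), ReLU(), Dense(16, 4, seed=2)]

def forward(x, layers):
    for layer in layers:
        x = layer(x)
    return x

out = forward(np.ones((3, 8)), layers)
print(out.shape)   # a batch of 3 inputs mapped from 8 features to 4
```

Nothing here needs re-deriving when a layer is added or replaced, which is the sense in which function approximation has become a software-engineering exercise.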
Diebold: Thank you for that education. That's good. Let's go to this one. I think it was Bob Solow who, a long time ago, had the famous quip that you can see the computer revolution everywhere except in the productivity statistics. This question is very much in that tradition. What's the basis of the productivity estimates, presumably optimistic productivity estimates, coming from ML and AI? The internet economy had similar optimism. The recent lack of productivity growth is a conundrum for the Fed. Why will ML and AI be any different and solve things?
Cunningham: Having been the one who cited the productivity estimate, I'll walk into that one. For those interested in productivity and capital stock and the implications of ML, there is, I think, a thoughtful piece written by Goldman Sachs investment research, which is where I got the chart estimating 50 to 150 basis points of productivity increase. I should clarify that this is not my area of study; I'm an interested reader, so I do strongly recommend speaking to somebody who knows more about this than I do. That said, this productivity number comes from labor-hour reductions for particular functions, essentially improvement of individual work functions.
Mann: I can't answer that question, but I can answer a related question, which concerns one of the things that seems to be very clear. John, you talked about this in your talk: there is really a consolidation of AI in particular companies. If you look overall at the way automation has changed companies, I think it really has had a monopolizing effect, in which the successful companies are able to increase their rate of success because they're able to scale much faster and more easily. I suspect the same will be true of ML and AI. It'll enable smart companies that have successful data flywheels and are able to capture the right kind of exhaust and the right kind of users. They'll be able to gain an increasingly large chunk of the market. I can't say whether there's a relationship to productivity, but that effect seems very clear to me.
Diebold: OK. Let's close with one more. This is kind of interesting, very applied, very relevant. Here we go. New AI tools and techniques are being developed at a rapid pace. By the time one becomes well-versed with one, it almost becomes obsolete. In such a scenario, how should firms prepare their workforces?
Adams: This is a tough one. In my view, the key to work in this area is to have not a tool-centered view of the world but a concept-centered, algorithm-centered view of the world. Part of the downside of the hype is that it has caused people to be quite myopic in regarding the tools as the thing, whereas I think the calculus and the probability are the thing.
That's a professor's sort of viewpoint, but I feel like the hard part is training somebody to the point where they would have the option to build it from scratch if they wanted to, and they just take advantage of, say, TensorFlow for productivity reasons. That said, it seems generically like a good idea to choose tools that are invested in by companies with giant market caps and supported by hundreds and hundreds of engineers, like TensorFlow, which I expect will be around for a while.
Mann: I don't think "prepare" is the right way of framing the decision. One of the things that actually came out of the Shift Commission is that the role of the university and the role of education are probably going to have to change to support a lifetime-learning scenario. In that context, it's not that you go to school, learn a set of skills, and then get a return on that investment for another 30, 40, or 50 years. Rather, you learn some stuff, you become useful, the skill needs shift, you shift, and you have to be retrained.
One of the things we've been doing is rolling out quite a large machine-learning curriculum. It's a little bit grandiose. We call it ML EDU, but it's about five to eight sets of courses that we're putting our engineers through. Different people go through different pieces. We're also trying to educate our product groups. I think this mode of lifetime learning is going to be quite common.
Cunningham: Yeah, I agree. I've seen a handful of companies who really try to focus on hiring a top deep-learning engineer. That's a very expensive item in the marketplace right now, often without the infrastructure for the enterprise to actually take advantage of it. What I think should be happening more, and what I see when success looks like this, is that people take a lifelong-learning approach and upskill their existing software engineering core.
Because, as Ryan was saying, right now, to be competitive in the machine-learning tool space, you have to be open source and supported by a large number of people. If you upskill your core software engineers, they will be on the path and trajectory to continue to consume those tools as they come up. As for identifying those tools and staying on the competitive curve, I see people creating partnerships with academia and finding scientific advisory boards to help stay relevant and current.
Diebold: OK. Thank you all for coming. Let's thank our three fine panelists.