Data science has useful applications in financial services and the entertainment industry. Matthew Marolda, Chief Analytics Officer at Legendary Entertainment, and Dr. Anthony Scriffignano, Chief Data Scientist at Dun & Bradstreet, tell CxOTalk about the core techniques of data science, its common elements across industries, and lessons to be learned.

Marolda formed and runs Legendary Entertainment’s Applied Analytics division, which uses data and analytics to drive strategic decisions across all aspects of the company. Prior to Legendary, Marolda founded StratBridge LLC in 1999, offering “moneyball” player analysis software used by many organizations in the NFL, NBA, European Football and Major League Soccer (sold to XOS Digital in 2012), and dynamic pricing and revenue analysis software used by many pro sports teams (acquired by Legendary in 2014, after Marolda joined Legendary). Scriffignano has over 35 years of experience in IT, Big-4 management consulting, and international business. Scriffignano leverages deep data expertise and global relationships to position Dun & Bradstreet with strategic customers, partners, and governments. A key thought leader in D&B’s worldwide efforts to discover, curate, and synthesize business information in multiple languages, geographies, and contexts, he has also held leadership positions in D&B’s Technology and Operations organizations and served as the primary inventor on multiple patents and patents pending for D&B.

Transcript

Michael Krigsman: For many of us, data and analytics are a black box. We don't know what goes on behind the scenes. The secrets of data and analytics, that's our subject today. That's what we're talking about on Episode #294 of CxOTalk. I'm Michael Krigsman. I'm an industry analyst and the host of CxOTalk.

Now, we have two special guests. Before I introduce them, I want you to invite your friends. I want you to tell your friends, your family, your coworkers, and everybody you know to tune in, and they should watch with you. Not only that. You and them, right now, subscribe on YouTube. Tell everybody you know.

Now, we have two amazing guests. They've both been on CxOTalk in the past. First, I want to welcome Matt Marolda. Hey, Matt. How are you? It's great to see you here again.

Matthew Marolda: I'm great. How are you?

Michael Krigsman: I'm good. I'm good. Matt, you're in Boston today. Tell us a little bit about what you're working on.

Matthew Marolda: Sure. Sure. My life in data and analytics has gone on for many, many years, covering many different areas, everything from analyzing publicly traded companies--that was a long time ago--to much more recently in professional sports in the world of Moneyball, as folks have come to call it. Then, even more recently, over the last five to seven years, in media and entertainment, and applying these kinds of techniques to large-scale movies and television shows.

Michael Krigsman: Fantastic. Our second guest is again no stranger to CxOTalk, coming to us from the wilds of Europe, someplace in Europe. Anthony Scriffignano, how are you?

Anthony Scriffignano: Hello, Michael. It's great to be with you again. Thanks very much.

Michael Krigsman: Anthony, tell us about -- [laughter]. I feel like this is Jeopardy! or some game show. Tell us about what you do.

Anthony Scriffignano: I'm the chief data scientist at Dun & Bradstreet, and I am responsible for innovation around advanced data science topics. I also work with regulators around the world as the world of regulating data is changing itself. I get to work with some of our more sophisticated customers on some of their more sophisticated problems.

To Matt's point about where I started out, I started out doing physics for cranes, construction cranes, offshore oil rigs, nuclear power plants, and all kinds of weird things, and here I sit. I think the journey that takes us to where we are is often one that will surprise us.

Michael Krigsman: Well, you know, the thing that I find very interesting right now is how you're both involved with very large data sets, but from completely different industries. Matt is working in entertainment and media and, Anthony, you're in financial services. I think that, for this show, the conversation between the two of you, comparing and contrasting the type of work that you do or the problems that you work on, will be most fascinating.

Matt, maybe I'll ask you first to talk about the kinds of problems or the business problems that you have been applying data sets to and solving for.

Matthew Marolda: Sure. Sure, happy to do it. We live in this unusual place where we have these very large, binary outcomes, meaning we have a movie that we're going to release, say Godzilla or Kong, movies of that kind of scale. There's only really one world we can live in, which is the world where that movie is released, which means we can't run tests. We can't do a lot of things that a lot of people in data science would like to be able to do where you have controls.

We can do that within the campaign and within very small windows, but it's very hard to iterate and adjust over long periods of time. We're in this situation where we have to really work to thread the needle and learn as much as we can as quickly as we can in these ambiguous environments, where the data we have doesn't correlate perfectly with the outcome. We don't have these really direct correlations. We have to operate in these ambiguous environments that force us to look at all different kinds of data and pull it from lots of different places.

Michael Krigsman: For you, it's about understanding--correct me if I'm wrong--essentially, buyer behavior and the linkage, trying to drive linkages between what happens in a movie, for example, and the way that people consume that movie? Is that correct?

Matthew Marolda: Yeah. No, you've got it right. I'll just real quickly highlight it for you. We're very audience driven. Meaning, we need to understand audiences and people at a very specific level.

That starts all the way at the beginning. Is there an audience for this movie or TV show? Does that audience have enough scale to support the budget we might have for it? Those [are] the kinds of questions.

We then want to understand what the audience likes and how they might respond to different elements or aspects of the movie. Then, ultimately, when you get close to marketing, this is where it really kind of escalates. We want to understand: how do we reach that audience? How do we persuade them? What creative materials, meaning the trailers or the ads or the TV spots we could show them, are going to create at least some kind of desire to watch the movie?

We're just trying to dial it up. We're just trying to shift the odds to make it more likely, although we can't guarantee an outcome, but we're working on that. It's all very much at the individual level.

Michael Krigsman: Okay. Fantastic. Now, Anthony Scriffignano, you're the chief data scientist at Dun & Bradstreet. Presumably, you're operating on sets of financial data. Maybe you can describe to us the kinds of business problems that you're looking at with data.

Anthony Scriffignano: The types of problems that I'm working on are very similar, believe it or not, to the types of problems that Matt just described, but in a very different way. If you think about our customers, they're trying to solve a problem that's somewhere in the category of either total risk or total opportunity. What's the white space? What could I possibly do if I penetrated this market? If I went into this country, can you help me find more companies that look like my best customers or don't look like my best customers?

Then, on the risk side, are they going to pay me? Are they fraudulent? Are they going to go out of business? Those are the problem spaces.

But, I have exactly the same edge of the possible that was just described. The unstructured data, the data we've never seen before. Everyone is really good at what's called supervised learning right now, looking at structured, longitudinal data that's been around for a long time and building, basically, regressive relationships and then saying, "Here's what I think is going to happen," assuming the future looks something like this past set of data that you've trained on.

The problem is, the future doesn't look like that set of data. The future is ambiguous. The data in the future has never been seen before. Now, recently, some of it you can't use because of those different regulations, so you have to unlearn things.
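Scriffignano's point about supervised learning can be made concrete with a toy example. The sketch below is illustrative only and is not drawn from Dun & Bradstreet's methods: it fits an ordinary least-squares line to a hypothetical "past" period, then scores that same line on a hypothetical "future" period where the relationship between signal and outcome has changed. The data, the shift, and the helper names (`fit_line`, `mean_abs_error`) are all assumptions for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mean_abs_error(a, b, xs, ys):
    """Average absolute gap between the line's predictions and the actual outcomes."""
    return sum(abs(y - (a + b * x)) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

# Hypothetical "past": the outcome rises with the signal.
past_x = [rng.uniform(0, 10) for _ in range(500)]
past_y = [2.0 * x + rng.gauss(0, 1) for x in past_x]

# Hypothetical "future": behavior has changed, so the same signal now means something else.
future_x = [rng.uniform(0, 10) for _ in range(500)]
future_y = [15.0 - 1.0 * x + rng.gauss(0, 1) for x in future_x]

a, b = fit_line(past_x, past_y)
err_past = mean_abs_error(a, b, past_x, past_y)
err_future = mean_abs_error(a, b, future_x, future_y)
print(f"fitted slope: {b:.2f}")
print(f"error on the training period: {err_past:.2f}")
print(f"error on the new period:      {err_future:.2f}")
```

The model fits the training period well, yet its error is far larger on the new period, and nothing in the training data warned that this would happen.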

The problems of understanding things we've never looked at before in ways that are changing while we're looking at them are the same. This tale of two cities that we're telling, it's really the same set of problems. It's just a different use case at the end.

Michael Krigsman: How can we dive into these comparisons? Matt, as you're hearing Anthony talk, is the fundamental nature of the problems resonating with you in the same way as he's describing?

Matthew Marolda: For sure. Certainly, the outcomes we're both managing to are very different, but the approach, the path feels very similar. Similarly, we are dealing a lot with unstructured, ambiguous data, right? [Laughter]

Anthony sounds like he's taking a slightly more intellectual bent on it. Ours is a little craven. [Laughter]

Anthony Scriffignano: Well, actually, I'm sitting here listening to you describe your problem, and I'm thinking that is so cool. I'd love to have a problem like that.

Matthew Marolda: [Laughter]

Anthony Scriffignano: [Laughter] There is something that we work on that I call a Black Cat Problem where you're looking for something that may not be there in a place that's inherently hard to look. In our case, think about fraud, or think about maybe some other type of bad behavior, malfeasance. If you try to model your way out of finding things like that by looking at all the previous bad stuff, the best bad guys, when they know they're being watched, they change their behavior, so you'll model how the best ones are no longer behaving.

In your case, you're trying to chase the next big thing, but the next big thing doesn't look like the last big thing. That's why it's a big thing. You have your own black cat problems.

I really do think we went to separate schools together. I think we are solving very similar problems.

Matthew Marolda: For sure. Absolutely. No, I think you're right. There's one area where I'd be interested in your perspective. Again, the craven aspect of what we're doing is that we're looking for competitive advantages to make more money. That's fundamentally what we're trying to do.

When we are looking for those advantages, we're finding them in places, like you said, in these dark rooms. I think, at first, we had a match. [Laughter] Then we get to a candle. Now, I think we're hopefully having some kind of lantern. Our lantern has been through unstructured data, right?

Anthony Scriffignano: Yes.

Matthew Marolda: You find unstructured data. I'd be interested in how you've addressed this problem, but we've been in the situation where we don't generate a lot of data ourselves. We're not a first-party data company.

Anthony Scriffignano: Mm-hmm.

Matthew Marolda: Our products go out to the market through different channels, whether it's exhibitors like movie theaters, or whether it's an online, over-the-top player, like Apple or even Netflix, so we don't get a lot of data back. We have to work within that space. Have you found similar things? How does Dun & Bradstreet, which is a data company, fundamentally approach those problems?

Anthony Scriffignano: Ironically, we actually create a lot of data. We create much more data than we curate. There's a misperception that we just go out and collect data from all over the place, we bring it together, and sell it to people. Nothing could be further from the truth.

Most companies in the world, more than two-thirds of companies in the world, are private, and private means they don't have to tell anybody anything. We do have to do exactly what you're talking about, but in a different way.

There's an interesting problem that, while you're trying to find that next big thing--you had a much better way of describing it--that big opportunity, so is everybody else. They have smart data scientists too, and they can look at a lot of the same data you can. And so, the trick is to not try to outsmart them. You're never going to be the smartest guy in the room. Sorry. None of us ever are. You might be the smartest person in the room about one thing, but there are billions of things.

The trick is to use what you can see that you know they can't see, or to use it in a way that has a competitive moat, so that even if they knew what you were doing, it would be very hard for them to replicate. Because of where you work and what you do, you have a catbird seat on certain types of data, maybe because of your professional relationships, who your customers are, maybe because you create that data by actually working in the community that you're working in. Nobody else can see that. That's your edge.

But, at the same time, you have to be just as fast as them and just as agile as them and just as good as them at using all that other data. You don't get to slow down on the sort of commodity side of it. You've got to speed up on the innovative side of it and still get just as fast as everybody else on that commodity side. This is a really tricky dance.

Matthew Marolda: Yeah. You actually hit on something we talk about all the time. It's a cheesy line, but I've used it for years now: "We've always wanted to be an innovation factory, not a warehouse." Otherwise, you're just going to fall behind.

Anthony Scriffignano: It's funny that you say that. Everybody wants to be agile, and they want to innovate. When you stop, and you say to them, "Well, what does that really mean to you?" they roll their eyes.

I say, "Look, realizing you had a problem that you didn't realize you had, is that innovation?" It sure is. "Taking a really big problem that you have and breaking it down into smaller problems that you still haven't solved, is that innovation?" Well, that sure is. But, that's called research. [Laughter] Cancer research works like that.

But, the problem is that most people want to immediately monetize that innovation often before they understand what it is or why it's innovative. I'm not suggesting that we don't rush to market. I'm suggesting we make new mistakes. That's a really tricky dance.

Matthew Marolda: Yeah. You've also mentioned another thing that we hold true, which is, fail fast.

Anthony Scriffignano: Yeah.

Matthew Marolda: I'm totally fine failing. It's okay. [Laughter] We'll make lots of mistakes. As long as you learn from them and learn from them quickly and then adapt quickly, that's how we handle it.

Anthony Scriffignano: I agree, but I think some people--and I'm certainly not accusing you of this--take this fail fast thing a little too far. They think it means something like try whatever as long as you fail fast. It doesn't mean that at all.

Matthew Marolda: No.

Anthony Scriffignano: It means that you'd better be able to explain why you thought that was the right way to go. There are no do-overs. In the time you spent failing fast, you lost the opportunity to go do the right thing.

Matthew Marolda: Correct. Yeah, absolutely. I think our whole thing is these rifle shots, very precisely targeted shots.

Anthony Scriffignano: Yes.

Matthew Marolda: We don't always necessarily know. [Laughter] We don't even know our ammunition sometimes, right?

Anthony Scriffignano: Yeah.

Matthew Marolda: To your point, it's a very thoughtful process, and it's very considered, but it is one where we don't want to linger on something there. I guess the other side of that equation is making sure that we don't become beholden to some idea just because we had it. We have to have a willingness to let it go.

Anthony Scriffignano: Yeah, that's a really tough thing to do, and especially in data and analytics. I think this whole tools thing comes up a lot where you have to hire people or bring on people that want to use the latest tools. They want to use the latest methods because they're cool and because they're fun.

I never want to lead with a tool. If somebody says, "Well, is there any way that we can use neural networks to--?" I say, "Stop right there." [Laughter]

If your objective is to use neural networks, then go do that on Saturday. I want to understand what the problem is. If that type of method, a neuromorphic method is appropriate to that type of problem, then we'll have that conversation.

If the carpenter walks in, a contractor, you're having your house modified and he says, "I hope I get to use the new hammer," you don't want to talk to that guy. You want to talk about the bathroom you're trying to build.

Matthew Marolda: Right.

Anthony Scriffignano: I definitely have to turn the conversation upside down more than half the time.

Michael Krigsman: I have a question. Matt mentioned taking a rifle shot. I think it was Matt who said that. Can you each describe the size of the datasets? When you talk about taking a rifle shot, what does that actually mean?

Matthew Marolda: Sure, since I used this somewhat crass example, [laughter] I'll at least defend it a little bit or at least elaborate on it. When we think about rifle shots, what we're trying to do is use these collections of data. Because, again, we're not generating first-party data, we're absorbing it from many other places, whether it's from activities we run in a market where we're actually spending media and buying advertising, or whether it's taking data from publicly available sources like Twitter, even, or Reddit.

What we're trying to do is sift through that and use tools that'll help us to highlight the insight. That's actually almost the language we're using to look for these insights, these things that'll tell us something. For example, men of a certain set of interests, shared interests, respond to a certain piece of creative, as we call it, so maybe a trailer, a 20-second TV spot, or whatever it is, in a certain way. That tells us something.

The rifle shot would then become how do we then make more creative like that? How do we find more people like that and target it at them?
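One common way to "find more people like that" is a lookalike search: represent each person as a vector of interest scores, average the people who responded to a piece of creative, and rank everyone else by similarity to that average. This is a minimal sketch assuming cosine similarity over hypothetical interest dimensions; it is not a description of Legendary's actual system, and the profiles and the `lookalikes` helper are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def lookalikes(seed_profiles, candidates, k):
    """Rank candidates by similarity to the centroid of the people who responded."""
    dims = len(next(iter(seed_profiles.values())))
    centroid = [sum(p[i] for p in seed_profiles.values()) / len(seed_profiles) for i in range(dims)]
    ranked = sorted(candidates.items(), key=lambda kv: cosine(kv[1], centroid), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical interest dimensions: [monster movies, sci-fi, romance, sports]
responders = {"u1": [0.9, 0.7, 0.1, 0.3], "u2": [0.8, 0.9, 0.0, 0.2]}
pool = {
    "c1": [0.85, 0.8, 0.05, 0.25],
    "c2": [0.1, 0.2, 0.9, 0.1],
    "c3": [0.7, 0.6, 0.2, 0.6],
    "c4": [0.0, 0.1, 0.3, 0.9],
}
print(lookalikes(responders, pool, k=2))  # prints ['c1', 'c3']
```

Averaging the responders into a single centroid keeps the sketch short; a production system might instead compare each candidate against every seed profile or learn the similarity measure.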

The fail fast is, we want to learn as quickly as possible because, in our campaigns, we're spending an enormous amount of money over very short periods of time, so five to six weeks. We need to understand very quickly, did the insight we have lead to the outcome we expected? That's what we're trying to do. Once we've taken that shot, so to speak, we'll quickly understand did it work or did it not.

We try to also contain it in a very small area. For example, we might find people who fit the phenotype we're looking for, but a sample of them. A large enough one to understand that our approach is working, but not so large that we actually affect the campaign. Once we see that, then we accelerate. That's at least an example of how we do that.
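Marolda describes testing on a sample large enough to read but small enough not to move the campaign, then accelerating. One standard way to read such a pilot is a two-proportion z-test comparing the targeted group with a held-out control. The sketch below assumes a hypothetical binary response (for instance, clicking through to a trailer) and made-up counts; the 0.05 cutoff is a conventional choice, not one stated in the conversation.

```python
import math

def two_proportion_z(conv_t, n_t, conv_c, n_c):
    """Two-proportion z-test: returns (lift, z, two-sided p-value) for treatment vs. control."""
    p_t = conv_t / n_t
    p_c = conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided p-value from the standard normal CDF, written with math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_t - p_c, z, p_value

# Hypothetical pilot: 5,000 targeted people saw the new spot; 5,000 similar people were held out.
lift, z, p = two_proportion_z(conv_t=450, n_t=5000, conv_c=380, n_c=5000)
print(f"lift={lift:.3f} z={z:.2f} p={p:.4f}")
if p < 0.05 and lift > 0:
    print("Signal looks real: scale the targeting up.")
else:
    print("No clear signal: drop or rethink the insight.")
```

With these invented counts the lift is 1.4 percentage points and the test clears the 0.05 bar, which in this framing is the cue to accelerate; a weaker pilot would be the cue to let the idea go.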

Anthony Scriffignano: Yeah, so that's a really interesting way of describing a way of looking forward, while looking backward, very quickly at the yellow line right behind the car. You're sort of doing a combination of unsupervised learning. I don't want to start to get into all the methods and the names. There is actually a name for what you just described.

It's a really, really powerful way of thinking and totally appropriate to the environment that you're in. If someone led with that method to understand fraud, for example, I would say, "Well, you can't do that." What we have to first do is figure out how much of fraud.

First of all, it's not even fraud when it gets presented to us. It's the proto-fraud. It's the thing that precedes the fraudulent activity where somebody else loses money. Then they lie to us.

We do have years and years and years, decades of experience, more than decades, of that kind of behavior. But, that behavior is inherently changing with cryptocurrency and with new ways of cheating on the Internet, all kinds of cybercrime, and so forth.

Now, what we have to do is we have to say, "What percentage of this problem do we think is the new behavior versus the old behavior? How would we know when the environment is changing in such a way that the preexisting methods are not performing as well as we thought they were? And, what would be the triggers against something that we won't recognize when it's happening?"
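The question of how to know when "the preexisting methods are not performing as well as we thought" is often handled with distribution-drift monitors. The sketch below uses the population stability index (PSI), one common drift measure, swapped in here for illustration; the simulated scores, the bin edges, and the 0.2 alert threshold (a widely used rule of thumb, not a formal standard) are all assumptions.

```python
import math
import random

def psi(expected, actual, bins):
    """Population stability index between a baseline sample and a recent sample.

    Assumes every value falls inside the bin edges [bins[0], bins[-1]).
    """
    def shares(values):
        counts = [0] * (len(bins) - 1)
        for v in values:
            for i in range(len(bins) - 1):
                if bins[i] <= v < bins[i + 1]:
                    counts[i] += 1
                    break
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(42)
bins = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
baseline = [rng.random() for _ in range(5000)]        # scores when the model was built
same = [rng.random() for _ in range(5000)]            # a later period with no real change
shifted = [rng.random() ** 0.5 for _ in range(5000)]  # hypothetical new behavior skews scores upward

THRESHOLD = 0.2  # rule of thumb, not a universal standard
for label, sample in [("stable", same), ("shifted", shifted)]:
    value = psi(baseline, sample, bins)
    status = "investigate / retrain" if value > THRESHOLD else "ok"
    print(f"{label}: PSI={value:.3f} -> {status}")
```

A monitor like this only says that the inputs have moved; it cannot say whether the movement is new malfeasance or a harmless change, which still takes human judgment.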

If we use this analogy of driving and looking at the yellow line down the road, some methods look way behind the car, and they just look at the yellow line. They assume that the shape of the road in front is going to be just like the shape of the road behind us. We all know that's not true.

Other methods try to look only at the line in front of the car. Then, depending on how far ahead they're looking, they either miss the thing that comes right out in front of the car, or they miss the thing that's very close to the horizon that would indicate a change in direction.

You have to have a mixed methods approach that does a little bit of all of these. That rifle shot you're talking about, what I'm imagining is almost more of a shotgun kind of rifle that's shooting in multiple directions, but sort of in the same general direction. It's a very good analogy if you think of each of those pellets being a different method and a different analytical approach or a different type of curation, looking for different types of signals that may never have existed before. I could see that being super powerful.
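The shotgun analogy maps loosely onto an ensemble: several detectors, each looking in a different direction (behind the car, at the present, ahead), whose scores are combined. This is a minimal sketch with three hypothetical detectors, invented field names, and an equal-weight average; real systems would calibrate and weight each method, and none of these rules come from the speakers.

```python
from statistics import mean, stdev

def known_pattern_score(record):
    """Backward-looking: matches a hypothetical rule learned from past cases."""
    return 1.0 if record["years_in_business"] < 1 and record["credit_requested"] > 100_000 else 0.0

def anomaly_score(record, history):
    """Present-looking: how unusual the request is vs. recent peers (z-score scaled and capped at 1)."""
    values = [h["credit_requested"] for h in history]
    mu, sigma = mean(values), stdev(values)
    z = abs(record["credit_requested"] - mu) / sigma if sigma else 0.0
    return min(z / 4.0, 1.0)

def velocity_score(record):
    """Forward-looking: a hypothetical leading indicator, e.g. many profile edits in a short window."""
    return min(record["profile_edits_last_30d"] / 10.0, 1.0)

def ensemble(record, history, threshold=0.5):
    """Average the detectors' scores; flag the record if the combined score reaches the threshold."""
    scores = {
        "known_pattern": known_pattern_score(record),
        "anomaly": anomaly_score(record, history),
        "velocity": velocity_score(record),
    }
    combined = mean(list(scores.values()))
    return combined >= threshold, combined, scores

history = [{"credit_requested": v} for v in (20_000, 25_000, 30_000, 22_000, 28_000)]
suspicious = {"years_in_business": 0.5, "credit_requested": 150_000, "profile_edits_last_30d": 7}
ordinary = {"years_in_business": 12, "credit_requested": 24_000, "profile_edits_last_30d": 1}

for name, rec in [("suspicious", suspicious), ("ordinary", ordinary)]:
    flagged, combined, parts = ensemble(rec, history)
    print(name, flagged, round(combined, 2), parts)
```

Averaging means no single pellet decides on its own; a variant could also alert whenever any one detector is near its maximum, to catch behavior only one method can see.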

Michael Krigsman: Let me ask you both two things in relation to this. Number one, how large are the data sets on which you operate? Number two, given the size of these datasets, how do you figure out which is the right target to aim at?

Anthony Scriffignano: Let me take a shot at that. It's really hard to answer the question, "How large are the data sets in this day and age?" Do you answer that in terabytes or petabytes? Do you answer that in numbers of entities? Do you answer that in terms of the rate of change?

I'll give it a shot in my world. There are about 300 million businesses in our databanks. Just to give you a rough idea, there are about 27 million or so businesses in the United States. About half of those change in a year in some way in terms of identity. We update this data from every country on earth except for North Korea and Cuba. We do it more than ten million times a day.
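A back-of-the-envelope reading of the U.S. figures, assuming for illustration that the identity changes are spread evenly over the year:

```latex
\frac{27 \times 10^{6}\ \text{businesses} \times 0.5}{365\ \text{days}}
  \approx \frac{1.35 \times 10^{7}}{365}
  \approx 3.7 \times 10^{4}\ \text{identity changes per day}
```

That is on the order of tens of thousands of U.S. identity changes every day, before counting any other country.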

All of those different countries have different writing systems, different regulations. There are laws about what data can cross the border, what data must stay in the country, where you may fabricate products, where you may not. We have to comply with all of that everywhere while those laws are changing.

If you think of hundreds of millions of entities, you've got several thousand times that, tens of thousands of times that, in pieces of data producing that end product. We have to start to go into powers of ten to get to this. The number of things you need to look at when you start looking at relationships on top of pieces of data, on top of entities, is on the order of 10 to the 24th in my world.

Michael Krigsman: Okay. [Laughter] That much data, like a big amount of data. Okay. Matt, how about you?

Matthew Marolda: We were geeking out on data. I want to hear how big his world is. [Laughter]

Michael Krigsman: [Laughter]

Matthew Marolda: I'm just going to sidestep that. [Laughter]

Anthony Scriffignano: [Laughter]

Matthew Marolda: It's actually interesting. There are two ways to look at it. One is, data is everywhere. Data is our reaction in a movie theater. There is data there. We just don't capture it very well or at all. There's data that's being discussed online. There's data in ticket sales. These things are enormous.

We have a relatively small slice of that. Still, I don't think we're at 10 to the 24th, but we have an enormous scale. I think one of the things that I was going to highlight from our point of view, because so much of what we do is from unstructured data, it's almost this odd concept. I don't like using the term "create out of data," but what we're doing is taking all this unstructured data and turning it into more structured insights.

For example, let's take a pool and just make it simple. Take all of Twitter, which we have access to, use all the time, and have on our servers. Just Twitter alone can generate enormous amounts of structured data for us. It's almost infinite because, depending on what angle we decide to take into that data to pull it out, we're going to have a whole new set of things we could be examining.

We have many, many examples like that. For us, it's as much about the pools of data and then drawing out from them these new structured pieces. Because data, by its nature, is unstructured, typically, that enables us to almost infinitely create data on top of it.
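The idea that one pool of unstructured text can yield many structured datasets, depending on the angle, can be sketched as a function that turns each tweet into a row of features. The keyword lists, the title terms, and the `structure_tweet` helper are hypothetical; plain substring matching like this ignores context, sarcasm, and language, so it is only a starting point.

```python
import re
from collections import Counter

# Hypothetical keyword lexicons; a real system would use trained language models.
POSITIVE = {"awesome", "love", "epic", "can't wait"}
NEGATIVE = {"boring", "hate", "skip"}

def structure_tweet(text, title_terms):
    """Turn one free-text tweet into a row of structured features."""
    lowered = text.lower()
    return {
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
        "mentions_title": any(t in lowered for t in title_terms),
        "positive_hits": sum(p in lowered for p in POSITIVE),
        "negative_hits": sum(n in lowered for n in NEGATIVE),
    }

tweets = [
    "That #Godzilla trailer is epic, can't wait! @friend",
    "Saw the Kong spot again. Boring.",
    "Anyone going this weekend? #movies",
]
rows = [structure_tweet(t, title_terms={"godzilla", "kong"}) for t in tweets]
hashtag_counts = Counter(tag.lower() for r in rows for tag in r["hashtags"])

for row in rows:
    print(row)
print(hashtag_counts)
```

Each new angle (a different lexicon, a different list of title terms, image or video features) adds new columns from the same raw pool, which is why the supply of derived structured data feels almost unlimited.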

Anthony Scriffignano: Michael, no matter what we do, we're not allowed to say "big data" anymore, but these Vs of big data, the volume, the variety, the value, the velocity, the veracity, truthfulness, they always come up. Matt just talked about all of them, I think, except for truth, maybe. I don't know. Maybe you'd just assume it's all true. I can't do that in my world.

They never go away. Maybe it's not cool to talk about it anymore, but that never goes away. Take value: we've got tens of thousands of sources of data. Almost every day, you have a conversation with somebody who says, "Have I got a dataset for you."

You just can't chase every single one of those leads, number one, because that's all you would do. Number two, because it takes you away from other things. How do you vet that? How do you understand where you should be going in terms of making this, expanding this circle while the things in the circle are so dynamic?

You mentioned the Twitter data. That's a great example. It's interesting. You refer to it as structured data, and I get it. There are hundreds of attributes of a tweet that are very structured and very well understood that people don't think about on the surface: the profile behind the tweet, the time and the date of the tweet, and so forth. Then there are those sneaky little characters in the middle, which is what they actually said.

There's this science of semantic disambiguation: understanding the intent of the speaker, who is speaking, who they're speaking about, how they feel, and in what context. These are all unsolved problems in data science. Nobody is going to say, "I do that perfectly all the time." Even if they do, it's certainly not against all the languages that you might possibly encounter. If they say all of that, they're lying. Even if they weren't lying, language is constantly changing, so that's a pretty big piece of simultaneously structured and unstructured data you're dealing with there.

Matthew Marolda: Yeah. To be clear, I was actually thinking about both sides of it, the unstructured part being the text itself.

Anthony Scriffignano: Yes.

Matthew Marolda: Frankly, for us, we're almost as interested in the images and the videos.

Anthony Scriffignano: Yeah, and I thought it was awesome that you think about it that way. To you, that's no big deal. To me, that would be a nightmare. That's not the world I'm in.

Matthew Marolda: Well, I will give you the nightmare that I would have about what you do, [laughter] and you probably find this especially on the fraud side. You have to anticipate what fraud is going to look like. You don't have the outcomes on the front end, like you were saying with the car example.

I'd just be interested in going one level down into that. How are you effectively building your model to get that dependent variable? What are you doing to understand what fraud outcomes might be, so you can better anticipate them or find them?

Anthony Scriffignano: Even though I've been using the word "fraud," in general, I try to use the word "malfeasance," or bad behavior. If you think about fraud, it's a type of malfeasance, so it's the material misrepresentation of information for financial gain.

When someone comes to us and lies about how long they've been in business or how much money they make or how many employees they have, that's not really fraud yet. That's just lying. But then, if that produces, let's say, a credit report that says they're a long-standing business with lots of employees and plenty of money, and then they go use that to either order goods and services or maybe to get favorable credit terms or something, that advantage that accrues to them is fraudulent.

What we have to do is, we have to have canonical types of malfeasance, bad behavior. So, we don't look for fraud. We look for lots of things that are under that umbrella.

The first thing is what's called progressive decomposition in data science. You take this big, squishy term of fraud or malfeasance, and then you break it down. We look for identity theft. We look for bust-outs. We look for trade rings. We have different things that we look for, and those are sort of the traditional ways of--I'm trying not to use unkind language--doing bad things to other people.

Then, we model. I don't even want to say "model." We build algorithmic approaches for detecting those sorts of things in new ways that might be enabled by things like, let's say, cryptocurrency or, let's say, virtual companies that get formed, things that haven't happened before. That takes care of all the new ways of doing the old things and the old ways of doing the old things.

Then, we've got new ways of doing new things. Now, we have to sit in a room and ideate on what types of bad behavior might be enabled by, let's say, the Internet of Things or the whole, you know, everything is connected to everything today. If I look at the Internet of Things, and I look at autonomous devices, as those devices become increasingly disconnected, they have to engage with each other and do business with each other without human interaction. Well, there is a new kind of fraud there, and we have to figure out what that might be and then how we might see if it's happening.

Michael Krigsman: Matt, earlier, when we were talking about the size of the data sets, one thing I wanted to point out to everybody is that, in the past, you built your own hardware storage systems because you couldn't find storage systems that were capable of managing the data, as I recall.

Matthew Marolda: Yes. We didn't build the actual hardware itself; we built our own structure within it. You're absolutely right: what we found was that the traditional storage techniques had these fundamental problems. They led us to a situation where we either couldn't store enough, because we were sacrificing storage for query time (effectively, how quickly we could get the data out and how quickly we could analyze it), or we had massive storage but took too long to get the data out and analyze it.

We had this sort of conundrum that we had to find a solution to. We took some very unique approaches to the backend to enable us to do that. That's important for us. It's not just a parlor trick. It's something that is critical in that, when we're running these media campaigns, especially, where these are very concentrated spends, it could be over $100 million in the course of four weeks, that kind of pace.

In terms of the iteration we were describing earlier, it means that we need to be able to access large amounts of data quickly and iterate very quickly on things like, even a simple thing like, a score, a prediction, or whatever you want to call it, for someone who is going to want to buy a ticket to a movie that we are marketing. The duality of having this huge data set with the constraints those typically have in query time forced us, out of practicality more than anything else, to have a system that would enable us to do both at once: query quickly and store enormous amounts of data.

Anthony Scriffignano: Matt, since you asked me to go down one level, if I could ask you to go down one level, I'm just fascinated by this. You also have to factor in things that might be happening that weekend when the event happens, so there's a bad storm or there's some political event or something.

Matthew Marolda: Sure.

Anthony Scriffignano: First of all, you have to be aware that that intervening factor has entered your data, and then you have to account for it in order to understand the difference between what you predicted and what happened, and how much of that is model variation and how much is the unintended impact of some other event. Could you give me an idea? We have that all the time.

Matthew Marolda: Yeah. It's tricky, and you hit on something that happens to us all the time. Weather is certainly an example, but there are other things that happen even more commonly, which are events in the world, whether it's a terrible act that is somehow similar to what might be happening in the movie, or a similar movie suddenly gaining traction the week or two before in a way we didn't necessarily expect.

There are lots of exogenous impacts. What we try to understand, as best we can, and it is hard, is what the tradeoffs are. Going back to that notion of audience, I've seen a lot of situations in these marketing scenarios, in movies in particular, where something like that happens and people get scared that there's going to be some problem on the horizon. People can often see those things coming.

Then they get panicked, and they start spending. They start spending broadly. They start going out and saying, "Let's spend more, more, and more across more and more people." They might have gone from a relatively precise definition of an audience to a very broad one like males over 18 or whatever it might be.

Anthony Scriffignano: [Laughter]

Matthew Marolda: Our approach has been kind of the opposite, which is, when we see something like that, homing in further, using the data and information we have to get more precise because, for a movie to be successful, and people think of all the big ones, but there are small ones too. We produce those as well. For them to open up to $20 million on a weekend means you only need, in the U.S., maybe 8 or so million people to actually take the action you want. It's a relatively small conversion rate, which is great. That means, when there is something that's coming down the pipe, whether it is weather or, again, maybe it's competition, maybe it's some event in the world, whatever it might be, just homing in and finding that audience to be more precise has actually been a strategy that's worked well for us.

Anthony Scriffignano: One of the things that we do, when some unforeseeable or, let's say, unplanned-for event is starting to impact the environment that we're trying to look at, is opine on what... I'm trying not to use an example from your world because of IP and my stupidity about your environment, but we try to opine on what the impact might be and then how we would see that in the data. What would it look like? Then, we go and very quickly look at the data to see if the types of perturbations that we anticipated are actually there, before we start to react to it. Do you do any of that sort of in-the-moment stuff?

Matthew Marolda: We do as best we can. Unfortunately, I don't like this either, but a lot of people think of these movies as snowflakes; each one is very unique. There's an aspect of truth to that, but there are also things you can learn.

As these pieces come in, we do try to adapt as quickly as possible. But, there are such unique situations; it's hard. Sometimes these external events can be positive. They can actually accelerate movies.

Anthony Scriffignano: Which can accelerate your confidence in your model, and it may have nothing to do with your model, right?

Matthew Marolda: For sure. Yes. We're fortunate that we have these breaks. We'll have this intense period, then a bit of a break, and then the next movie. We spend as much time as we can on it. I try not to use "postmortem," but whether it's a debrief or some kind of rearview mirror examination, we try to get really into the weeds on that, so we can learn for the next time.

Anthony Scriffignano: Yeah. I tried to introduce the term "post vivum" one time. It was not great. [Laughter]

Matthew Marolda: [Laughter]

Michael Krigsman: It sounds like one of the things that you're both doing is trying to take these rifle shots or, as Anthony said, rifle shots that are like buckshot, so multiple rifle shots at the same time, and then keep track of the results very, very quickly in order to make course corrections because, in Anthony's case, if you are wrong, it means people doing bad things, that malfeasance, can get through. In Matt's case, if you're wrong, a lot of money is being spent very quickly and maybe ineffectively. Is that an accurate characterization of what's going on and how you think about it?

Matthew Marolda: For us, for sure. I think that's exact. My obsession is efficiency. My obsession is, can we be as efficient with these spends as possible?

There are some red herrings sometimes in our world. People think a lot about cost per thousand, CPM, being a sort of measure of some form of efficiency, but that's actually a bit of a red herring because there are certain situations where you're willing to pay a premium for a smaller audience that you know are much more likely to convert. And so, this notion of exactly what you said, we're trying to be as efficient with these spends as possible and get the highest quality impressions we could possibly find to get us to the best conversion rate possible.

Anthony Scriffignano: In my world, the machine, the amazing machine that actually finds the bad guys, is a different part of the organization than me. What I'm trying to do is come up with new ways to add to that magic and stay ahead of the changing environment. The function I'm looking for, since we're bringing Latin into it, we call it neo-sophism. That's actually Greek. Are you learning something new? Are you adding to what you previously knew in a useful way, compared with what you would know if you didn't do the thing that we're proposing you add to your bag of tricks?

For us, it's more about new enabling capabilities. The actual volume of transactions checked for these sorts of things on a daily basis is in the tens of millions. I don't even want to stop and think about how they dip into that and do that in real time. My trick is to not mess that up and to try to continuously help make that better.

We keep talking about malfeasance. This is only one small part of where we do innovation. That's the cooler part, so we talk about it a lot, but there are lots of other, more boring day-to-day kinds of things as well, quality measurements and so forth.

Michael Krigsman: We're almost out of time. Can you look into the future a little bit, not too far down the road, but can you share? I'm not trying to get trade secrets out of you. It'd actually be interesting to hear that from both of you, but that's not my goal. [Laughter] Can you share where you're going? What's the trajectory that's coming down the road that helps address limitations in some of the things you're doing today?

Matthew Marolda: Sure. I think the first thing that we're thinking about at all times, really, is the exogenous impacts, whether it's GDPR, or whether it's changing policies even in private companies. Those things are constantly coming. Our whole aim is to be as fluid as possible around those changes and to be able to have the wherewithal to adapt quickly.

Where things are going is very hard to predict. I think the premium on first-party data, the premium on being able to collect and gather data, is only going to go up. And so, from our point of view, it's about trying to get as connected to our consumers as possible.

Michael Krigsman: Matt, can you just describe briefly the strategies that you're thinking about to handle that exogenous data?

Matthew Marolda: Yeah, so the systemic changes, this is so cliché, but the only thing that's predictable about life is that it's unpredictable. What we try to do is have approaches and methods that allow us to--and this is, again, not a great way of describing it, but--unplug from one thing and plug into another. If we need to shut off one spigot, we quickly shut it off. We're never going to take any chances, and it's not worth it to us to be on the edge of any of these things.

What we try to do is build systems and approaches that are relatively agnostic to the data we've built them on in the first place so that when the data changes, we have an ability to swap. We also pride ourselves on at least trying to constantly develop these new things in anticipation of changes down the road. Those are at least a couple of the approaches we use.

Michael Krigsman: Fantastic. Thank you very much. Anthony, I don't know if you can share what you see coming down the pike at all.

Anthony Scriffignano: We're focused on three areas that I would highlight. One is the connections between things, so relationships are becoming increasingly critical: not only understanding the entities, but understanding how they're interacting with each other. We're doing a lot of work in that space of understanding these regions, these lumpy regions where new types of behavior converge and then dissipate. What happened there, and how does it impact total risk and total opportunity? Language is always going to be important because it's always changing, so the unstructured data synthesis. Semantic disambiguation, understanding who is speaking about whom, how they feel, and why, is always going to be part of the canvas on which any of the stuff that we're doing, which is all unsupervised methods with new types of data, is painted.

Convergence, where we take different disruptive technologies, like the example I used before of the Internet of Things and cyber. You could pick any other two or three technologies that are disrupting the world of data science. Put them together, and you get something people forget to think about sometimes, because the experts on one side of the fence and the other side of the fence are too busy trying to figure out what's on their side of the fence, and they don't talk through that fence very well. We're always looking at how the convergence of different disruptions in the world of data science might be causing either new risks or new opportunities. Connected space, convergence, and language would be my three.

Michael Krigsman: Okay. Then, just to finish up because we're just about out of time, let me ask you both. Maybe, Matt, I'll start with you. I'll ask you both to share advice to businesspeople who are working with data scientists, working with data and analytics on how to work with you guys most effectively.

Matthew Marolda: I'm going to turn your question 180 degrees first, and then I'll answer it. The 180 is what I've observed over the years about how people like us should interact with people who are coming in from the search perspective, right? It starts with humility. This is what I've learned, and I've seen. It starts with being humble about it and not coming in with this sort of combination of self-righteousness and self-importance that I've sometimes seen pop up.

On the other side, going in the direction you're really asking about, which is how people should look at and receive this information, there are a couple of things that I've observed. One is having an open-mindedness to it. Accept that sometimes outcomes and learnings from these approaches may just confirm what you already know, but that's an important outcome: understanding that, yes, your intuition was correct. That's supportive, and it should actually give you more confidence.

My experience is, the majority of the time, you're actually confirming what someone already intuitively knew. We've tried to use those as opportunities to build buy-in so that, when there's that 20%, 30%, 40% of the time where you're finding something completely surprising, that surprise has credibility with the person who started with the intuition.

Anthony Scriffignano: I was enthralled with Matt's advice. I thought it was awesome. I would say three things. The first thing is, I would say that when you're working with data and analytics folks, make the conversation about the problem or the question. Don't get distracted by the methods and the tools. That comes later.

If you take a really horribly articulated problem and really bad data, and use beautiful visualization and amazing tools, it just looks more correct. You're putting lipstick on a pig. You've got to get the question right. You've got to get the problem right. And, you've got to spend enough time making sure that you're doing that so that all of this work that you're doing is in the right direction.

The second thing I would say is to make sure that when you're working with any group, really, but particularly with this sort of a group, this is something they're doing with you and not to you. You've got to stay in that game, and you've got to stay part of the evolution of the thinking. Don't let them get you confused with terminology and tools. You've got to stay focused on what it really means and what the impact of that will be. Stay in the game. Don't just wait for the answer and like it or not.

The last one, I'm just going to echo what Matt said. Humility: you've got to be very humble about your own capabilities, about your own ability to see the question or the problem, about how much of it can realistically be addressed, and about how you would know when your assumptions aren't valid. Don't get all excited about how great it looks and how good you look in the mirror. Make sure that you understand the limitations of what you're doing and how you can always make it better going forward.

Michael Krigsman: Okay. I love it. Well, this has been a very fast 45 minutes, and I wish we could continue the conversation, but what an illuminating look into data, analytics, and opening that black box. I want to say a real thanks to Matt Marolda and to Anthony Scriffignano, and I hope you'll both come back and join us again another time.

Anthony Scriffignano: I'd love to.

Matthew Marolda: Yeah, that'd be great.

Anthony Scriffignano: Thank you.

Michael Krigsman: Everybody, you've been watching Episode #294 of CxOTalk. Again, tell your friends and family. Don't forget to subscribe on YouTube. We will see you soon. Bye-bye.