Enterprise Decision-Making with Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet

How can we use data to make better business decisions? In this episode, we address this question with Anthony Scriffignano, the Chief Data Scientist at Dun & Bradstreet.


Anthony Scriffignano has over 35 years of experience in information technologies, Big-4 management consulting, and international business. Scriffignano leverages deep data expertise and global relationships to position Dun & Bradstreet with strategic customers, partners, and governments. A key thought leader in D&B’s worldwide efforts to discover, curate, and synthesize business information in multiple languages, geographies, and contexts, he has also held leadership positions in D&B’s Technology and Operations organizations. Dr. Scriffignano has an extensive background in linguistics and advanced computer algorithms, leveraging that background as primary inventor on multiple patents and patents pending for D&B.

Scriffignano regularly presents at various business and academic venues in the U.S., Europe, Latin America, and Asia as a keynote speaker, guest instructor, and forum panelist. Topics have included emerging trends in data and information stewardship relating to the “Big Data” explosion of data available to organizations, multilingual challenges in business identity, and strategies for change leadership in organizational settings.

Scriffignano also confers with key customers on emerging trends in global data science. He was profiled by InformationWeek in the special coverage series “Big Data. Big Decisions” and by BizCloud regarding big data problem formulation and data privacy. He was also published in the May 2014 edition of CIO Review (“The Future Belongs to the Informed”). Scriffignano has also held senior positions with other multinational organizations. This experience includes extremely large ERP implementations and worldwide organizational change and technology adaptation efforts. He has advised firms in financial services, manufacturing (chemicals and pharmaceuticals), and information technologies. He maintains CPIM certification in production and inventory management from APICS, the internationally recognized Association for Operations Management.

Transcript

Michael Krigsman:

(00:02) Welcome to episode 156 of CXOTalk. I’m Michael Krigsman and today I am joined by Anthony Scriffignano, who is the Chief Data Scientist at Dun and Bradstreet. Anthony how are you today?

Anthony Scriffignano:

(00:23) Hello Michael, how are you? I’m doing great.

Michael Krigsman:    

(00:25) Hey listen, thank you so much for taking the time.

Anthony Scriffignano:

(00:28) Not at all, it’s my pleasure.

Michael Krigsman:

(00:30) Anthony, let’s begin: tell us some background about Dun and Bradstreet. It’s an interesting company and it’s been around for many, many years.

Anthony Scriffignano:

(00:39) Yeah, it’s a fascinating company, to me anyway and I think to many people. It’s been around for 174 years now, so it started before the Civil War, and it’s been through many, many iterations over the years. The company has between 4,000 and 5,000 employees, but then we also have a worldwide network, partner associations around the world, so it’s a pretty big company.

(01:05) Most of our customers focus on problems in the area of either total risk or total opportunities, so think credit and also sales and marketing. And then some of the related issues, like compliance, government relations, onboarding customers, and things like that.

Michael Krigsman:    

(01:19) So very quickly, because I’m curious about this. You started before the Civil War, and I know that a number of presidents have actually worked for the company, including Abraham Lincoln. So what did Mr. Dun and Mr. Bradstreet (I’m assuming they were misters) do before the Civil War?

Anthony Scriffignano:

(01:38) Well they were. So if you think about what was going on at that time, you had Westward expansion, and you had a lot of businesses on the East coast that were trying to do business with people who were increasingly far away. And it got to the point where you couldn’t go visit them and judge the character and quality of the person, or how real they were, or whether their operations appeared to be significant enough for you.

(02:02) So they wanted people who could essentially be their representative in forming those opinions, and that’s how this all got started: help me understand people I can’t see. And that’s pretty much what we do now, except instead of the two-month stagecoach ride it’s the two-second trip over the internet, but it’s the same problem.

Michael Krigsman:

(02:21) So that two-second trip over the internet comes down to data analytics and data science. So in a sense, back when the company was founded, there was the transfer of information, as you said, over stagecoach, and then there was some type of analytical method you used to evaluate the risks. Now you use data science and you’re the Chief Data Scientist, so what does that actually mean in this context?

Anthony Scriffignano:

(02:51) Well, you know, the joke would be how hard can it be, right? The issue is that as you try to make a decision, let’s take ourselves back to pre-Civil War days, right, what would you look at to try to make a decision about whether a business is ‘worthy’?

(03:10) The first thing is, are they real? And then you ask some questions like how long have they been around, what kind of business are they in. Well, we do the same kind of thing, but when you think about data science, and think about the literally millions of sources of data that are potentially available to make such a decision, how do you decide what’s true? How do you decide whether what you’re seeing is what it appears to be? How do you find that very small, very new business that just came into being?

(03:36) What happens when a business has a name, or address, or a phone number, or any kind of physical presence that’s in some way transient or virtual? So the questions are really the same kinds of questions, but the data science version of it is how do you use new types of data, as opposed to just places where you can go and look.

(03:56) It’s a very similar problem, but obviously much more algorithmic, much more automated, much more ‘scientific’.

Michael Krigsman:

(04:02) So how do you use data to determine ‘what’s true’? That’s the question you ask.

Anthony Scriffignano:

(04:11) That’s a big question. So if you think about what true means, sometimes that’s relative, right? Suppose the question is, is this business out of business? That seems like a very binary thing, either they’re out of business or they’re not out of business. Well, not really. When you look at a very small business, they’re not necessarily going to go bankrupt, they’re not necessarily going to call us and say, by the way, we’re going out of business now. They’re not going to put a notice in the newspaper. There’s not going to be any kind of press release. There’s not going to be anything.

They just stop. And then, what if they’re just resting for a while? What happens if a small business is actually still in business but the proprietor of the business is just doing something else: he’s sailing around the world for a year, or he’s in the hospital, or she decided to go and do some other business for a while and she’ll be back, right?

(05:00) So we have these versions of kind of parked as opposed to definitely gone out of business, and that’s a very nuanced kind of thing. So how do you figure that out with a stream of data? Obviously you could look at suits, liens, judgments, business deterioration, and look at those things as precursors to businesses that really die, as opposed to things that looked like they were going well and all of a sudden they stopped.

(05:24) You might look at the type of business that we’re talking about. You might look at the location in the world. You might look at the owner of that business in the context of the business and see if you see them popping up elsewhere. There are lots of different signals you might get in a situation like that.

Michael Krigsman:

(05:37) So explain, as a data scientist, how you go about analyzing some of these problems.

Anthony Scriffignano:

(05:46) Well, let’s take the issue of fraud as a great example. Fraud, when we talk about it, we think we know what we mean, but everybody means something different. Fraud by any definition around the world is some sort of misrepresentation of information for financial gain. When people lie to us they haven’t gained anything yet, so is that fraud? We call it malfeasance sometimes.

(06:10) If you think about the problem of fraud in the context of how you see it in data, or even how you see it in real life, it’s often referred to as a quantum observation problem: when you observe it, it changes. So people committing fraud behave differently when they know they’ve been detected.

(06:28) And so if you try to use regressive methods that only look backwards at pre-existing data and prior examples of fraud, you’ll get very good at catching the things that used to happen, which is counterintuitive, because you know the thing you’re looking at is changing. So data science would say yes, do that, because it’s not going to completely stop, but it’s also not sufficient; you need to do more. So how do you find types of bad behavior that haven’t occurred yet?

(06:57) Well, the first thing you do is you look for types of behavior you haven’t seen before, and then you try to vet those behaviors against behaviors that are known to be malfeasant, to see if there are similarities. And data science provides non-regressive methods that do things like that, in the connected space, what we call dyadic relationships: relationships among multiple parties, looking for observable relationships that are different from the ones we’ve seen before. And that allows us to focus and address a problem like that. So it’s a very long way of saying, you start looking for things that are new and you start to unpack them and see what they tell you.
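To make that concrete, here is a minimal Python sketch of the idea: flag relationship patterns never seen in historical data, then vet them against patterns known to be malfeasant. The pattern encoding, the known-bad set, and the similarity threshold are all illustrative assumptions, not Dun & Bradstreet’s actual method.

```python
from collections import Counter

# Historical pattern counts and a known-bad library (illustrative data only).
known_patterns = Counter({("order", "payment", "order"): 412})
known_bad = {("new_entity", "shared_address", "rapid_orders")}

def screen(events, similarity_threshold=0.5):
    """Flag event patterns never seen before, then vet novel ones against
    known malfeasant behavior using a simple Jaccard overlap."""
    pattern = tuple(e["type"] for e in events)
    if known_patterns[pattern]:
        return "seen-before"  # backward-looking (regressive) methods cover this
    overlap = max(
        (len(set(pattern) & set(bad)) / len(set(pattern) | set(bad))
         for bad in known_bad),
        default=0.0,
    )
    return "review" if overlap >= similarity_threshold else "watch"

events = [{"type": "new_entity"}, {"type": "shared_address"}, {"type": "wire_out"}]
print(screen(events))  # "review": novel, but overlaps known-bad behavior
```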

Michael Krigsman:

(07:33) But you’re doing more than simple comparisons. In a sense, if I can imperfectly boil down what you just said: you compare that which we don’t know to that which we know.

Anthony Scriffignano:

Exactly.

Michael Krigsman:

(07:52) But that seems like a fairly trivial observation, so I assume that the data science part is quite a bit more involved than that.

Anthony Scriffignano:

(07:59) Yes, absolutely. So the part we don’t know is where the challenge lies, right? You have so much data in front of you, and you have to make a decision: which parts are you going to look at, and which parts are you not going to look at? There’s a huge opportunity cost to making a decision like that. You can’t just bring in all the data and keep pressing the ‘learn things’ button, right? So every time new data becomes available there’s a step of discovery, realizing that it’s available.

(08:28) There’s the step of curation, making a decision about whether or not you bring it in, and if you did, what it would mean, and by the way, are you allowed to bring it in, do you have permissible use, things like that. And then there’s the synthesis, making sense out of that. And that all sounds easy until you try to do it at the scale of the creation of information, which is off the chart.

(08:49) There’s so much information being created right now that we’ve actually lost the ability to measure the rate at which it’s increasing. Not only do we not know how much information there is, we don’t know how fast it’s growing anymore.
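As a rough sketch of how those three steps might compose in code, here is a minimal Python skeleton; the field names and the catalog record shape are assumptions for illustration only.

```python
# Minimal skeleton of the discover / curate / synthesize loop described above.
def discover(catalog):
    """Become aware of sources that exist; don't ingest anything yet."""
    return [s for s in catalog if s.get("newly_available")]

def curate(sources):
    """Decide what to bring in: is it meaningful, and is its use permissible?"""
    return [s for s in sources if s.get("permissible_use") and s.get("relevant")]

def synthesize(curated):
    """Make sense of what survived curation."""
    return {s["name"]: len(s["records"]) for s in curated}

catalog = [{"name": "registry_feed", "newly_available": True,
            "permissible_use": True, "relevant": True, "records": []}]
print(synthesize(curate(discover(catalog))))  # {'registry_feed': 0}
```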

Michael Krigsman:    

(08:59) Okay: discovery, curation, synthesis. Can you give us a concrete example from your work that ties these pieces together, so that we can understand the data analysis process you go through in order to learn something new from the data that you didn’t see before?

Anthony Scriffignano:

(09:22) Sure, so let me give you an example that seems obvious but isn’t. Let’s suppose that we’re trying to understand how a company represents itself around the world in different languages and different writing systems. So you might think that you might translate, but translating works really well for common nouns and it doesn’t work very well for proper nouns. If you have your own name, how do you represent that in Arabic or Chinese? Those are decisions that you have to make, and they involve sound and the interpretation of maybe the symbols you might use, or how those sounds sound in different languages. Different languages have different phonetic palettes.

(10:01) My name, Scriffignano, has a GN sound in it, the ‘nya’, that’s not an English sound. So when I tell people how to say it, I say, well, say lasagna, because you already know how to say that, right? So that’s a sort of technique, right?

(10:14) So how do you now discover the presence of an organization or a person in different parts of the world when they’re represented differently? You can’t just sort of flip the letters around, especially when we’re talking about different writing systems. So one of the things that you do is you ingest a very large corpus of information that you understand.

(10:37) So you might ingest something like, think about, maybe a chamber of commerce might produce a listing of the directors and officers, CEOs and owners of businesses. So now I’ve pulled in, I’ve found a listing of a whole bunch of names, and I have, let’s say, another listing of a whole bunch of names. The curation is trying to make a correlation between those two, saying how much of this thing that I’ve just ingested, that’s in a language I don’t know, can I understand from the sort of context that it’s in.

(11:07) And then the synthesis is, can I discover any rules? So I’m just thinking of an example: in Greek they have the letter chi, which sort of looks like an X. That sound doesn’t really exist in English; does that turn into a CH, or does it turn into an X, or does it turn into a K? And those three different decisions will lead you down a different path.

(11:28) So now, once I have that question, is it CH, X, or K, I can start to look at the data and say which seems to be more appropriate, and over time I can develop rules, and then over time those rules can form new processes. I can tune those processes. I can do what’s called heuristic analysis, where I get a group of people to observe what the machine is doing and see whether they agree or disagree, and you tune these things over time, and eventually it sort of approaches the collective experience of a person doing the same thing. There’s a thing called the Turing Test that you might be familiar with. That’s the ultimate example of that: at what point does it appear to you to be intelligent?
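A toy version of that rule discovery might look like the following Python sketch, where the parallel name pairs and candidate renderings are hypothetical stand-ins for a real curated corpus.

```python
from collections import Counter

# Candidate Latin renderings of the Greek letter chi (illustrative assumption).
CANDIDATES = ("ch", "x", "k")

def learn_chi_rule(parallel_names):
    """Given curated (greek_name, latin_name) pairs, e.g. from a chamber-of-
    commerce listing, count which candidate rendering shows up in the Latin
    form whenever the Greek form contains chi. The majority vote becomes a
    first-pass rule that human reviewers then tune heuristically."""
    votes = Counter()
    for greek, latin in parallel_names:
        if "χ" not in greek.lower():
            continue
        for cand in CANDIDATES:
            if cand in latin.lower():
                votes[cand] += 1
    return votes.most_common()

# Toy data (hypothetical pairs, not real corpus entries):
pairs = [("Χρήστος", "Christos"), ("Χαρά", "Chara"), ("Ξένος", "Xenos")]
print(learn_chi_rule(pairs))  # [('ch', 2)]: chi maps to CH in this corpus
```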

Michael Krigsman:

(12:08) So at what point does it appear to you to be intelligent? At what point do you make the decision that all of this analysis, this normalization of multiple data streams, all the analysis that you’re doing, that you’ve done enough? And now, based on that analysis, you actually do know what is ‘true’.

Anthony Scriffignano:

(12:31) So true is a very dangerous word, but what we’re looking for is something to converge on, in the case of heuristics, what the gold standard is: a group of similarly instructed, similarly incented people.

(12:48) So you look at a large enough collection of information, and you make sure that you ingest and interpret that information the same way as a group of people who are similarly instructed and all have the same to gain or lose would. You can’t have, like, 10 experts and five interns; they’ve got to be kind of the same.

(13:04) And then there are techniques for normalizing for optimism, and pessimism, and for fatigue, and things like that. And eventually what you’ll get is not something necessarily always true, but we like to use the phrase that consistently wrong is better than inconsistently right. Get to something that’s consistent, that you continue to tune as you understand how it behaves and you either like or don’t like what it’s doing.
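One simple way to normalize a panel for optimism and pessimism is to z-score each rater against their own scoring history. This is a minimal Python sketch, assuming every rater scored the same items; it is not the specific technique used at D&B.

```python
import statistics

def normalize_ratings(ratings_by_rater):
    """Remove each rater's optimism/pessimism bias by z-scoring their scores
    against their own mean and spread, so raters become comparable before
    the panel's judgments are pooled."""
    normalized = {}
    for rater, scores in ratings_by_rater.items():
        mu = statistics.mean(scores)
        sd = statistics.pstdev(scores) or 1.0  # guard against zero spread
        normalized[rater] = [(s - mu) / sd for s in scores]
    return normalized

# Same items, different bias: after normalization both raters center on 0.
panel = {"rater_a": [4, 5, 5, 4], "rater_b": [2, 3, 3, 2]}
print(normalize_ratings(panel))
```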

Michael Krigsman:    

(13:28) So the first step, then, is to aggregate a large amount of data, what we commonly hear called big data.

Anthony Scriffignano:

(13:38) I would say the first step would be to become aware of the data that could potentially be aggregated.

Michael Krigsman:    

(13:46) So what does that actually mean?

Anthony Scriffignano:

(13:47) Don’t try to eat the whole salad bar. Don’t try to take everything in. Look at what’s available and decide what you’re going to have for your salad, and have a reason for deciding that.

Michael Krigsman:

(13:58) So you have to be clear about the problem that you’re trying to solve?         

Anthony Scriffignano:

(14:00) Exactly. It really goes back to: you never lead with the data and you never lead with the technology; you lead with the problem. Now, there are times where you might pull in the data and say, what can this data tell me, but in general, for a business problem, you should start with the problem. You should start with what’s the real thing you’re trying to do.

(14:19) I’ve used examples with you of discovering fraud, or finding new businesses, or discovering when businesses have died. Those are real business problems. You start with the problem, and then from there you look at the data. There are three sets of data: the data that you already have, the data that you could go out and discover, and the data that you’re never going to get to. And you have to evaluate the relative size and importance of those three classes of data against the problem that you’re trying to solve.

Michael Krigsman:

(14:45) So we hear this buzzword big data all the time. What does big data actually mean in the context of your world, as a data scientist who is looking at these large blocks of data, or aggregations of data, in a more rigorous way? So I guess, compare big data as a marketing phrase versus a large volume of data. And I’ve heard you also use the term smart data in making this comparison.

Anthony Scriffignano:

(15:18) Yeah, I can only define smart data juxtaposed with big data, so let me take the first predicate in your question first. So big data, you know, we jokingly refer to it as ‘mmm’ now, because you’re almost not supposed to talk about it anymore, but it hasn’t gone away.

(15:34) Big data is described in many different ways. What I try to do is describe it very formally and very empirically and very consistently. So you’ll hear me say that you have these aspects of volume, velocity, veracity, variety, and value, the Vs.

(15:50) And you have a big data problem when those Vs overwhelm the best attempts to deal with them. That doesn’t mean you’re too cheap to hire the right people or you have the wrong technology. But when you throw the best of the best at it and you’re still overwhelmed by one or more of those Vs, now you have a big data problem.

(16:07) So it’s not just having a lot of data. It’s not just having data that’s changing really quickly. It’s not just having data that some of it’s true and some of it’s not and you can’t tell the difference. It’s all of those things and more or less at the same time, and when they start to overwhelm the system, that’s when you have a big data problem.

(16:23) Smart data, some people use that term to differentiate between the big data and the smart data. The smart data is the subset of that data that will actually apply to your problem that can be used intelligently in a way that takes you towards a solution.

(16:39) And I would add to that definition, it doesn’t necessarily have to take you towards a solution. It could also take you towards breaking a large unsolved problem down to a smaller problem that’s still unsolved.

(16:50) Think about, like, curing cancer, right? You may not cure cancer, but you may say, all right, cancer has nothing to do with the color of your blood, moving along. So you’ve taken the problem and made it smaller.

(16:59) And the other thing about that journey is there might be data that uncovers a question that you forgot to ask before. So we’ve been focused on, are there planets outside our solar system, and we kind of decided that there must be; you know, logic and history say there have to be. But until recently we couldn’t prove that there were any exo-planets. Now all of a sudden we have tens of thousands of exo-planets that we know about. So the next question along the way is, well, do any of them look like ours? That’s not the only next question, because someone could say, what’s so special about looking like ours? You know, might they look like something else and still be of interest?

(17:39) So you get these two classes of people: you know, one class is looking for water and the other class is looking for a certain planetary mass. That’s first asking a question you forgot to ask, and then taking that question and breaking it down into a smaller question that’s still unsolved but is moving you towards an answer.

Michael Krigsman:

(17:59) We have an interesting question from Arsalan Khan so…

Anthony Scriffignano:

(18:04) Nothing to do with exo-planets I assume?

Michael Krigsman:

(18:08) You know, I suspect you may be able to make a linkage here, but I’ll let you do that one. So, you mentioned that truth is a rather tricky concept and there is no ground truth necessarily, and so he’s wondering: you as a data scientist come up with your conclusions, and then an executive at the company looking at those conclusions says no way, not a chance. Your data’s wrong because that’s not the truth of the world. The truth of the world is this over here. What do you say to that?

Anthony Scriffignano:

(18:51) Well, first I’d say that if I start by saying here’s the truth that I’ve discovered, then I deserve that kind of a reaction. Data science is about the data part, but it’s also the science part. And we have this thing called the scientific method. It means that we observe the world around us. We form a hypothesis about that world. We ask a research question. We look at what literature is out there, what everybody else has done first. We then pick a method to answer our question. We prove that that’s the best method.

(19:23) Then, and only then, do we go out and collect some data and use that data according to our method to answer our question. We talk about the answers that we’ve concluded. We talk about the bias in those answers, the weaknesses of them, and we support our answers. And then, if we’re really good, we pose questions for future research.

(19:42) So if we did all those things, I don’t just go to the leadership in my organization and say, I think this data proves that there is life on other planets. I go to my leadership and say: I asked myself the question, is there life on other planets? I said, well, life as we know it right now is based on water and some other things. So what I did was look for evidence of water. Here’s how I decided to look for evidence of water, and I looked for hydrogen and oxygen and whatever you do. And here’s what I found, and here’s what I think it means.

(20:11) Now, if you disagree with me, tell me what you think I got wrong. Did I ask the wrong question? Did I understand the data wrong? Did I use the wrong method? And if they can answer any of those things, and if I’m a good scientist, I should be able to respond to them. That’s called defending your hypothesis, right? If I can’t respond, then I’ve done bad science, and then shame on me.

Michael Krigsman:

(20:30) So you’re one of those tricky ones, because, yes, you know, I’m playing the role of an executive: I hear everything that you’re saying, I see your data, and yet looking at that planet it sure looks like it has a pinkish cast to me. And in fact I know that it does, and I’ve been working with planets that have a pinkish cast, or sets of data like this one, my entire life. I know this population. And you’re now telling me something, through your scientific methods, that contradicts firm beliefs of how I see the world, and I know the way the world works. What about that?

Anthony Scriffignano:

(21:23) So Michael let me respect your knowledge of pink planets. I really appreciate your observation and your experience and I’m certainly not calling you wrong because what you believe is what you believe. Help me understand how you’ve come to this opinion about the relationship between life and pink planets.

(21:42) And pretty soon what’s going to happen is you’re going to be saying, ‘it just is, it’s in my experience’ and I’m not calling you wrong. I’m asking you to help me understand why you believe what you believe.

(21:55) So if we’re really going to be scientists then that means we have to be open to conflicting opinions and if we’re open to conflicting opinions, those people who have those opinions should be able to defend them.

(22:07) Now, sooner or later you could say, look, I’m your boss, and what part of ‘I’m your boss’ did you fail to understand, go back and prove my pink planet hypothesis right. If you tell me what to go prove, you’re basically asking me to engage in bad science, and now we have a whole different problem.

Michael Krigsman:

(22:24) Boy it sure sounds like a lot of businesses I know.

Anthony Scriffignano:

(22:26) Yeah, it does. It sounds like a lot of them I’ve worked with, fortunately not the one I work for right now, not at all. But you also have to be very careful. You can be right and dead, and part of being a good data scientist is being able to use what you’ve learned to tell a story that credibly approaches a problem that somebody has. You can’t just walk in and say, ‘oh look, I used all of these great methods and look what I learned and you should bow down to the data.’ Absolutely not.

(22:48) You have to understand the problems that people are trying to solve. You have to understand how you can be relevant in the context of those problems. You can’t always do all those steps I articulated because time, and money, and reality are going to get in the way sometimes. So you have to be reasonable and practical. But by all means you have to be empirical. You have to do something that you can repeat. You have to do something that you can defend.

(23:12) You should never use the fact that someone’s in a hurry or shouting loudly to go and do something completely irrelevant or negligent. You have to be very careful. There’s a lot of solutions out there that will let you just ingest a ton of data and push the magic button and reach some kind of a conclusion. And that’s great; I mean sometimes that’s all you have. You have no idea what this data means, but at some point you have to do better than that.

(23:39) A great example is if you just didn’t know anything about playgrounds, and you drove past a playground and you saw a bunch of kids playing in the playground, you might initially conclude that this is chaos; it’s just a bunch of kids doing stuff.

(23:52) If you observe the playground more closely, you would see a baseball diamond or a football field, you’d see lines. You’d see things that imply some sort of structure, and you might if you looked closely see playground monitors. You might see people there that are enforcing rules. You might see the little boys and the little girls are doing different things, they’re playing differently. You might start to uncover behavioral aspects.

(24:15) By using correct observational techniques and being careful about what you see, you’d learn more about that playground. Now, you could yell and scream and say, I’ve looked at playgrounds all my life and you never, you absolutely never see business being conducted on a playground. And I say, ‘well that’s great, but what about those two guys in the suits over there pointing at the foundations on that jungle gym?’ ‘Oh well, those are contractors, they’re not kids.’ ‘Well, you didn’t say kids, you were talking about playgrounds.’ You know, we’ve got to make sure that we understand what we’re saying to each other.

Michael Krigsman:

(24:47) So, you were talking earlier about connected spaces, and people and relationships. Can you elaborate: what is a connected space in this context?

Anthony Scriffignano:

(25:02) Well, great question. So you have to be very careful when you use a term like that, that you know what you mean. Things can be connected in many, many different ways, and even defining what a connection is is somewhat problematic.

(25:15) One of the things that we talk about, and what I talked about before, is a dyadic relationship. It’s a relationship between two entities. So at Dun and Bradstreet we mostly talk about businesses. A connected relationship might be ownership, so you have a branch and a parent owner of that branch; you have a subsidiary and a parent. One type of linkage we define is majority ownership: if there’s a parent that owns more than 50% of a subsidiary, or something like that, that would be a type of dyadic relationship.

(25:47) Another type of dyadic relationship among business entities might be that they’ve sued each other, or that they’ve mentioned each other on social media, or that they’ve collaborated on some observable intellectual property. Or that someone from one company is connected to another company on a platform like LinkedIn or Facebook, or something like that. Those would all be types of discoverable dyadic relationships.

(26:14) And then the question is, how can you observe all of those dyadic relationships and how they’re changing over time to form conclusions about things: maybe the business is growing, or the two companies are collaborating, or the two companies are adversaries. Or there seems to be some kind of fraud or malfeasant behavior going on. Those might all be conclusions that you try to reach by observing those dyadic relationships. There are other relationships that are more than one-to-one. They have different names and they have different problems and uses.
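As a minimal illustration of how such dyadic relationships might be represented and watched over time, here is a Python sketch with made-up entity names and relationship types.

```python
from collections import defaultdict
from datetime import date

# A tiny store of typed dyadic (two-party) relationships between businesses.
edges = defaultdict(list)  # (entity_a, entity_b) -> [(rel_type, observed_on)]

def observe(a, b, rel_type, when):
    """Record one observed dyadic relationship between two entities."""
    edges[tuple(sorted((a, b)))].append((rel_type, when))

observe("acme_corp", "acme_sub", "majority_ownership", date(2015, 3, 1))
observe("acme_corp", "globex", "lawsuit", date(2015, 9, 14))
observe("acme_corp", "globex", "social_mention", date(2016, 1, 2))

# Watching how each pair's mix of relationship types changes over time is
# the raw signal behind conclusions like collaboration, adversaries, or fraud.
for pair, rels in edges.items():
    kinds = {rel for rel, _ in rels}
    if {"lawsuit", "social_mention"} <= kinds:
        print(pair, "shows mixed adversarial and social signals")
```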

Michael Krigsman:

(26:45) Give us an example of something that’s really hard. What’s the hard kind of problem you face and maybe talk about it in a business context?

Anthony Scriffignano:

(26:56) So, hard is always involved where people’s behavior is involved. So fraud, we keep talking about fraud, and fraud is hard, because bad guys keep innovating while we’re innovating in how we detect the behavior of the bad guys, and that’s a really bad problem.

(27:11) Another example that involves behavior is when businesses have connections to each other that are not formed through owning pieces of each other. So they form temporary relationships, alliances. They form groups. They form, you know, lots of different arrangements with different words for ‘you have to sue us separately; we’re not part of the same thing,’ right? Really hard, because they’re very squishy kinds of things, and anything that involves behavior is not observing a strict set of rules that you can go and discover.

Michael Krigsman:

(27:46) So it’s when you either have the human element, or you know that there are connections that exist but the companies have been structured to reduce or eliminate, to the extent possible, any direct business connection, even though they are related.

Anthony Scriffignano:

(28:10) Yeah, there may be intent like that behind it, or it might just be something that’s happening sort of organically. You know, sometimes there’s an external event, like a flood, or the Arab Spring, or some major change in who’s in charge of the country or the region or whatever. All of a sudden, in the business world, you see a lot of shifting around, and it’s not like everybody gets together and says, okay, how are we going to react to the fact that there was an earthquake in Taiwan. It’s just that there was an earthquake in Taiwan, and now some businesses start doing work for humanitarian reasons, and other businesses start seeing opportunities where they didn’t exist before. You get this, I don’t want to say chaotic, but atypical behavior. And to say that you’re going to model that behavior: maybe, if something very, very similar has happened in a reasonably recent period of time, against the same type of universe, you might be able to do that. But usually all of those preconditions aren’t met, so you have something very squishy, and it involves behavior, and you have to respond to it, or choose not to; either one is a choice.

Michael Krigsman:

(29:20) How do you make the decision of which data problems to solve? Since you mentioned that’s the first question, how do you decide what’s a good data problem to be looking at?

Anthony Scriffignano:

(29:32) Well, to be looking at and to solve are two different questions, so I have to kind of park that, but you know, there are some guiding principles. We have general guidelines, I call them foul lines. We don’t just do things because they’re fun or interesting or scientifically challenging. There’s got to be some real business frame for it.

(29:55) At the beginning I talked to you about total risk and total opportunity, so at Dun and Bradstreet normally we look at things like that. On the total risk side, it usually has something to do with: are they going to pay, are they going to stay in business, are they going to commit some kind of malfeasant act, or are they going to in some way threaten some business objective?

(30:16) On the opportunity side it’s how big are they, how much do they look like my best customers, how much do they complement my best customers, what’s the white space in this industry. Those are all opportunity kinds of questions.

(30:27) So normally I would start from one of those frames. If somebody just said, ‘here’s this really cool language problem, and you guys do a lot of computational linguistics, wouldn’t you like this? Look at Arabic.’ Well, yeah, of course I’d like to look at it, but do I have any data to look at, and is it part of a problem that our customers have, and would anybody notice if I made any progress there? And you know, if the answer to any of those is no, then you probably ought to move on, and just keep an eye on it and come back to it later.

Michael Krigsman:

(30:57) A different question here altogether, what’s the relationship between data science, big data, artificial intelligence, machine learning? We hear these buzzwords thrown around and usually they’re thrown around by marketing departments, so from a data science perspective what’s going on with that?

Anthony Scriffignano:

(31:20) Well, artificial intelligence and machine learning are tools that are used by data science. So some of these things that you hear about, neural networks and quantum algorithms and machine learning, those are all tools and techniques that can be applied in the field of data science.

(31:37) Data science is a complex combination of being able to understand the methods for understanding data scientifically, and also using it to tell a story. And in the business world that story has to relate to a real problem that is meaningful to the population. So think about data science as the part where you use all those other things. And I would also add, we often misuse all of those other things, because you’ve either been tricked into using them by somebody who says they’ll solve all of your problems, or you’re, you know, hoping it’s somehow going to be your silver bullet, or you’ve been in a conversation that started with my favorite words, ‘why don’t we just…’, you know, and we’ll push this button and everything will get easier.

(32:22) So, you know, there’s a sort of a dark side to all of this, where you’ll go and use all of those tools and techniques without really understanding what you’re trying to do. That would be like me going into a hardware store or a tool store and buying a laser saw, and I’m not a carpenter. Well, that’s great, you have this tool, and you know you’re trying to carve a pumpkin; you bought the wrong tool. You need to understand what you’re trying to do; you don’t just jump right for the tools.

Michael Krigsman:

(32:51) So data science is about telling a business story. Is that the ultimate goal, your end objective in this sense?

Anthony Scriffignano:

(33:00) Using data to address a problem, and to be able to answer that problem in a way that’s meaningful to the business. And, what I would add as the science part: in a repeatable, defendable way. Many people would not add that last part. I would.

Michael Krigsman:

(33:21) So defeatable... repeatable.

Anthony Scriffignano:

(33:26) It’s often defeatable as well but that’s another problem.

Michael Krigsman:

(33:27) Well if you go through the steps you’ve been describing then hopefully it’s less defeatable and more repeatable.

Anthony Scriffignano:

(33:36) Yes we should hope so yes.

Michael Krigsman:

(33:38) So what about innovation? Right now the Internet of Things, innovation around data, seems to be where the future is taking us. Maybe give us your point of view on that.

Anthony Scriffignano:

(33:53) Yeah, so thanks for that. You know, it used to be, a couple of years ago, that if you wanted to be a pundit and talk about technology and where we were going, you had to say mobile, social, cloud, analytics. You had to get those four words out. And what I’ve been saying recently is that those four words lead to lots of other words. So, you know, if you’re going to talk about mobile, you’d better talk about the Internet of Things, right? Mobile technology is just sort of things that are out there and moving around, and with the Internet of Things, some of those things move around and some of them don’t, but they’re certainly out there, and we may not necessarily know where they are or what they are when they’re talking, and that presents a whole slew of problems.

(34:33) Just like if you talked about, I don’t know, cloud computing, you’d better be talking about data sovereignty and, you know, the different rules and regulations. You can’t just put data out on the cloud. Nobody thinks there are hard drives floating around in the cloud, right? That data sits somewhere.

(34:48) So your question was about the Internet of Things, which to me is an extension of, you know, where we were a couple of years ago. And I think a lot of people agree that we’ve got a thing or two to learn in this space. If you look at Bluetooth from a number of years ago, I think it’s a good analogy: Bluetooth was sort of invented and then it took about 10 years to catch on.

(35:09) And part of the reason, in my humble opinion, was that we forgot to think about a number of questions, like, you know, it’s great that you can have a Bluetooth headset, but how do we keep my headset from discovering your phone and eavesdropping on it, right? Well, we’ll put this four-digit passcode in there and nobody knows the number, except they’re always 0000 or 1234, and all of a sudden all you have to do is try a few numbers.

(35:30) So we’ve got to be better than that. With the Internet of Things we’ve got, you know, tens of thousands, hundreds of millions of things right now talking to millions of other things, and we’re sort of making those same mistakes.

(35:41) There was a big issue not too long ago. I won’t name the company, but there was a doll, and the doll could talk to a cloud application, and your kid could talk to the doll, and the doll seemed to know what was going on, and it would get smarter as other kids talked to the same doll. That’s great, except someone realized that that was a device on the Internet of Things that had an IP address, and if I can hack into it, it has a microphone. And if the kid leaves the doll in the parents’ home office, I can eavesdrop on the conversation and maybe short the stock or do things that are malfeasant, and then that started to happen. Whoops, didn’t think of that.

(36:17) So if you’re going to build something on the Internet of Things, you’d better be thinking about how it might be used in unintended ways. You’d also better be able to think about what happens if it’s used in intended ways at a scale that goes way beyond what you ever intended.

(36:32) You also better be able to think about how other people might use it to solve unknown unmet needs. Somebody starts to use your thing to solve a completely different problem that you didn’t plan on it solving, and you didn’t build it for that purpose and now all of a sudden you’re negligent in a way that you didn’t even intend.

(36:50) We’ve got to be a lot smarter about this. We can’t just rush to say, ‘oh isn’t it great that things can talk to other things’, yes, but what might they say to each other and how might they all of a sudden help people do things we didn’t intend. Very big questions, we’d better be asking those questions.

Michael Krigsman:

(37:06) And as you’re asking those questions at Dun and Bradstreet, what are some of the answers, or the points of view, or the trajectories that you’re coming up with?

Anthony Scriffignano:

(37:19) Well, so, things themselves don’t necessarily play into our landscape right away, although some of those things might talk to us and ask about businesses. I won’t get into the complexities, but there are ways that things can ask about businesses, right? The reality is, the only things that we foresee asking about businesses right now are other computers. So we worry about the transactional response time of that question and answer, and the ontology of the question and the ontology of the answer, and all that’s great, but now, do we do anything to detect what type of thing might be on the other end of the question?

(37:55) And, you know, without getting into any security, there are things that we do today to make sure that the thing that we’re talking to is something that we intend to be talking to. We’ve got to do a lot more ideation to make sure that what we believe remains true as things get smarter and talk faster and find new ways to whisper in our ear, and all of that. Just like any other company that touches the internet, you just can’t say, well, we were safe yesterday so we’ll probably be safe tomorrow. That’s crazy.

Michael Krigsman:

(38:25) As we go towards the close here, what advice do you have for business people to use data science effectively?

Anthony Scriffignano:

(38:37) So maybe I can tell you a quick story about something that happened in my experience here that literally has changed my life. A number of years ago we had this horrible situation in Japan, where there was an earthquake that caused a tsunami. The tsunami hit the coast at Sendai; 20,000 people were washed out to sea. You had a nuclear meltdown at the Daiichi power plant. You had all these things happening to Japan all at once. Absolutely horrific. Unprecedented.

(39:04) No data science in the world ever foresaw anything like that happening, right? And here we are, we come together. I was on a conference call a few days later; I had been in Japan right before it happened. And we said, look, we can do things from a humanitarian standpoint, but also from a business standpoint, there’s got to be something we can do to help these mostly small businesses in Japan that are kind of living hand to mouth and that everybody now assumes are out of business.

(39:28) Many of them are still in business. Many of them are still there and doing fine, and if everybody assumes they’re not, then things are going to get even worse: on top of radiation and tidal waves, they’re going to have to deal with no money.

(39:40) So we started to look at a database that said everything was just the way it was right before this thing happened. To fix it the old-fashioned way was going to take a very long time, way longer than these people have. And so we had to look at new ways of collecting information. We started to look at new types of data that were available. We looked at crowdsourced radiation data. We taught algorithms how to find the skyline and to measure the change in the skyline before and after. We looked at uninterrupted straight and curved lines that became interrupted in geospatial imaging.

(40:11) We looked at the propagation of the tectonic wave from the epicenter of the earthquake, and we built 19 different car detectors to measure whether or not cars were there and what they looked like. You could argue that some of that capability already existed, but we didn’t have time to go find it.

(40:27) Very quickly we put all of this together, and we built a heuristic like I described to you before, and we taught it how to look at the data, and we fixed all of the data in Japan in about three months. It would have taken us well over a couple of years the old-fashioned way, for very good reasons.
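Very loosely, the heuristic described above could be sketched as a weighted vote over weak signals. Every signal name, weight, and threshold in this Python sketch is an assumption for illustration, not the system that was actually built.

```python
# Combine several weak post-disaster signals into one heuristic judgment
# about whether a business location is likely still operating.
SIGNAL_WEIGHTS = {
    "cars_detected": 0.4,        # vehicle detectors saw activity on site
    "skyline_unchanged": 0.3,    # before/after skyline comparison
    "lines_uninterrupted": 0.2,  # structures still continuous in imagery
    "radiation_ok": 0.1,         # crowdsourced radiation below a threshold
}

def likely_operating(signals, threshold=0.5):
    """Weighted vote over boolean signals; human review tunes the weights."""
    score = sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))
    return score >= threshold, score

print(likely_operating({"cars_detected": True, "radiation_ok": True}))  # (True, 0.5)
```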

(40:42) We then had this dataset that was probably the most valuable dataset that you could own at that point relative to Japan, and we could have made a lot of money on that. And what we did was put it on the internet and give it away for free, and every time I tell that story I get tears in my eyes. So my very long-winded answer to your question is: we’ve got to be better than just making another dollar.

(41:03) We’ve got to think about the unintended impact of doing nothing. We’ve got to think about letting the bad guys get ahead of the good guys. We’ve got to think about what we’re teaching our kids. We’ve got to think about what we’re teaching ourselves, or we’re just going to drown in this data and lose unbelievable opportunity and find ourselves swimming in stupid decisions because we didn’t have the time to do anything better. We’re much better than that, and I truly believe that if we bring science into the room we can at least make new mistakes every day, which is a very good start.

Michael Krigsman:

(41:36) And what words of advice do you have for business people who are dealing with the data and finding that the data is pointing out viewpoints on the world that are different from their previously held beliefs? We know change is hard, so what advice do you have there?

Anthony Scriffignano:

(42:02) So I would say three things. First of all, just knowing that you have that problem is the first step, so being what’s called the reflective leader. Thinking about what you believe in and why you believe it is extremely important. Your example of the guy with the pink planets before, screaming that he’s got lots of experience, that’s great, but we’ve got to be better than that. So the first thing is to be very clear about what we believe and why we believe it.

(42:30) The second step, once we understand that and presumably can ask better questions about the business and what we’re trying to prove and all of that, is to look at the skills that we’re bringing into the organization and make sure that we’re not just bringing in people who have rebranded themselves in this data science space, but people who really understand the different ways of knowing, the different ways of discovery, the different issues with regulation and with synthesis of information; bringing in the skills that we need.

(43:01) And the last thing, and probably the most important thing, is constantly looking inwardly at ourselves and recognizing that the skills that made us successful so far are just table stakes. We’ve got to be constantly improving. This is a whole new world out here, and we’ve got to have the conversation that’s tough, which is, you know, you’re not as good as you think you were, because you’ve got to be much better tomorrow than to just stay where you were.

Michael Krigsman:

(43:25) Anthony Scriffignano, Chief Data Scientist at Dun and Bradstreet, what can I possibly ask you beyond your last comment? Thank you so much for taking the time today; it’s been enlightening.

Anthony Scriffignano:

(43:40) It’s been a delightful conversation, thank you so much for the opportunity.

Michael Krigsman:

(43:45) We have been talking with Anthony Scriffignano, who is the Chief Data Scientist at Dun and Bradstreet. What an amazing conversation, and I would like to thank Anthony and the folks at Dun and Bradstreet for making this possible. And especially to everybody who is watching: thank you, and come back next time, because we’ll be here next Friday, as always.

Companies mentioned on today’s show:

Dun & Bradstreet: www.dnb.com

Facebook: www.facebook.com

LinkedIn: www.linkedin.com

Anthony Scriffignano:

LinkedIn: www.linkedin.com/in/anthony-scriffignano-ph-d-9165845

Twitter: https://twitter.com/scriffignano1

Published Date: Feb 12, 2016

Author: Michael Krigsman

Episode ID: 315