Data Science at Zillow, with Stan Humphries, Chief Analytics Officer

Zillow is one of the largest real estate and rental marketplaces in the world, with a database of 100 million homes in the US. The company pioneered data-driven, automated home value estimates with the Zestimate score. On this episode, we speak with Zillow's Chief Analytics Officer and Chief Economist, Dr. Stan Humphries, to learn how Zillow uses data science and big data to make a variety of real estate predictions.

46:15

May 19, 2017

Dr. Stan Humphries is the chief analytics officer of Zillow Group, a portfolio of the largest and most vibrant real estate and home-related brands on Web and mobile. Stan is the co-author of the New York Times Best Seller “Zillow Talk: The New Rules of Real Estate.”

As chief analytics officer, Stan oversees Zillow Group financial planning and analysis, corporate strategy, economic research, data science and engineering, marketing and business analytics, and pricing analytics. Stan was one of Zillow’s earliest pre-launch employees and is the creator of the Zestimate and its first algorithm.

Stan also serves as chief economist for Zillow Group. He has built out the industry-leading economics and analytics team at Zillow, a recognized voice of impartial, data-driven economic analysis on the U.S. housing market. Stan is a member of Fannie Mae’s Affordable Housing Advisory Council and the Commerce Department’s Data Advisory Council. Stan also serves on the Visiting Committee of the Department of Economics at the University of Washington.

Transcript

Michael Krigsman: Welcome to Episode #234 of CxOTalk. I'm Michael Krigsman, and CxOTalk brings to you truly the most innovative people in the world, talking about topics relating to digital disruption, and machine learning, and all kinds of good stuff. Before we begin, I want to say a hearty "Thank you" to our live streaming video platform, which is Livestream. Those guys are great! And if you go to Livestream.com/CxOTalk, they will even give you a discount.

So, today, we are speaking with somebody who is a pioneer in the use of data and analytics in consumer real estate. And we're speaking with Stan Humphries, who is the Chief Analytics Officer, and also the Chief Economist of the Zillow Group. And I think that everybody knows the Zillow Group as Zillow and the Zestimate. Stan Humphries, how are you?

Stan Humphries: Hey, Michael! How are you doing? It’s good to be with you today!

Michael Krigsman: I am great! So, Stan, thanks for taking some time with us, and please, tell us about the Zillow Group and what does a Chief Analytics Officer and Chief Economist do?

Stan Humphries: Yeah! You bet! So, I’ve been with Zillow since the very beginning back in 2005, when what became Zillow was just a glimmer in our eye. Back then, I worked a lot on the algorithms, and some product development pieces; kind of a lot of the data pieces within the organization. We launched Zillow in February of 2006, and back then, I think people familiar with Zillow now may not remember that in our first couple of years, between 2006 and 2008, all you could find on Zillow was really the public record information about homes, displayed on a map. And then a Zestimate, which is an estimate of the home value of every single home, and then a bunch of housing indices to help people understand what was happening to prices in their local markets. But we really grew the portfolio of offerings to help consumers from there and added in, ultimately, For Sale listings, mortgage listings, a mortgage marketplace, a home improvement marketplace, and then, along the way, also brought in other brands. So now, Zillow Group includes not only the Zillow brand itself, Zillow.com, but also Trulia, as well as StreetEasy in New York, Naked Apartments, which is a rental website in New York, HotPads, and a few other brands as well. So it’s really grown over the years, and last month, all those brands combined got about 171 million unique users online. So, it’s been a lot of fun seeing it evolve over the years.

Michael Krigsman: So, Stan, […] you started with the Zestimate. You started aggregating data together, and then you came up with the Zestimate. What was the genesis of that Zestimate and maybe you can explain what that is?

Stan Humphries: Yeah. Sure! So, in the early days, we were looking at different concepts that seemed like there was a lot of interest in from consumers about real estate, and I think there was a lot of angst about what we, as economists, think of as information asymmetry. So, the fact that certain participants in the marketplace of real estate have a lot of information, and other people don't have any information. And I think a lot of the leadership team that founded Zillow felt that … You know, our reference point was, you know, we were very passionate about this broader social progress of transparency in various marketplaces, which you had seen in the '80s and '90s in stock markets. We had been part of that, actually, prior to Zillow at Expedia, eliminating information asymmetries in the travel agency space. You had seen it in insurance and a lot of different sectors. We were very interested in creating information transparency in the real estate sector, so that got us very interested in where the information people wanted was, and how we could get it, and how we could make it available for free to consumers.

And once we had done that, a lot of that information is squirreled away and county tax assessors and county recorder offices around the country … And how our country's organized is those tend to be more than 3,000 different counties around the country, and each office has a different format of the file, and it became our job to go to all those different places and get all that data and put it online in a standardized way.

And, you asked about the Zestimate. The way that came about was, once we had done that, we'd bring people in, in the early days, and we’d show them a UI of what we were trying to do. We showed them these maps of recently sold homes, and then you could click on any house and see the public facts, and when it was last sold. We noticed that people had what we thought was a really focused interest in recently sold homes, and they would jot them down on napkins when we brought them into the offices to look at the user interface for focus groups. And we were like, “What are you doing there?” It became clear that they were very interested in looking at recently sold homes in order to understand the value of a home they might be looking to either buy or sell in the future. And that was kind of an a-ha moment where we were like, "Wow! Okay, if you're trying to figure out an estimated price for a home, then maybe we can help you do that better than just napkin math." So that was the genesis of the Zestimate, and today, we do a whole lot more than napkin math. It is a substantially more sophisticated, computationally intensive assessment.

Michael Krigsman: How has the Zestimate changed since you began it?

Stan Humphries: Yeah. So, if you look at when we first rolled out in 2006, the Zestimate was a valuation that we placed on every single home that we had in our database at that time, which was 43 million homes. And, in order to create that valuation on 43 million homes, it ran about once a month, and we pushed a couple of terabytes of data through about 34 thousand statistical models, which we thought was, and compared to what had been done previously really was, an enormously more computationally sophisticated process. But if you flash forward to today; well, actually, I should first give you some context on what our accuracy was back then. Back in 2006 when we launched, we were at about 14% median absolute percent error on 43 million homes. What we've done since is we've gone from 43 million homes to 110 million homes today, where we put valuations on all 110 million homes. And we've driven our error down to about 5% today, which, we think, from a machine learning perspective, is actually quite impressive, because those 43 million homes that we started with in 2006 tended to be in the largest metropolitan areas, where there was a lot of transactional velocity. There were a lot of sales and price signals with which to train the models.

As we went from 43 million to 110 million homes, you're now getting out into places like Idaho and Arkansas where there are just fewer sales to look at. And it would have been impressive if we had merely kept our error rate at 14% while getting out to places that are harder to estimate. But not only did we more than double our coverage from 43 to 110 million homes, we also nearly tripled our accuracy, driving the error rate from 14% down to 5%.

Now, the hidden story of how we’re able to achieve that was basically by collecting and throwing enormously more data at the problem, and getting a lot more sophisticated algorithmically in what we are doing, which requires us to use more computers. Just to give some context: I said that back when we launched, we built 34 thousand statistical models every single month. Today, we update the Zestimate every single night, and in order to do that, we generate somewhere between 7 and 11 million statistical models every single night. Then, when we’re done with that process, we throw them away, and we repeat again the next night. So, it’s a big data problem.
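To make the accuracy metric quoted above (14% at launch, roughly 5% today) concrete, here is a minimal sketch of how a median absolute percent error can be computed from estimate/sale-price pairs. The data values below are made up for illustration and are not Zillow's.

```python
import numpy as np

def median_abs_pct_error(estimates, sale_prices):
    """Median of |estimate - sale price| / sale price across observed sales."""
    estimates = np.asarray(estimates, dtype=float)
    sale_prices = np.asarray(sale_prices, dtype=float)
    pct_errors = np.abs(estimates - sale_prices) / sale_prices
    return np.median(pct_errors)

# Hypothetical estimate vs. actual sale-price pairs
estimates   = [310_000, 455_000, 198_000, 720_000, 265_000]
sale_prices = [300_000, 480_000, 205_000, 690_000, 250_000]

print(f"MdAPE: {median_abs_pct_error(estimates, sale_prices):.1%}")  # ~4.3% here
```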

Michael Krigsman: How did your, shall we say, algorithmic thinking, change and become more sophisticated from the time you began … What was the evolution of that? That must be very interesting.

Stan Humphries: Yeah. It certainly has been. There have been, you know, a few major changes to the algorithm. We launched in 2006. We did a major change to the algorithm in 2008, another major change in 2011, and we are rolling out another major change right now; it started in December and we'll be fully deployed with that new algorithm in June. Now, that's not to say that every single day in between those major releases we aren't doing work and changing bits and pieces of the framework. The times I described are the major changes to the overall modeling approach. And what has changed, as is probably suggested by how many statistical and machine learning models are being generated in the process right now, is the granularity with which these models are being run; meaning a lot finer geographic granularity and, also, the number of models that are being generated. When we launched, we were generally looking at a county, and in some cases, for very sparse data, maybe a state, in order to generate a model. And there were, like I said, 34 thousand of those different models.

Today, we never go above the county level for the modeling system, and for large counties with a lot of transactions, we break that down into smaller regions within the county, where the algorithms try to find homogeneous sets of homes at the sub-county level in order to train a modeling framework. And that modeling framework itself contains an enormous number of models … Basically, the framework incorporates a bunch of different ways to think about the values of homes, combined with statistical classifiers. So maybe it’s a decision tree, thinking about it from what you might call a “hedonic,” or housing characteristics, approach, or maybe it’s a support vector machine looking at prior sale prices.

The combination of the valuation approach and the classifier together creates a model, and there are a bunch of these models generated at that sub-county geography. And then there are a bunch of models which become meta-models, whose job is to put together these sub-models into a final consensus opinion, which is the Zestimate.
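As a rough illustration of that structure, here is a minimal sketch of sub-models (different valuation views paired with different learners) whose predictions are blended by a meta-model into a single consensus estimate. The feature names, data, and choice of learners are hypothetical; this shows generic stacking, not Zillow's actual framework.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Hypothetical sub-county training data
hedonic = rng.normal(size=(n, 3))        # e.g. beds, baths, square footage (scaled)
prior_sale = rng.normal(size=(n, 1))     # e.g. last observed sale price (scaled)
price = hedonic @ [40, 25, 60] + 0.8 * prior_sale[:, 0] + rng.normal(scale=5, size=n)

# Sub-model 1: decision tree on housing characteristics (a "hedonic" view)
tree = DecisionTreeRegressor(max_depth=5).fit(hedonic, price)
# Sub-model 2: support vector machine on prior sale prices
svm = SVR().fit(prior_sale, price)

# Meta-model: learns how to weight the sub-models into a consensus estimate.
# (In practice, out-of-fold predictions would be used here to avoid leakage.)
stacked = np.column_stack([tree.predict(hedonic), svm.predict(prior_sale)])
meta = LinearRegression().fit(stacked, price)

consensus = meta.predict(stacked)        # the "final consensus opinion"
```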

Michael Krigsman: This is very interesting and I want to remind people that we’re talking with Stan Humphries, who is the Chief Analytics Officer and also the Chief Economist at the Zillow Group. And I think most people probably know the Zestimate that automatically estimates a value for any piece of real estate.

Stan, so you’ve been talking about your use of data and the development of these models. But real estate has always been a data-intensive business, right? Analysts have always shared real estate data, but it’s static data. And so, again, what were you doing, and how did this change the nature of the real estate market? So if you can go from the technology into the disruptive business dimension?

Stan Humphries: Sure. Yeah. You know, I think you’re right, Michael, in the sense that there’s always been a lot of data floating around real estate. I would say, though, that a lot of that data had been largely untapped, and so it had a lot of unrealized potential. And that’s a space that, as a data person, you love to find. And, honestly, travel, which a lot of us were in before, was a similar space, dripping with data that a lot of people had not done very much with, and it just meant that really a day wouldn’t go by where you wouldn’t come up with “Holy crap! Let’s do this with the data!” And, you know, real estate was one where multiple listing services had arisen for the very purpose of facilitating the exchange of real estate between unrelated brokers, which was a very important purpose, but it was a system between different agents and brokers on the real estate side, covering homes that were for sale. You had, though, a public record system which was completely independent of that; actually two public record systems: one about deeds and liens on real property, and another which was the tax roll.

And, all of that was kind of disparate information, and what we were trying to solve was the fact that all of this was offline. We really just had the sense that, from a consumer’s perspective, it was like the Wizard of Oz, where it was all behind this curtain; you weren’t allowed behind the curtain, and you really just wanted to know, “Well, I’d really like to see all the sales myself and figure out what’s going on.” And you’d like the website to show you both the for-sale listings and the for-rent listings. But of course, the people who were selling you the homes didn’t want you to see the rentals alongside them, because maybe they would like you to buy a home, not rent one. And we were like, “We should put everything together, everything online,” and we had faith that that type of transparency was going to benefit the consumer, and I think it has, where …

You know, what's been interesting in the evolution is that you still find that agent representation is very important, and I think the reason that's been true is that it's a very expensive transaction. It will generally be, for most Americans, the most expensive transaction they make and the most expensive financial asset they will ever own. And so, there has been, and continues to be, a reasonable reliance on an agent to hold a consumer's hand as they either buy or sell real estate. But what has changed is that now consumers have access to the same information that their representation has, on either the buy or sell side. And I think that has really enriched the dialogue and helped the agents and brokers who are serving people, where now a consumer comes to the agent with a lot more awareness and knowledge, and is a smarter consumer, and is really working with the agent as a partner, where they've got a lot of data and the agent has a lot of insight and experience; and together, we think they make better decisions than they did before.

Michael Krigsman: I want to tell everybody that there’s a problem with Twitter at the moment, and so if you’re trying to tweet about the show and your tweet is not going through, try doing it a second time and sometimes that seems to be making it work.

Stan Humphries: I am so glad to hear that you said it, Michael, because I just tried to retweet right before I got on and I couldn’t do it and I thought it was my Twitter app. Sounds like it’s Twitter overall.

Michael Krigsman: Yes, it seems like we’re back to the days of Twitter having some technical issues. Anyway, Stan, in a way, by the act of trying to increase this transparency across the broad real estate market, you need to be a, shall we say, a neutral observer. And so, how do you ensure that in your models, you’re as free from bias as you can be? And maybe would you also mind explaining the issue of bias a little bit just briefly? I mean, we could spend an hour on this, but briefly. So, what is the bias issue in machine learning that you have to face, and how do you address it in your situation?

Stan Humphries: Okay. Yeah. May I ask you for a few more sentences on the bias issue in machine learning? Because as a data person, I’m thinking about it in a statistical sense, but I guess that’s probably not how you mean it. In terms of the business model itself, and how we think about its interaction with machine learning and what we’re trying to do … Our North Star for all of our brands is the consumer, you know, full stop. So, we want to surprise and delight and best serve our consumers, because we think that by doing that, then…

You know, advertising dollars follow consumers, is our belief. And we want to help consumers the best we can. And, what we're trying to construct and have constructed is, in an economic language, is a two-sided marketplace where we've got consumers coming in who want to access inventory and get in touch with professionals. And then on the other side of that marketplace, we've got professionals, be it real estate brokers or agents, mortgage lenders, or home improvers, who want to help those consumers do things. And what we're trying to do is provide a marketplace where consumers can find inventory and can find professionals to help them get things done. So, from the perspective of a market-maker versus a market-participant, you want to be completely neutral and unbiased in that, where you're not trying to … All you're trying to do is get a consumer the right professional and vice-versa, and that's very important to us.

And that means that when it comes to machine learning applications, for example, the valuations that we do, our intent is to try to come up with the best estimate for what a home is going to sell for; which, again, thinking from an economic perspective, is different than the asking price or the offer price. In a commodities context, you'd call the gap between what a buyer bids and what a seller asks the bid-ask spread; in the real estate context, we call that the offer price and the asking price. What someone is going to offer to sell you their house for is different than what a buyer is going to come in and say, “Hey, would you take this for it?” There’s always a gap between those.

What we’re trying to do with the Zestimate is to better inform those pricing decisions such that the bid-ask spread is smaller; such that we don’t have buyers who end up buying a home and getting taken advantage of when the home was worth a lot less, and we don’t have sellers who end up selling a house for a lot less than they could have gotten because they just didn’t know. So, we think that having great, competent representation on both sides is one way to mitigate that, and one way that we think is fantastic. Having more information about the pricing decision, to help you understand what that offer-ask spread looks like, is very important as well.

Michael Krigsman: So, from a data collection standpoint, and then a data analysis standpoint, how do you make sure that you are collecting the right data and then analyzing it in the right way so that you’re not influenced […] wrongly or over-influenced in one direction, or under-influenced in another direction, which would, of course, lead to distortions in the price estimates.

Stan Humphries: Yeah. Let's see, I'm trying to think of biases that we watch for in the valuation process. I mean, one obvious one is that the valuation that we're trying to produce is a valuation of an arm's-length, fair-market exchange of a home, and those words are important because there are a lot of transactions which are not full-value, arm's-length exchanges. So, if you look in the public record and you start to build models off the public record, you've got a lot of deeds that are quitclaim deeds and the like. You know, they are ten-dollar exchanges of real property, which is not a fair value. And you have some that are not arm's-length, where parents are selling homes to their children for pennies on the dollar, and those aren't fair value either. And then, of course, the most common example from the past housing cycle is a foreclosure or short sale, where, you know, we're not trying to… We do provide a foreclosure estimate for foreclosures, but the Zestimate itself is designed to tell you what that home would transact for as a non-distressed piece of inventory in the open market; which means that we've got to be really diligent about identifying foreclosure transactions and filtering those out so that the model is not downwardly biased and does not become really a [...] between a non-distressed and distressed property. So that's one area that we have to watch for quite a bit.
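As an illustration of that kind of filtering, here is a minimal sketch that drops non-arm's-length and distressed transactions before a valuation model is trained. The column names, deed types, flags, and thresholds are hypothetical, not Zillow's actual rules.

```python
import pandas as pd

# Hypothetical public-record transactions
sales = pd.DataFrame({
    "sale_price":      [310_000, 10, 450_000, 95_000, 275_000],
    "deed_type":       ["warranty", "quitclaim", "warranty", "warranty", "warranty"],
    "distressed":      [False, False, False, True, False],    # foreclosure / short-sale flag
    "related_parties": [False, False, False, False, True],    # e.g. parent-to-child transfer
})

def arms_length_fair_market(df: pd.DataFrame, min_price: int = 1_000) -> pd.DataFrame:
    """Keep only transactions usable as fair-market price signals."""
    mask = (
        (df["deed_type"] != "quitclaim")      # nominal, e.g. ten-dollar, transfers
        & ~df["distressed"]                   # foreclosures and short sales excluded
        & ~df["related_parties"]              # not arm's-length
        & (df["sale_price"] >= min_price)     # guard against token consideration
    )
    return df[mask]

training_sales = arms_length_fair_market(sales)
print(training_sales)
```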

Michael Krigsman: And we have a question from Twitter. I’m glad this one went through; I’m having trouble getting my tweets out there. And this is an interesting one from Fred McKlymans, who asks about the Zestimate use case: he’s wondering how much the Zestimate has helped define, rather than just reflect, real estate value? So, what impact has Zillow itself had on the market that you’re looking at?

Stan Humphries: Yeah. That's a question we get a lot, particularly as, you know, our traffic has grown; people want to know, "Do you reflect the marketplace? Do you drive the marketplace?" And my answer to that is that our models are trained such that half of the error will be positive and half will be negative; meaning that on any given day, half of all homes are going to transact above the Zestimate value and half are going to transact below. […] I think this reflects what we have said since launching the Zestimate, which is that we want this to be a starting point for a conversation about home values. It's not an ending point.
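A quick way to see what a median-unbiased estimate implies: if half of the errors are positive and half negative, roughly half of sales close above the estimate and half below. A minimal check on hypothetical, simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
sale_prices = rng.lognormal(mean=12.5, sigma=0.4, size=10_000)
# Hypothetical estimates: centered on the sale price, with symmetric multiplicative noise
estimates = sale_prices * rng.lognormal(mean=0.0, sigma=0.05, size=10_000)

share_above = np.mean(sale_prices > estimates)
print(f"Sold above the estimate: {share_above:.1%}")   # ~50% for a median-unbiased estimate
```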

You know, there was a reason why the name “Zestimate” came from the internal working name of “Zillow Estimate.” We got tired of calling it a Zillow Estimate, so we started to call it a Zestimate. And then when it came time to ship the product, we were like, "Why don't we just call it that?" But it was called the Zillow Estimate, the Zestimate, not the price, because it is an estimate. And it's meant to be a starting point for a conversation about value. And that conversation, ultimately, needs to involve other opinions of value, including real estate professionals like an agent or broker, or an appraiser; people who have expert insight into local areas and have actually seen the inside of a home and can compare that inside, and the home itself, to other comparable homes.

So, you know, that’s kind of designed to be a starting point, and I think the fact that half of homes sell above the Zestimate and half below, I think reflects the fact that people are … I think that’s an influential data point and hopefully, it’s useful to people. But it’s not the only data point people are using, because another way to think about that stat I just gave you is that on any given day, half of sellers sell their homes for less than the Zestimate, and half of buyers buy a home for more than the Zestimate. So, clearly, they’re looking at something other than the Zestimate, although hopefully, it’s been helpful to them at some point in that process.

Michael Krigsman: Mhmm. And, we have another question from Twitter. And again, I’m glad that this one went through; it’s an interesting question: “Have you thought about taking data such as AirBnB data, for example, to reflect or to talk about the earning potential of a house?”

Stan Humphries: That is an interesting … I'm noodling on that. We've done some partnerships with Airbnb on economic research, kind of understanding the impact of Airbnb using the housing data that we have. We do a lot of work on that. I think probably the direct answer to using Airbnb data is "no," but when you say the earning potential, I guess what I'm hearing is the potential to buy that home and convert it into a cashflow-positive rental property, and things like what's the cap rate, or the capitalization rate, or the price-to-rent ratio. And that we do a lot of, because we also have the largest rental marketplace in the US as well. So, we have a ton of rental listings, and then we use those rental listings for a variety of purposes, among them being to help understand price-to-rent ratios and what we compute as … call it a "breakeven horizon," which is how long you have to live in a house to make buying it more worthwhile than renting it.
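To make the break-even idea concrete, here is a simplified, hypothetical sketch: the break-even horizon is the first year at which the cumulative cost of owning drops below the cumulative cost of renting. The cost model and every rate below (mortgage rate, appreciation, rent growth, transaction costs) are illustrative assumptions, not Zillow's actual methodology.

```python
def break_even_horizon(price, annual_rent, years=30,
                       mortgage_rate=0.04, ownership_cost_rate=0.02,
                       appreciation=0.03, rent_growth=0.03,
                       down_payment=0.20, transaction_cost_rate=0.06):
    """First year when cumulative cost of owning drops below cumulative cost of renting.

    Simplified model: owning cost = one-time transaction costs + mortgage interest
    + taxes/maintenance - appreciation gain; renting cost = rent, growing each year.
    """
    loan = price * (1 - down_payment)
    own_cum = price * transaction_cost_rate      # buying/selling costs, counted up front
    rent_cum, value, rent = 0.0, price, annual_rent
    for year in range(1, years + 1):
        gain = value * appreciation
        own_cum += loan * mortgage_rate + price * ownership_cost_rate - gain
        rent_cum += rent
        value += gain
        rent *= 1 + rent_growth
        if own_cum < rent_cum:
            return year
    return None

# Hypothetical $400,000 home that would rent for $2,000/month
print(break_even_horizon(price=400_000, annual_rent=24_000))
```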

So, we … And I guess the other thing that would directly help with that question is the fact that on any home's page on Zillow (on a page that lists a home; we call them internally a home details page), we show both the Zestimate, so what we think that home would sell for, and a Rent Zestimate, what we think it would rent for. And that hopefully allows the homeowner to have some notion of what they could get for it if they decided to rent it out.

Now, the question from Twitter is an interesting new one, which is: our rental estimate is the rent for that entire home. What if you just want to rent out a room or part of that home? What's your earning potential on that? That is a very interesting question, which we've thought some about. We don't have a product that does that directly … A cool product there, that seems directly related to the question, would be an estimate on Zillow that would tell you, if you did want to rent out a room or two in that house, what you would fetch. And that's a very interesting […]. Duly noted!

Michael Krigsman: Let’s go back to the discussion of machine learning. Machine learning has become one of the great buzzwords of our time. But, you’ve been working with enormous, enormous datasets for many years now. And, when did you start? Did you start using machine learning right from the start? Have your … We spoke a little bit about this earlier, but how have your techniques become more sophisticated over time?

Stan Humphries: Yeah. I would say I’ve been involved in machine learning for a while; I guess I started in academia, when I was a researcher in a university setting, then at Expedia I was very heavily involved in machine learning, and then here. So, you know, there has been … Biggest change … Well, it’s hard to parse it. I was going to say the biggest change has really been in the tech stack over that period of time, but I shouldn’t minimize the change in the actual algorithms themselves over those years. Algorithmically, you see the evolution from, at Expedia, personalization, where we worked on things that were relatively sophisticated but more statistical and parametric models for doing recommendations; things like unconditional probability and item-to-item correlations. And now, most of your recommender systems are using things like collaborative filtering, algorithms that are optimized more for high-volume data and streaming data.

And in a predictive context, we’ve moved from things like individual decision trees and support vector machines to now forests of trees; simpler trees, but in much larger numbers … And then more exotic […] decision trees that have regression components in their leaf nodes, which are very helpful in some contexts.

In terms of the tech stack, you know, it’s been transformed. Back in the day, you were doing stuff with C code; maybe we were doing prototyping in S-PLUS. You were usually coding in FORTRAN or C, but you were doing it all from scratch […] on a single machine and trying to get as much as you could into memory. And from that, it has gone through to more proprietary systems; maybe you were using SAS. Then maybe you were using a database, maybe MySQL; you were using Hadoop. And today, generally, our firm and other firms that are on the cutting edge here are using something like Spark, probably; maybe coding directly in Scala, or maybe using Spark to plug into Python or R.

And then, generally, those frameworks are now running in the cloud, and are using streaming systems like Kinesis or Kafka for real-time triggering of events. And so, all the infrastructure has changed, and I would say for the better. As a data scientist now, you can get up and start working on a problem on, you know, AWS, in the cloud, and have an assortment of models to quickly deploy, much more easily than you could twenty years ago, when you were having to code a bunch of stuff yourself; starting out in MATLAB and porting it to C, and doing it all by hand.
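As a flavor of what that modern stack looks like, here is a minimal sketch of using Spark from Python to fit a separate small model per region in parallel, the general pattern behind generating very large numbers of models in a nightly batch. The dataset path, column names, and model choice are hypothetical assumptions, not Zillow's pipeline.

```python
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.ensemble import GradientBoostingRegressor

spark = SparkSession.builder.appName("per-region-valuation-sketch").getOrCreate()

# Hypothetical cleaned sales data: one row per arm's-length transaction
sales = spark.read.parquet("s3://example-bucket/cleaned_sales/")   # hypothetical path

def fit_region(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one small model on a single region's sales and score its homes."""
    features = ["beds", "baths", "sqft", "lot_size"]                # hypothetical columns
    model = GradientBoostingRegressor().fit(pdf[features], pdf["sale_price"])
    return pd.DataFrame({
        "region_id": pdf["region_id"],
        "home_id": pdf["home_id"],
        "estimate": model.predict(pdf[features]),
    })

estimates = (
    sales.groupBy("region_id")
         .applyInPandas(fit_region, schema="region_id long, home_id long, estimate double")
)
estimates.write.mode("overwrite").parquet("s3://example-bucket/nightly_estimates/")
```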

Michael Krigsman: Are you looking at the … Are you making predictions about the future value of the home, or only from the past to the present moment?

Stan Humphries: The Zestimate itself is, I guess, what some people would call a “nowcast,” so it’s a prediction for what the home would sell for today if it were on the market. We do also forecast the Zestimate forward in time; right now, we project forward about a year. And that model takes the point estimate from the machine learning models I described before, what we’re estimating the home is worth today, and moves it forward, combining it with a forecasting framework which we developed for the purposes of forecasting our housing index, the Zillow Home Value Index, which tells you basically what home values have done over the past twenty years, and what they will do over the next year.

That forecasting framework is itself a combination of some straightforward univariate ARIMAs and some more complex structural models that take economic variables as inputs and try to predict what those economic variables are going to do to home prices in your local market over the next year. We take those forecasts from the index and then apply them at the individual level, with some nuances: it's not just the forecast for your area. We then break that forecast down by housing segment, so that maybe high-end homes are going to appreciate more quickly than low-end homes, or vice-versa. That nuance affects the forecast that is then applied at the property level to create the forecast for the Zestimate.
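A simplified sketch of that last step: take a regional forecast, adjust it by housing segment (e.g. high-end versus low-end homes appreciating at different rates), and apply it to a property-level estimate. The growth numbers and segment breakpoints are made up for illustration.

```python
# Hypothetical regional forecast: +4% for the metro over the next year,
# adjusted by price segment (made-up numbers)
regional_forecast = 0.04
segment_adjustment = {"low": +0.01, "mid": 0.00, "high": -0.01}

def segment_for(zestimate: float) -> str:
    """Assign a home to a price segment (hypothetical breakpoints)."""
    if zestimate < 250_000:
        return "low"
    if zestimate < 750_000:
        return "mid"
    return "high"

def forecast_zestimate(zestimate: float) -> float:
    growth = regional_forecast + segment_adjustment[segment_for(zestimate)]
    return zestimate * (1 + growth)

print(forecast_zestimate(200_000))   # low-end home, +5%:  210,000.0
print(forecast_zestimate(900_000))   # high-end home, +3%: 927,000.0
```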

Michael Krigsman: I want to remind everybody that we’re talking with Stan Humphries, who is the Chief Analytics Officer, and Chief Economist at Zillow. And, if you’re trying to ask a question by Twitter, just keep trying and some of those tweets are actually getting through.

Stan, what about the data privacy aspects of all of that? […] I know you’re aggregating public data, but still, you’re making public information about the value of people’s homes and there’s a privacy aspect to this. So, how do you think about that?

Stan Humphries: Yeah. That’s a, you know… We’ve been fortunate in that most of our business operations… really, almost all of our business operations involve data that is a matter of public record. And a lot of the value-add that we’ve done is to bring that public record data in, collate it together into one spot, and standardize it, so there’s kind of a standard way to look at it regardless of how they collect data in Idaho versus Florida. We standardize it so that on Zillow, Trulia, or StreetEasy […] you’re looking at it all the same way.

But, at its core, that's all public record information, which is beneficial when it comes to privacy, because all of that data is, at this point, generally accessible. It's all available if you were to walk into a county tax assessor's or county recorder's office. And at this point, most of those offices are now online, so if you knew where to look on the web, you could find that information online, because it is a matter of public record. Because real estate is subject to property taxation, there is a longstanding history for why things involving real property, liens, and other information about real property are public-domain information. In all states, most of that information is public, but there are some states where the actual transaction price itself is not a matter of public record; those are called "non-disclosure" states, states like Utah and Texas. Everything else is public record.

And what we're doing is then providing estimates and derivative data on top of that. So, we're creating housing indices out of that data, or valuations. And those valuations are, theoretically, no different than if you were to go to the county tax assessor's website or into their office: there is already a market value assessment that they're putting on your home, which is a matter of public record. We're applying probably a lot more […] power and algorithmic sophistication than a lot of tax assessors are able to, but in principle, it's exactly the same concept as that.

Michael Krigsman: Somehow, it feels different when that data is aggregated and then presented in such a succinct form. And it’s also easily accessible. It just feels different.

Stan Humphries: Yes. That's true. I would say it feels different, and it feels different in a lot of different ways. For most consumer applications, it feels different because it feels really good. For an individual, though, there are some people who would like none of that information to be public. They would like no facts about their home to be public, so they would probably prefer that the county assessor not make it public, they would prefer that transactions were not a matter of public record, and they would prefer that companies weren't able to put a derivative product on top of that. I certainly get that. The problem becomes a collective action problem, where individually we would each prefer to take all of our own information offline, but collectively we would like the ability to look at everyone else's information so we can make better decisions. And, collectively, as a society, we have decided that this information should be public. And, because of that, companies like Zillow are able to make that information more accessible as well, and we think the consumer benefit far outweighs the individual's preference that the facts about their home not be public.

You know, there are also, I think, real social equity issues here, and there's a lot of research on this. When you look at disclosure versus non-disclosure states, for example, there's been some fantastic academic research showing that there's more inequality in property taxation in non-disclosure states than in disclosure states. In disclosure states, people are able to look at those transactions and figure out how their tax relates to what their home is really worth. Where they can't, disputes are less likely to come from the lower half of the price spectrum, but wealthy people will always go dispute and try to get a lower assessment on their home. And that leads to more inequality in the assessment of tax than would exist otherwise, which we think is a harm to the overall public benefit.

Michael Krigsman: And you just raised some public policy issues. And so, in our last five minutes, I’ll ask you to put on your economist hat and share your thoughts on how this data economy; and in a way, you’re right in the middle of the data economy. How is … How do you see that changing the workforce and the public policy issues around that?

Stan Humphries: Yeah. You know, I do a lot of writing now on policy […] related to real estate and housing, and also some broader economic discussion. One of the themes I touch on somewhat often is the need for us to get ahead of the changes that are coming due to machine learning and the data era. I think there are two parts of our societal framework that were really established with the last transformation; well, not the last transformation, probably the one prior to that, when we moved from an agrarian society to a manufacturing society. It was around that time that we started mandatory, compulsory public education. We also started to set up, in the progressive era of the early 1900s, social security systems and unemployment systems that allowed people who might be thrown out of work from a manufacturing job to have a little bit of a safety net while they found their next job.

You know, I am concerned in the current … This is less real estate related and more about the impact machine learning is going to have full-bore on our economy; thinking about the impact of driverless cars, for example, on people who drive trucks and cars. That's five to eight million people, and they're going to come under pressure as self-driving car technology becomes more ubiquitous. And I am concerned that, one, we need to up our educational game, where we need to think about college education as being the equivalent of what high school education was in the late 1800s, and we need to be doing a better job of training our college graduates for the jobs that exist. And then I would say that on the unemployment side, the system I described is set up for a world where you lose a job, and your next job is likely to be in the same town you're in, and in the same field.

We're going to go through […] in the next thirty years a lot of unemployment where the job you need to get is probably not in the area you live in, and probably not in the field you're in, so it's going to require some retooling. And that takes more than, like, six weeks to three months of unemployment. We need to think hard about people who are moving from a manufacturing job, where maybe their next job needs to be as a computer-assisted machine operator, which is a non-trivial job that you need to be trained for. And you're not going to learn it in four weeks. So, I'm definitely interested in public policy trying to address those issues in a better way.

Michael Krigsman: And in the last two minutes, what advice do you have for public policy-makers on these topics? You mentioned education as being one thing. Any other thoughts on this?

Stan Humphries: Yeah. I would just encourage us to … We seem to be in particularly ideologically charged times. You know, I would encourage us to think broadly, like we did when we came up with compulsory public education for children, and to recognize that a lot of these ideas, if you think about it, have been suggested from both the left and the right. For example, a viable possibility is replacing, you know, short-term unemployment insurance with something more like a robust negative income tax. We have a form of that in this country called the "earned income tax credit," where, for low-wage workers, we supplement their income through the tax system. You know, Milton Friedman was a champion of a very robust negative income tax on the right, and we've got a lot of liberal thinkers who have championed it on the left. That type of system, where people can step out of day-to-day work and be assured that they're going to make a base-level income for a longer period of time, and that income is going to allow them to retool and get another job; those ideas have come from the Left and the Right. And I would hope that we're going to be able to fashion a system that's going to work better for the next thirty years than what we've got now, and that we don't get hung up on rigid ideology about it.

Michael Krigsman: Okay. And, I’m afraid that about wraps up our show. We have been speaking with Stan Humphries, who is the Chief Analytics Officer, and also Chief Economist of the Zillow Group. And Stan, thank you so much for taking your time and sharing your thoughts with us!

Stan Humphries: Michael, thanks for the interview! It’s a broad range of topics we got to cover. So, it’s quite unusual, but it’s been fun!

Michael Krigsman: Yeah. That’s great! Forty-five minutes is enough time to dive in.

Everybody, thank you so much for watching, and go to CxOTalk.com/Episodes, and be sure to subscribe on YouTube. And also, “like” us on Facebook. You should do that! “Like” us on Facebook as well.

Thanks so much, everybody! Have a great day! Bye-bye.

Published Date: May 19, 2017

Author: Michael Krigsman

Episode ID: 434