24th Annual Financial Markets Conference - Mapping the Financial Frontier: What Does the Next Decade Hold? - May 19–21, 2019
Policy Session 2: How Do We Navigate between Under- and Overregulation of Data?
Advances in our ability to obtain and analyze data are facilitating better decision making in a variety of areas but also reducing individuals' privacy. What are the tradeoffs associated with alternative allocations of property rights over these data?
Douglas J. Elliott: Welcome. This is the last session of the afternoon, which makes us the warm-up act for Chairman Powell, who will be speaking at dinner. So it's a little bit like leading up to the Rolling Stones or something, and it's a big responsibility for us. But luckily we have a fascinating and challenging topic for our panel today, which is, how do we navigate between under- and overregulation of data?
I'm Doug Elliott, the moderator of the panel, and I'm a partner at the management consulting firm of Oliver Wyman. My fellow panelists are Alessandro Acquisti—and I'm going from my left, in that direction—professor of information technology and public policy at Carnegie Mellon University; Tara Sinclair, associate professor of economics at George Washington University; and Jim Stoker, head of financial crimes, data, and analytics at SunTrust. There are more detailed bios, as you know, of all of us on the conference app, so I'm not going to detail my panelists' many additional accomplishments and roles.
The way we're going to do this is a little bit different from the previous panels. I'll make some opening remarks to set the scene, and then Alessandro will describe his very useful paper that reviews economic thinking on data privacy issues—and actually, it's quite a nice complement to the previous panel. Tara and Jim will then present their own thoughts. They won't be direct reactions to the paper, but will come from some other angles. I'll then lead a short discussion among the panelists, and then we'll go to the Q and A period with all of you.
So, in terms of the opening remarks, given my time constraints, I'm just going to underline three key points. First: this issue will matter greatly for financial policymakers in the years ahead—that's why I've been focusing so much on it myself. So on the one hand, data analytics are reshaping the industry, as I think you all know. They're changing the competitive landscape, altering the products that will be offered, and changing the economics of the business for everyone. On the other hand, rising public concern about data privacy may substantially limit or channel the use of this data and these new tools.
So let me give you a concrete hypothetical example. I have no idea whether this will happen or not, but I could imagine that in five or 10 years the great bulk of credit underwriting might be done using big data and machine learning, because these techniques really are advancing a great deal. I could also imagine that, in parallel, data privacy concerns will have changed social mores enough that perhaps 20 percent of people refuse to allow their detailed data to be used. Their applications would simply rely on the more basic information that's used now.
Well, there's a real issue there, because Europe's General Data Protection Regulation and California's Consumer Privacy Act, which is modeled on it, both say you can't discriminate against the people who choose to give you less data. For those of you who have thought seriously about these data issues, there is no way to avoid discriminating when you use big data and machine learning. Now, I don't think the Europeans are so doctrinaire that they'll say, "Therefore you can't use big data and machine learning," but it does show there's a trade-off here. I think in this case the Europeans and the Californians will have to work out what actual discriminatory behavior they're worried about, and how they can limit it while still allowing the social advantages of big data and machine learning.
The second point, in addition to the fact that this will matter: as Alessandro's paper demonstrates, the policy issues in this area are really complex, and you caught a bit of that from the panel just before us. As I've wrestled with it myself (I've been doing work for the World Economic Forum and separately for the IMF, among other things), it's become clear that framing matters immensely here. Let me give you one aspect of that: there are at least seven different lenses through which one could view these data rights issues, and they can lead to quite different answers. So just think about what those are. One is human rights. The Europeans directly state that data privacy is a fundamental human right. If you come from that direction, you're going to go for a pretty strong version of those protections.
On the other hand, in many countries financial inclusion has been very significantly aided by the use of big data, so there's a financial inclusion lens. Related to that is a customer service lens: you can provide better, more innovative, and cheaper services if you can use the data and the new techniques. Related to that, there's business effectiveness: whatever it is you're providing, you should be able to do it cheaper and better if you can take full advantage of this data revolution.
There's a competition angle. Take the UK in particular, which is pushing so-called open banking. A big reason for that is that they see a retail oligopoly in their country, and they've not been able to find any way to break it. One of their hopes is that fintechs using data freely may be able to compete with the existing big banks that dominate. There are national competitiveness and trade issues, which is why you see countries like India looking at walling off their data. China already has: personal data will become legally very difficult to move out of the country, with certain exceptions.
And then there's intellectual property law, which one of the previous panels was alluding to, and which clearly needs to be revised. Financial regulation and supervision have always involved trade-offs; we know that. Policy in general does. But this is seven-dimensional chess, and that's not a game anyone can play. So what we need to do is find simpler frameworks that don't lose too much of the nuance of the true complexity, but boil it down so that policymakers have a chance of grasping it. Ideally (and I'm nowhere near having this) I would present you with some sort of framework, so that if you're a policymaker you could say, "Oh, yeah. Under that framework, I'd want to be here." And that would have follow-on implications for what policy we choose. If anyone has that, by the way, I'd love it, so let me know.
And finally, I believe that the financial sector, public and private, needs to come together to agree on the principles behind any eventual standards, rules, or legislation. I could go into this in a lot more detail, but just in brief: financial data differs in many specific ways from data in other sectors. Not entirely, but the combination of factors makes it quite different. So it's important to cooperate to ensure that the answer society arrives at will actually work for this sector. I would argue that this did not happen in Europe with the General Data Protection Regulation, and it's raised many practical issues for financial supervisors and executives.
So with those three points, I'll leave my remarks here. But for those who are interested, I've recently issued a primer on data rights aimed at financial policymakers, which is available through the conference website—and it's in these soothing blue tones, in case you would like... I have a few physical copies, if someone happens to want that.
So let me turn it over to Alessandro.
Alessandro Acquisti: Thank you, Doug, and good afternoon, everyone—and especially thank you to the Atlanta Fed for the privilege of having me here to speak with you, and for the opportunity of writing a piece which you can find online and on your tables, which is part literature review, part opinion piece, and part an excuse to share with you some very, very recent results we have on this topic. And this will be the same format I will follow in my presentation.
I will start from a number of claims that are sometimes heard in the public debate over privacy. You can find them in the public discourse, and sometimes also in the writings of some economists. First, privacy concerns do not have solid economic grounds: consumers claim to care about privacy, but in their actions they are happy to disclose personal information because they receive much value from the disclosure, and when it comes to showing actual harm related to privacy, there is not much evidence of economic harm. Second, free online services would not be possible without increasing collection of consumer data. Third, sharing personal data is an economic win-win. For instance, targeted advertising benefits all the different stakeholders in the advertising ecosystem: merchants, publishers, consumers, and the data industry.
And fourth, at the end of the day, the loss of privacy is simply the price we have to pay in order to benefit from machine learning, analytics, and big data. What these four claims have in common, apart from being somewhat straw-man arguments, is that none of them has actually been proven correct, empirically or categorically, and I'm trying to use my language very carefully. I'm not saying that they have been proven wrong. I'm saying that each of these items is actually an open question in the economics debate over privacy. I'll try to demonstrate this, and convince you of it, in the rest of my presentation.
I will try to frame my remarks by suggesting that it is useful to distinguish two questions, which in simplified terms are referred to as the "how much?" question and the "how?" question. The "how much?" question is, to what degree should consumer privacy be protected? And the "how?" question is, how do we achieve that degree of protection? And to some extent we want to believe that economics can help us address both questions, in terms of telling us what is the proper balance, at the individual and societal level, between protecting and sharing data, and whether market forces alone already are bringing us to the balance, or whether in fact we need some form of regulatory intervention.
Now, I will start with the "how much?" question. In order to delimit the discussion, I will point out that privacy has been defined in many, many different ways by different scholars. In fact, it's been connected to items as diverse as freedom, dignity, anonymity, seclusion, confidentiality, and secrecy. Most of the economics work on privacy focuses on the definitions toward the bottom right of the slide: issues related to confidentiality, obscurity, and control over personal information.
I would also point out that, although in a simplistic version of the debate people may see regulation and self-regulation as two opposite extremes, where we have either one or the other, reality is much more complex. We are on a spectrum where different forces all play a role: market forces; privacy-enhancing technologies, such as differential privacy, which the previous wonderful panel started discussing; consumer choice; nudges and soft paternalistic approaches to consumer decision making; and policy interventions. Depending on the relative strength of these factors, we may be closer to one extreme or the other.
This is just to set the board so that I can now give a five-minute overview and summary of a pretty wide and exciting field, the economics of privacy. Although economists' interest in privacy has certainly exploded in the last 10, maybe 15 years, the field is actually much older than that. In fact, it has a pretty remarkable pedigree: it started in the late 1970s and early 1980s with Chicago School scholars such as Posner and Stigler, giants of the law and economics literature, who were the first to use economics to try to understand privacy and privacy law.
And then, somehow, there was a silence of about 10 to 15 years. Economists didn't seem particularly interested in getting into the field again. Then in the mid-'90s, probably due to the IT explosion, economists started getting interested in it again. And of course, after 2000, the field really took off.
So in the early '80s, with Posner and Stigler, the view of privacy was by and large a binary view. So privacy is hiding our information, which stands in contrast to much more diverse notions of privacy, which can see privacy as control over personal information. It may sound like a subtle difference. It's actually quite crucial, because if you see privacy as blockage of information, any sharing becomes almost a violation of privacy—or at least is contradictory to privacy. If you see it as control, the act of sharing is a manifestation of that control—very important difference, as we will see also for economic reasons.
Suppose I see privacy pretty much as hiding consumer information. Suppose I also believe that, typically, individuals want to hide negative information, meaning information that is not good about themselves. For instance, if they are poor employees, they don't want their potential future employer to have this information, and they want to hide behind the shield of privacy so that the employer doesn't learn this about them and perhaps decline to offer them a job.
On the other hand, individuals with positive information have an interest in sharing it. So reducing the information available to "buyers" in this market (for instance, through regulation) reduces economic efficiency, because it forces employers to choose employees who are not a good match for the position, creating more inefficiency in the market. In fact, the costs of privacy are ultimately borne by other parties, so privacy creates negative externalities. We started encountering the concept of externalities in Christopher's presentation earlier.
Interestingly, and in total Chicago-school style, Posner also extended the economic argument to noneconomic fields, such as marriage. Before the marriage, you want to show all your good traits and you hide your negative traits. Once you are married, then you can show your true self, and by then it's too late for your partner to change their mind.
Stigler, also from the Chicago School, pushed this argument further by pointing out that the exchange of information will lead to desirable economic outcomes independently of who owns the data. That means it doesn't matter whether there is privacy regulation or not, because, again, if you have positive traits, you want to share them; you want other people to know them. So even if there is regulation protecting your privacy, you will volunteer that information. If you have negative information, regardless of whether there is protection or not, you will try to hide it. Therefore, the absence of information implies negative information, so be suspicious if there is a lack of information. OK?
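The "absence of information implies negative information" logic can be sketched in a few lines of Python. This is a minimal illustration of the unraveling intuition, not Stigler's or Posner's own model: it assumes disclosure is costless and verifiable, uses a hypothetical list of worker quality levels, and assumes an employer values any silent worker at the average quality of those who stay silent. Workers above that average disclose, the silent pool's average drops, and the process repeats.

```python
from statistics import mean


def unravel(qualities):
    """Return the types that stay silent and the value an employer assigns to silence.

    Hypothetical assumptions: disclosure is free and verifiable, and an employer
    values any silent worker at the average quality of the workers who stay silent.
    """
    silent = sorted(qualities)  # start from a pool where nobody has disclosed
    while True:
        pooled_value = mean(silent)  # what the employer infers from silence
        # Workers strictly better than the pooled value gain by disclosing.
        still_silent = [q for q in silent if q <= pooled_value]
        if still_silent == silent:  # nobody else wants to disclose: stop
            return silent, pooled_value
        silent = still_silent


# Hypothetical quality levels for ten workers.
silent, inferred = unravel([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("still silent:", silent)
print("value inferred from silence:", inferred)
```

With these made-up quality levels, the loop ends with only the lowest type silent, so silence itself reveals the worst type. Relaxing the assumptions (costly disclosure, unverifiable claims, secondary uses of data) is exactly where the later literature departs from this result.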
Now, this was absolutely seminal, because these were the first scholars to think about these topics, but it was a somewhat monolithic view of privacy: individuals only protecting negative information and only sharing positive information.
However, in the mid-'90s, economists started proposing a variation of this argument, pointing out externalities, both positive and negative, which can arise when secondary users use your information. The consumer (the data subject) may rationally decide to reveal or to protect, but once she has made these decisions, other costs or benefits may occur over which she no longer has any control. This complicates the economic calculus for the consumer.
The digitization of information creates other privacy challenges, because there are documents that were already public but, due to the cost of collecting, accessing, or using them, were de facto semi-private. Because of the internet, this theoretically public data is now factually public; it's so cheap to access. This creates new challenges that we may need to address. How? Perhaps by defining property rights, a concept that was seeded early in the privacy economics field and, as we saw in the last presentation, one that economists are still interested in: the idea of propertization of personal information.
And then after 2000, as I mentioned, there was a huge expansion as well as fragmentation of research in this area, meaning that scholars started going in all possible directions. The models became sophisticated: not just broad economic reasoning, but specific modeling. The focus diversified: identity theft, tracking, price discrimination, et cetera. There was the emergence of empirical analysis, as well as of the field of what I like to call the "behavioral economics of privacy," which tries to understand how consumers make decisions about what data they share, and how and why these decisions sometimes appear paradoxical. This is the famous privacy paradox: we seem to care about privacy, yet we are very happy to share sensitive information. It's strange.
This is a figure that we used in a review on the economics of privacy, which we published two or three years ago in the Journal of Economic Literature. It gives a sense of how diversified and segmented the literature in this field is nowadays. You can see all the different angles that economists are studying in this area.
I would like to go back to a key issue at the core of this debate—the issue of privacy economics and regulation—and whether, in essence, if we intervene through active regulatory efforts in the market, are we going to do good or are we going to do bad? So I go back to the "how much?" question, and then I will get also to the "how?" question.
If you take the early days of the economics of privacy, the lesson seems to be that any, or almost any, form of regulation would actually produce more harm than benefit, because privacy is about hiding negative information, as I mentioned earlier. So privacy protections or regulation would interfere with market forces. Ultimately, they would create inefficiencies and, specifically, a redistribution of wealth from potential data holders to data subjects: when an employer hires a poor employee, that benefits the employee but not the employer.
So privacy is redistributive. Well, yeah, but it also happens to be the case that lack of privacy is redistributive, too. There is really no way out of this challenge. An example is provided by Hal Varian himself, whom I was quoting earlier. In a seminal piece on privacy, he pointed out that consumers rationally want some privacy for certain data, no privacy for other data. The case he was considering at that time was the case of telemarketing. This gives you a sense of the old technology he was referring to. Hal was writing really on the cusp of the internet revolution.
Nowadays the problem could be framed in terms of online targeted advertising; it would be exactly the same problem, just with different technology. Telemarketers call you in the evening, or used to, to offer you something. According to Hal, that's not a problem of lack of privacy; it's a problem of too much privacy, because the telemarketer calls you with offers you're probably not interested in, so he is disturbing you over dinner for something that you don't want. But if the telemarketer knows you well enough, he will call you about things you're actually interested in, and the call will have value to you.
So the consumer actually wants the telemarketer to know some of her interests in order to get offers that are valuable, which is the same argument the industry makes nowadays about targeted advertising. However, here is the second part of the story: the consumer does not want the telemarketer to know how much she wants the product, because if the telemarketer knows exactly how much she wants it, where do we end up? In the theoretical case of first-degree, or perfect, price discrimination. The telemarketer charges a price for the service or product that is as close as possible to the consumer's reservation price, maximizing what she has to pay.
So all the surplus of the transaction goes to the telemarketer. Therefore, you can see here that where we draw the line between protection, or not, of data will impact allocation of surplus between the telemarketer and the consumer. That's why I'm saying there is no way out from the redistributive effect of either privacy regulation, or lack of privacy regulation. We have similar results in a paper I wrote with Hal, and a paper Curtis Taylor wrote just one year before we published ours. Again, under market conditions myopic customers will see their surplus being allocated away by profit-maximizing sellers. It's a pretty standard result.
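The redistributive point can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the model in the papers mentioned: five hypothetical consumers with made-up reservation values, a seller with zero marginal cost, and a seller who either posts one revenue-maximizing price for everyone or, knowing each reservation value, charges each consumer exactly that amount (first-degree price discrimination). The function names are hypothetical.

```python
def uniform_price_outcome(values, cost=0.0):
    """Seller posts one price for everyone, choosing the most profitable candidate."""
    best = None
    # With discrete consumers, the best uniform price is always one of the reservation values.
    for price in sorted(set(values)):
        buyers = [v for v in values if v >= price]
        profit = (price - cost) * len(buyers)
        if best is None or profit > best["seller"]:
            consumer_surplus = sum(v - price for v in buyers)
            best = {"price": price, "seller": profit, "consumers": consumer_surplus}
    return best


def perfect_discrimination_outcome(values, cost=0.0):
    """Seller knows each reservation value and charges exactly that amount."""
    buyers = [v for v in values if v >= cost]
    # Each buyer pays her full reservation value, so consumer surplus is zero.
    return {"seller": sum(v - cost for v in buyers), "consumers": 0.0}


values = [10, 8, 6, 4, 2]  # hypothetical reservation values
uniform = uniform_price_outcome(values)
perfect = perfect_discrimination_outcome(values)
print("uniform price:", uniform)
print("perfect discrimination:", perfect)
```

With these made-up values, the uniform price is 6, giving the seller 18 and consumers 6. Under perfect discrimination the seller captures all 30 and consumers get nothing. Total surplus rises from 24 to 30, but all of the gain, and then some, goes to the seller, which is why where the line on data protection is drawn shifts surplus between the two sides.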
Privacy, overall, can make the broader market inefficient; we made that argument when discussing Posner and Stigler. But you know what? Data collection can also be inefficient. Competition can push us toward a tragedy of the commons, which one of the questions in the previous session was getting at, and we're in a kind of tragedy-of-the-commons situation here. There is competitive pressure on firms to collect data, which eventually forces them to overinvest in data collection. We invest way more than is optimal in collecting data, because it's so cheap, so we keep investing in getting more and more, even though the marginal benefit is not socially optimal.
Competitive pressure also leads to a divergence between private and social marginal benefits, and in fact the absence of privacy protection can decrease not just consumer welfare but, importantly, aggregate welfare. This is an interesting result. So it's not just an issue of "Let's protect consumers." It is an issue of "Let's maximize aggregate welfare," and sometimes the lack of protection decreases aggregate welfare. There are some examples in the Journal of Economic Literature paper.
These were theoretical results. What about the empirics? Take data sharing and electronic medical records. You can find evidence that the adoption of electronic medical records has important positive outcomes for many of us, such as a 27 percent decline in patient safety events. You can also find evidence of negative outcomes, economically speaking, such as a 12 percent increase in outpatient charges.
How about innovation? You can find evidence that privacy regulation reduces technology adoption in a series of great papers by Amalia Miller and Catherine Tucker, looking specifically at electronic medical records, HIEs (health information exchanges), et cetera. Or you can find evidence, which we produced and published a few years ago in Management Science, that privacy regulation can increase technology adoption, because it reduces entrepreneurs' uncertainty about what they can and cannot do with the data, and this unleashes their willingness to take risks and start HIEs, et cetera.
So I guess you will start seeing a pattern here: the results of protection, or of its absence, are nuanced and complex. There is no "one size fits all." In fact, the key most likely is what type of regulation, what type of intervention. Earlier I showed you this slide to make the point (and I apologize, because I believe it is a very obvious point; I still make it to be sure we are on the same page) that regulation and self-regulation are not binary. It's not either one or the other; we are always somewhere on the spectrum between the two.
So, a few conclusions so far, before I get into the more recent parts of the talk. At the micro level, in terms of first-order consumer welfare effects, there are valid, rational arguments for privacy protection, including via regulatory intervention, meaning that, yes, consumers may have a valid, rational desire for privacy purely out of self-interest. At the macro level, things get much more nuanced, because now we get into the societal use of data. By protecting data, are we making it harder for scholars to use genetic testing, et cetera, to find cures for new diseases? Now we get into much trickier territory, and what we know from the literature is that the societal, aggregate effects of privacy protection can be positive, can be negative, or can be indeterminate. Again, it really depends on context.
The effects are nuanced and context-dependent. They depend on the type of regulation. And because I felt bad about giving basically a non-answer to the question of what the effects are, I will twist the non-answer around into a partial answer: at the very least, we can dispel the myth that privacy regulation is always, inherently, bad for economic growth, innovation, or welfare. That's not the case. It can be, but it can also not be. And this, I find, is an important conclusion.
Also, some caveats: inevitably, these studies do not consider certain critical factors that we should remember in the public debate over privacy. I say inevitably because the factors I'm going to list should not be taken as a critique of the studies, which rightly focus on provable statements and robust results and cannot overgeneralize. But in the public discourse we should keep the following aspects in mind. There are multiple possible objective functions: Are we trying to stimulate growth? Address consumer welfare? Focus on innovation? Different objective functions may lead to different conclusions about what regulation, if any, we want. The interests of stakeholders, as I hinted, may not be aligned; again, this came out in the previous panel: there is really no reason to expect that the interests of all parties will be economically aligned.
There are many interesting second-order, long-term effects of privacy protection or privacy invasion that we economists don't even get close to touching, because we would never get published trying to study them: it's very hard to causally link effects coming five years later to some legislation, or lack of legislation, happening now. There is also heterogeneity that these models miss, because they focus on narrow views of privacy, such as privacy over how much I'm willing to pay for a good: privacy of your valuation, of your preferences.
But in fact, privacy means so many different things with economic connotations, and they're difficult to put together into one single function that you try to maximize. And there is a key role for privacy-enhancing technologies, which John was proposing in a previous panel discussion: we should really not see privacy as a binary concept, protection or sharing, either/or. I go back to the point I made earlier: I promised you that the difference between "privacy is hiding information" and "privacy is control" would become relevant. Here is why it's so relevant: with privacy-enhancing technologies, we can appreciate the subtlety of privacy being not binary but a spectrum. We can share some of the data and protect some of the data, and in doing so maybe we can achieve higher welfare.
There are also the non-economic dimensions of privacy. In the literature, privacy is linked to dignity, autonomy, and freedom. As economists we steer away from these concepts and questions, not because they are unimportant but perhaps because they are too important, and they are very difficult to measure in tangible terms. That doesn't mean that they don't exist. Doug was referring to the notion of privacy as a human right, which is quite crucial in the European Union. So in the European Union, arguments over the economic value of privacy (my field of research) would be important, but they wouldn't be the final word on privacy, because there are all these other dimensions that we find much harder to quantify but that are nevertheless crucial.
Many of these studies assume that consumers are fully informed and economically rational. By now, we have so much research showing that the world of data is afflicted by pervasive information asymmetries. My common joke is that the paradox of the data economy is how untransparent it is for consumers, and also for researchers: it's very hard for us researchers to know exactly what happens to consumers' data flows. And consumers are affected by cognitive and behavioral biases when it comes to decision making. That doesn't mean that they don't care; it means that it's very hard for their preferences to be revealed in the market, in what are commonly known as "revealed preferences."
Now, that was the "how much?" question...
Elliott: You have two minutes, but since you referenced me, make it four [laughter].
Acquisti: So if I make another reference, it might go up to six [laughter]?
Elliott: It might. It might work.
Acquisti: So, the "how?" question. We essentially don't know what the perfect degree of protection is, because (I hope I've convinced you) there is no way to find the actual optimal level. So how do we achieve an arbitrary level? Here I would like to spend my last four minutes discussing targeted advertising. I will share with you two studies, which are unpublished. They've been partly peer reviewed, meaning at conferences, but not yet published in journals. I share them because I feel they say something interesting about the claims being made about the value of data, and about who really benefits from these data exchanges.
Let's go back to targeted advertising, which is presented by the data industry as this economic win-win, where everyone is better off: merchants, publishers, consumers, online intermediaries. Now, is that really the case? How do we know exactly who is benefiting, and to what extent, from the collection of consumer data? And in fact, if we start regulating this space, what will we end up with?
You can tell two different stories, and this is just to show that in economics, depending on your framing, you can demonstrate very different things. The data industry's framing is that there are consumers who want to buy goods and merchants who want to sell to them, and the data economy plays matchmaker between the two, reducing search costs on both sides. Everyone is indeed better off. That's one reasonable story. There is a different story, I would say equally reasonable: consumers want to buy products; they go to publishers' websites and see ads; merchants use ads on those sites to find consumers.
However, there is the data intermediary in the middle, and consumers have finite budgets and attention: they cannot follow all ads and cannot buy all products. Publishers compete aggressively with each other, because there are so many channels for online publishing now. Merchants compete aggressively for consumer attention. Data economy intermediaries are an oligopoly: yes, the advertising ecosystem is vast, but really there are two or three dominant players.
So under this very simple I/O scenario, where do you think the surplus will go? In a platform with two markets, on the side where they compete with you aggressively on the sides, but the platforms are oligopolies. You would expect the surplus to go into the middle, so I will skip because I... Did I already gain a new added minute? I will skip the theory model because perhaps that will be less interesting, but suffice to say that we work on the theoretical model to understand this market better, and to understand which of these frames was more likely to be accurate. And we get instead into the empirical evidence, and I will close with this particular study.
So the empirical evidence, from the study we have not yet published, is the following. We wanted to understand how much publishers (so, websites) are benefiting from behavioral advertising. We know that behavioral advertising is very valuable for the data intermediaries, and we know that merchants pay much more for behaviorally targeted ads than for non-behavioral ads. But the question is, how much of that "much more" ends up going to the publishers? You can tell two different stories. Under one story, advertisers' willingness to pay increases when they can precisely target the audience, so the bids they make on ads increase, and the publishers' revenue increases.
The opposite story is that as you become better and better at targeting, your audience shrinks, because you only want the particular consumers who are really interested in your product. This will decrease bids, and it will decrease publishers' revenues. Or the two things can happen simultaneously: maybe, yes, advertisers bid higher and higher to target, and yet very little arrives at the publishers. How can we know? We received data from an American media conglomerate, the owner of over 60 websites, including very-high-traffic and medium-traffic websites. We received detailed information about all the ad transactions that occurred over a week, including whether cookies were available or not, and therefore whether the merchants advertising on this media conglomerate's websites were behaviorally targeting their ads or not.
There were over two million transactions, with very detailed information. And we started running analyses to see how much more revenue the publishers were getting when there were cookies, and therefore when merchants were able to target behaviorally. It's a simple question. It's simple until you realize that there is an endogeneity problem: consumers who don't have cookies may not have them because they decided to remove them. And maybe this trait, the propensity to remove cookies, is correlated with other traits of the consumer. Maybe those consumers are higher-value consumers, and maybe that influences the bidding.
So in order to correct for endogeneity, we use a method called "augmented inverse probability weighting." I will skip the details of the method; I hope you will trust me that it is meant to address concerns about endogeneity, so that our estimates are not biased by it. And these are the results: yes, publishers get more when they show ads that are behaviorally targeted than when they show ads that are not. But do you know how much more? Four percent, or $0.00008 per ad.
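For readers unfamiliar with the estimator Acquisti names, here is a minimal Python sketch of the generic augmented inverse probability weighting (AIPW) formula. It is not the authors' code. It assumes the propensity scores (the estimated probability that an impression carries a cookie) and the outcome-model predictions (expected revenue with and without a cookie) were already produced by some other model, and the function name `aipw_effect` is hypothetical. The idea is that the outcome-model difference is corrected by inverse-probability-weighted residuals, so the estimate stays consistent if either the propensity model or the outcome model is correctly specified.

```python
from statistics import fmean


def aipw_effect(treated, outcome, propensity, mu1, mu0):
    """AIPW estimate of an average treatment effect (hypothetical helper).

    treated    -- 1 if the unit was treated (e.g. a cookie was present), else 0
    outcome    -- observed outcome (e.g. publisher revenue for that impression)
    propensity -- estimated P(treated = 1 | covariates), strictly in (0, 1)
    mu1, mu0   -- outcome-model predictions with / without treatment
    """
    n = len(treated)
    if not all(len(seq) == n for seq in (outcome, propensity, mu1, mu0)):
        raise ValueError("all inputs must have the same length")
    scores = []
    for t, y, e, m1, m0 in zip(treated, outcome, propensity, mu1, mu0):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Model-based difference plus weighted residual corrections for each arm.
        score = (m1 - m0) + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
        scores.append(score)
    return fmean(scores)


# Toy data: two impressions with a cookie (treated = 1), two without.
treated = [1, 1, 0, 0]
revenue = [2.0, 3.0, 1.0, 2.0]
propensity = [0.5, 0.5, 0.5, 0.5]

# An outcome model that fits the observed revenue exactly, and a useless one
# (all zeros) paired with correct propensities, both give 1.0 here.
fitted = aipw_effect(treated, revenue, propensity,
                     mu1=[2.0, 3.0, 2.0, 3.0], mu0=[1.0, 2.0, 1.0, 2.0])
zero_model = aipw_effect(treated, revenue, propensity,
                         mu1=[0.0] * 4, mu0=[0.0] * 4)
```

One way to express such an effect as a relative lift, like the four percent quoted in the talk, is to divide it by the estimated average no-cookie revenue; whether the study did exactly that is not stated here.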
Now, this difference is statistically significant. Is it economically significant? That's an interesting question, because to understand whether it's economically significant, we have to compare it to the cost of the infrastructure the publishers need to support it, to the cost to user privacy, and most importantly (and here we go back to the question of value allocation in the data economy) to the fact that the merchants on the other side of the funnel are paying so much more for behaviorally targeted ads. This is a quote from an article in the American Prospect last week; I hope you can read the last sentence: "An online advertisement without the third-party cookie sells for just 2 percent of the cost of the same ad with the cookie." I'm asking you to do a kind of double negative here. Basically, what this is saying is that ads sold with cookies cost the merchant far more than ads without cookies.
So much higher, and yet only a 4 percent increase in revenues for the publishers. This suggests that a lot is remaining in the middle. In turn, this suggests that when we hear claims that regulating targeted advertising will weaken or even threaten the entire structure of the online advertising industry, we should be at the very least a little bit skeptical, or at least we may want to scrutinize those claims further. I will stop here, because I have already gone long, and I thank you very much for your attention.
Elliott: All right. Thank you, Alessandro [applause]. Tara?
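To make the comparison concrete, the short Python calculation below only rearranges figures quoted in the talk. It assumes the 4 percent lift and the $0.00008 figure describe the same average impression, so dividing one by the other gives the implied no-cookie revenue per ad; expressing that per thousand impressions (CPM) is an added convenience. The 2 percent figure comes from the press article about the market in general, not from the conglomerate's data, so putting the two side by side is illustrative rather than a like-for-like estimate.

```python
# Figures quoted in the talk.
publisher_lift = 0.04          # publishers earn about 4% more per behaviorally targeted ad
lift_per_ad_usd = 0.00008      # ...which the talk puts at $0.00008 per ad
no_cookie_price_share = 0.02   # article: a no-cookie ad sells for 2% of a cookie ad's price

# Implied publisher revenue per ad without a cookie (assumes both figures share a baseline).
baseline_per_ad = lift_per_ad_usd / publisher_lift
baseline_cpm = baseline_per_ad * 1000              # per thousand impressions
targeted_cpm = baseline_cpm * (1 + publisher_lift)

# Price multiple a merchant pays for a cookie ad relative to a no-cookie ad, per the article.
merchant_price_multiple = 1 / no_cookie_price_share

print(f"publisher: ${baseline_cpm:.2f} CPM without cookie, ${targeted_cpm:.2f} CPM with cookie")
print(f"merchant: cookie ad costs about {merchant_price_multiple:.0f}x a no-cookie ad")
```

On these assumptions the publisher earns roughly $2.00 versus $2.08 per thousand impressions, while the article's figure implies the merchant pays about 50 times as much for the cookie ad, which is the gap Acquisti is pointing to.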
Tara Sinclair: So, thank you all for the opportunity to speak with you on this panel today. As Doug mentioned, my name is Tara Sinclair, and I'm a professor of economics at George Washington University in Washington, DC. There, for over 10 years now, I've run the research program on forecasting. As part of that, I'm always looking for new ways to improve forecasting models. So about six years ago, I jumped on the big data bandwagon and reached out to a job search website called Indeed. I started talking with them about what seemed to us at the time a very simple idea: we would take their immense amounts of data on the labor market (back then they were collecting 25 terabytes of data a day, and it's grown a lot since).
We'd take all that data; we'd build some new economic indicators from it; we'd feed those indicators into forecasting models; we'd prevent the next recession, or at least predict it; and we'd improve millions and millions of lives. Simple ideas.
Elliott: It's worked so far!
Sinclair: Right. I mean...[laughs]. Admittedly, I can't claim credit for any of this so far. The project's been going slower than we initially dreamed. I think a lot of us, when we first jumped on the big-data bandwagon, really thought that it was going to be a very fast process, and we've learned that it's a much slower process. But I'm still very optimistic and excited about all of the data prospects that are currently in the hands of private companies, and I'm still working with Indeed today.
So what I'd actually like to share on this panel is a little bit of my perspective on another feature of private data, which we haven't talked about quite as much. We've focused on the value to individuals, and we've focused on the value to individual companies, but that's still about the individuals. And we've talked about perhaps sharing information about individuals across companies. But there's also the value of more aggregated data, and how it might be used for big public insights about economic conditions, social trends, and that sort of thing: the kind of thing that really excites social scientists, and economists in particular, about access to private sector data.
So, before I go further I'd like to talk about what I mean by "private data," because it means lots of different things to lots of different people. So for this talk what I'm thinking about is information about individuals that's already in the hands of private companies. When we think about that big data revolution, we had all of these different types of private companies start to collect different kinds of data on individuals. They're tracking our interests in different goods and services, and houses and spouses, and they're also keeping track of everything that we're looking at when we think no one is watching us, whether it be how we spend our time, our money, or our secret interests.
This is really all kinds of companies these days, not just the internet companies that we think of, and they're collecting all sorts of different kinds of information. It has gone beyond its usefulness in feeding into an individual product; there's now the idea that it might have a "public good" aspect as well. For my part of the panel, that means I am going to punt on who actually owns the data. I'm going to let my esteemed colleagues talk about that aspect of it. I'm really focusing on this: while the data is in the hands of the private company, can we think about incentives to get that information into the hands of statistical agencies, to feed into and improve existing statistical releases from trusted government sources, as well as encouraging these companies to share the insights they're getting from their data with the public at large?
So when we think about what kinds of information are being collected here, I think it's really important to realize the variety of information. Every action that in any way touches the internet today becomes data: every keystroke, every click, every hover, every scroll becomes data. This isn't just simple PII, "Here's my email address, maybe you know my location." There's an incredibly complex web of information about what I looked at just a minute ago, what I'm going to look at next week, and how those things interact.
If you can connect that across data sources, those covariances are incredibly informative about what people are interested in and what actions they might take in the future. That's very commonly used by companies for their individual products, whether they're enhancing current products or developing new ones. But it's also potentially useful for providing insights about where the economy is going and where society is going, and we could possibly learn more about the world around us, if this information were properly organized, gathered, and shared.
So that's what I'm thinking about when I'm talking about public insights. We're really hungry for more information about what's going on in the world. We have a sense that all this information that private companies are collecting might really be able to tell us something new about where we are right now, and where we might be going, maybe even in the near future—maybe even in the far future—in a way that past data just didn't have the capacity to inform us upon.
I really think that there are two different aspects of this, and I want to be clear to keep those separate. There's the idea of inputting this data, probably in a pretty raw form, into statistical agencies. This is already being done, but it could be done more, to enhance and expand the offerings from trusted government sources.
There's also a second piece, which we're seeing a lot of as well, but it's at the discretion of the individual firms: reaching out and producing these insights, maybe in a not completely transparent way, and then sharing out different trends, insights, things that they're seeing. And we can see that the press, the stock market, all of us are all ears for this information, right? We see a new study on this, that, or the other using big data from a company, and we jump on it and say, "Hey, did you read this article about this new insight?" That suggests there's a ton of interest in learning about the world from this information, beyond just how my particular piece of information feeds into getting me a better search result on Google.
I want to first focus on the promise of this private data for public insights. Admittedly, when economists like myself started digging into this data several years ago, we really thought we were going to find the Holy Grail. And the promise is real. The data is amazingly granular; the level of detail we can get to in these sources of data is extraordinary. And it's really timely: we can get just-in-time insights where we can ask things like, what happened three hours ago? One of my favorite examples: when we were surprised by the Brexit vote, I was able to see in the Indeed data how job seekers were surprised by it too, based on the change in their job-search behavior just three hours after the result was announced. We are able to learn things in a way that just might not be possible from other sources of data.
And they're innovative. We can build new measures of old things we've long cared about, and we can also track all sorts of new things we didn't even know we cared about. I didn't know I cared about how people scroll through websites, but now I realize it's really useful and informative about how people are interacting with books. I love those little stories about how long people spend, how far into a book people get on their Kindle. I didn't know I cared about that, but now I do.
These data are also observational most of the time, rather than surveys, so we get less talk and more action from them. That's incredibly important, especially now that we're seeing research on the challenges of getting survey responses in today's society. There's real value in the difference between what people say they prefer and how they actually act. There's also the big hope, which we've already talked about somewhat, that these data might be substantially less expensive than traditional approaches to collecting information, because we would just be piggybacking on existing data collection.
And so, we go in and we see the potential of this Holy Grail—and I should just note that I was not kidding when I said I thought we were going to go in and prevent the next recession. I was actually part of a South by Southwest panel back in 2016. It was me and a Google economist and a person from a satellite company, and our session was really called "How Data Science May Prevent the Next Recession." We were really, really optimistic about it, and I still am really optimistic about it. But I think we've also had to become a little bit more realistic and recognize that the world we're in, it's not an Indiana Jones movie. This is real life, and real-life archaeology requires a lot of digging into the data. It requires a lot of piecing together of different things, where we don't even know what we're piecing together as we're putting it together. It's still incredibly valuable, but it takes time.
Elliott: Speaking of which: three minutes.
Sinclair: Three minutes. I'm with you. So when we think about the challenges of working with private data for public insights, I think the big one we run into (and I think several people in this room have experienced this in various ways) is the difference between what the business cares about and what the public would like to see. Companies have their own incentives; that's something we as economists have been talking a lot about in this room today. It's about having the right incentives to get the information that is really valuable to us as the public from the company. We've already talked a little bit about the importance of verifiability of what the data actually reveal. Companies may not have the incentive to be clear about the exact methodology behind some of their data products. Transparency is a challenge.
These data are also rarely representative. If you're drawing insights from just your user base, that might not be representative of any particular population that would be of interest for public insights. And trying to think about how we properly weight or benchmark that can be a challenge—particularly if we're trying to build a new indicator. We also face a lack of history or consistency or compatibility of this data with any other existing data. We have constantly changing definitions.
One challenge that I think is really important, and really hard to understand unless you've actually worked with this data, is that even though the data is collected, it's very far from usable. It's incredibly costly to take unstructured data and structure it into the forms we're used to working with as researchers. That, again, goes back to business incentives. The company doesn't have the incentive to structure the data in a way that is useful to us; it's too costly for them to do it out of the kindness of their hearts.
So to wrap up, I think just to follow on to some points Alessandro particularly made in his paper about the importance of nuance: he was in part emphasizing the different types of regulation, and how there's a lot of different types of regulation. Well, there have to be a lot of different types of regulation because there's a lot of different kinds of data. Whether we're users or producers or potential regulators of the data, I think it's really important to recognize that there are lots of different potential uses of the data, and that a one-size-fits-all approach is likely to not fit anyone very well.
Just some key suggestions to wrap up. I think it's important to talk about privacy separately from data protection. Privacy is really a legal issue; data protection is a security issue. When we think about aggregation, companies often aggregate data before they share it out, to protect themselves and also because that provides the aggregate insights, but aggregation also builds in assumptions and biases, so we need to think carefully about that. I also think it's important to distinguish the use case that the provider of the original individual data thought they were feeding into from how the data is actually going to be used.
Of course, the big lesson I've emphasized here is that patience is going to be required to work with these data. They have huge promise. The statistical agencies have had decades to develop the quality data that they provide today, so it's still early days in working with these data. And I feel it's always important to wrap up one of these discussions by emphasizing that trusted government data is what we should always think of as our benchmark, because new sources of data are complements to, not substitutes for, the heroic work that the statistical agencies are doing for us.
Elliott: Okay. Thank you, Tara. Jim? [Applause]
James Stoker: All right. Thank you very much. Never in my life have I felt more aware of the fact that I am not an academic [laughter]. Twenty years ago I was, for an incredibly brief period of time, but I have not been since. My approach to this problem, my perspective, is I think pretty radically different from most of the ones we have heard up to this point. As mentioned earlier, my current position is head of financial crimes, data, and analytics: so fraud, AML [anti-money laundering], that sort of thing.
When I think about data privacy, the two things I think about are making sure that I do not break the rules intentionally, and making sure that I (and by "I" I mean my broader team) do not break the rules unintentionally. I think less about what the specific rules are, in general, and I don't think about what the optimal rule set would be. In many ways, I view my goal as minimizing cost, effectively: given that there is a set of rules, both regulatory and internal, that we will follow in spirit and in letter, how do I follow them without doing damage? And by "damage" I mean damage to accuracy, to my ability to do whatever task I am trying to do. How do I do that as well as possible?
Now, to give you a little bit of background before I start: I do work for a bank, not a university, so let me make clear that I am not speaking for SunTrust Bank. These are my opinions. I work for SunTrust, but I'm not representing SunTrust here. And I really do want to emphasize that the perspective I'm taking on data privacy is that of a user, and more specifically that of an analytic user. My career has been built on converting financial and personal data into business decisions, and the simplest theory there is that the more data you give me, the more accurate those decisions will be.
If you were to think that through, you would come to the obvious conclusion that I would be opposed to any regulation around data privacy, but that's not true. Part of it, just to state the obvious, is that I work in financial crimes now. There is no bigger supporter of data privacy regulation than me. If anybody loses their financial data anywhere, eventually someone is going to use it to defraud me, and that makes me unhappy; that makes my life harder.
Realistically, even before that, in all the other roles I had at the bank, these were not things I worked in opposition to. Part of the reason is... I'll tell you my one academic story. This is one of the few things I remember from graduate school. We spent 90 minutes in a class (I'm trying to remember who the professor was, but it escapes me; I remember the class) on this question: here is a macroeconomic model, here is the proper specification; take out one of the data fields from this proper specification and work your way through. How good are the estimates? You grind and you grind and you grind, you get to the end, and you come up with the answer: they're wrong. You could kind of go into the question knowing this, but certainly by the end you had proven the point.
What was important was how wrong they were. They were immensely wrong. It wasn't plus or minus 5 percent; they could be anything. I have spent the last 25 years working in banks, working in finance, convincing myself I don't believe that. Part of this is my belief about what modeling is. Modeling is an imperfect science. There is no perfect model that we are searching for when we are building a model to predict, say, fraud or something along those lines.
Even if there were one, it would exist for such an ephemeral amount of time that you would never get enough data to actually parameterize it. Things change too much, so you're always working in approximation. What struck me most about Tara's comments (and honestly, it's what I was thinking the whole time) was how operational data is. Having good data is important.
You know, a joke I would tell in thinking about data privacy and data privacy regulation internally is, maybe you tell me, "Okay, we're going to limit the data you can use, from 95 percent accuracy down to 90 percent." How would I feel? Honestly, I would be overjoyed that I had data that was 90 percent accurate. Particularly once you start using big data, and you start using massive data fields, the process of cleaning this data is incredibly hard. The margins that we're working on—I mean, I wish they were so fine, but I don't think they are.
The thing I will take from this entire day, that I'll remember the most (and I really enjoyed it and appreciated it), was John's paper earlier about effectively gumming up the data a little bit, and what the implications are for the accuracy of the data. I've never tried that, because I don't want to start from a position of using data I am not allowed to use and see what happens if I take it away. The danger around that would be overwhelming.
But intuitively, it made a great deal of sense to me: by using data well, you can mitigate the damage done to the data by protecting privacy in ways that are actually meaningful. If I were to make two points... If I can figure out how to get to the next slide... I knew I wasn't going to get through this, so I'm just going to start at the end.
If I were to make two overarching points about this, one is a feeling that, from a data user's perspective, things like data privacy rules are not oppositional. We're not against you, in any sense. We want these rules. I mean, obviously in my current job, I really want these rules. But even prior to that, there's nobody more sensitive to the damage that can be done through data privacy than people who make a living working with it. We are supporters of data privacy. We understand, yes, it will take a little bit of accuracy away from us, but honestly, I don't think it's that much. I think if handled well, the damage done, the lack of precision as a result of these, is actually relatively minimal.
If I could, I just want to give you one example, and this is obviously an immensely simplistic example. I was talking to a person who manages a fraud platform for me. His job... I should say, one way business differs from academia is pace. We often try to answer problems within hours, so in a sense, it ain't gonna be perfect. But if somebody ripped us off last night, we don't want them to rip us off tonight. We need to solve it now, so speed is of the essence.
I wanted to get his opinions on data privacy before I came here, because he's a smart guy and I thought he'd have a lot to say. You could see this look of joy on his face as he thought back to one area of the bank he had worked in that had a beautifully well-curated data set, where date of birth had already been binned. And he was just so happy. Part of it was, "I'm not allowed to know their birthdates. I don't want to know their birthdates," because this is, obviously, very secret information. Don't give people your birthdate, because if they have it they can find out a lot of other things about you. So it's important to protect it.
But as much as that, they had done part of his job for him. He wanted to bin the data, right? For those of you who are practicing data people, you bin data; not everything has to be continuous. So they managed to prevent an unintentional error: he didn't have to pull data that could put him at risk, and it actually sped up the process. And again, in the world I live in, speed is important. Those two things together were terrific. The person who curated that data set managed both to protect data privacy and to improve the accuracy of the work we were doing.
I don't think this is a rare example. However, it points to a challenge, and this is my second point: effective data privacy, effective data security at a bank, is not one person's job. I know we have a person whose title is head of data governance, and they have a team of people underneath them. I'm going to say something that, trust me, I know is sort of blindingly obvious, but there is a point: everyone is responsible for doing this work. To do it well, it has to be part of everyone's job.
And this doesn't just mean the obvious, which is that if you are a teller and you call up somebody's account number, don't walk away while the account number is showing. That's pretty clear. Don't print stuff out and then leave it on the printer if it has private information. This also applies to me, and what I've learned in the last four weeks of thinking about this stuff is exactly that. The jobs I do are often around org design: I build teams to manage certain analytic problems. How do I build those teams? To use banking terms, it's segregation of duties: who does what within those teams?
If you went back 15 years at SunTrust (and now I'm down to my last point), when I ran a team doing some heavily quantitative capital work, I managed the process from the first data point: pulling data from operational systems, putting those data onto another system, combining the data, transforming the data. I did the math, then I ran the data through that math, and then I actually printed the reports. I did all the reporting that came out of this as well. I owned the thing end to end.
It didn't seem strange to me at the time. We would never do that today because, even if everybody is acting in the best spirit and fully intending to do the right thing, the potential risk there is overwhelming. We could have made mistakes, and no one would ever have been able to catch them.
If I have a general mantra when I put these organizations together, it's that analytics is a part of a business process. Don't treat it as a business process end to end; treat it as one link in a chain. Pulling data and constructing data assets is a separate piece. You want somebody to manage that piece separately.
They can't do it on their own, though, because to build that piece properly, you have to know how the data is going to be used; in this case, the data is being used as an analytic asset. The analysts have to work with them. This is a case where the analysts and the people who construct the data, on a day-by-day basis as part of their jobs, need to think about, "How do I do data security? How do I do data privacy?" If they do, you can build data assets that satisfy the regulations and also improve analytic use.
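The binning Stoker's colleague describes, replacing an exact date of birth with a coarse age band before analysts ever see it, can be sketched in a few lines of Python. The band edges, the reference date, and the helpers `age_on` and `age_band` are hypothetical choices for illustration, not a description of any bank's actual data set.

```python
from datetime import date

# Hypothetical age bands: (inclusive lower bound, label), in ascending order.
AGE_BANDS = [(18, "18-24"), (25, "25-34"), (35, "35-44"),
             (45, "45-54"), (55, "55-64"), (65, "65+")]


def age_on(dob: date, as_of: date) -> int:
    """Whole years between a date of birth and a reference date."""
    years = as_of.year - dob.year
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1  # birthday has not yet occurred in the reference year
    return years


def age_band(dob: date, as_of: date) -> str:
    """Map an exact date of birth to a coarse band so the raw DOB is never exposed."""
    age = age_on(dob, as_of)
    for lower, label in reversed(AGE_BANDS):
        if age >= lower:
            return label
    return "under 18"


# The curated table keeps only the band; the date of birth itself is dropped.
customers = [{"id": 1, "dob": date(1990, 6, 15)}, {"id": 2, "dob": date(1950, 1, 2)}]
curated = [{"id": c["id"], "age_band": age_band(c["dob"], date(2019, 5, 21))}
           for c in customers]
```

Doing this once, upstream, is what lets the same step both remove a sensitive field and hand the analyst a feature that is already in the form the model needs.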
The last thing I would mention—and this is more just purely on a construction, "how-to-do-this" perspective—there's always going to be a trade-off between technical and nontechnical controls (and I will finish soon, I promise). The technical controls are stronger. When I say "technical controls around data privacy," I mean just don't let people have it. If you want to ensure that people do not do damage with data, don't give it to them. That can be incredibly effective.
Obviously, the downside is that they can't do anything with it, and financial institutions, like so many other businesses, are fundamentally information businesses; we need information to survive. Think about my current role: you want me to have every piece of data about you if you are a SunTrust client. You want me to know your Social Security number, because you want me to know if somebody else is using your Social Security number. There are jobs that demand that we have this kind of information.
You can solve 90 percent of this with technical solutions: access control and the development of data assets, thinking hard about what data you allow people to see and how you want them to use it. This is incredibly challenging. I think it is impossible to do this outside of a well-defined data strategy. I think fundamentally you're better off thinking of data security as a part of data strategy, broadly, than as a separate institution. If you do that well, then technical controls will get you there.
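Access control in the sense Stoker uses it, deciding up front which fields each role may see, can be illustrated with a minimal role-to-column allow list in Python. The roles, field names, and the `visible_fields` helper are hypothetical; in practice this is usually enforced in databases and entitlement systems rather than in application code like this.

```python
# Hypothetical column-level allow list: each role sees only the fields its job requires.
ALLOWED_FIELDS = {
    "fraud_analyst": {"account_id", "ssn", "transaction_amount", "merchant"},
    "marketing_analyst": {"account_id", "age_band", "product_holdings"},
}


def visible_fields(record: dict, role: str) -> dict:
    """Return a copy of the record restricted to the fields the role may see.

    Roles not in the allow list get nothing (deny by default).
    """
    allowed = ALLOWED_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "account_id": "A-1",
    "ssn": "000-00-0000",  # placeholder, not a real number
    "age_band": "25-34",
    "transaction_amount": 120.0,
    "merchant": "Store X",
    "product_holdings": ["checking"],
}

fraud_view = visible_fields(record, "fraud_analyst")          # includes ssn
marketing_view = visible_fields(record, "marketing_analyst")  # no ssn
```

The deny-by-default choice mirrors the point that the strongest technical control is simply not giving people the data, while the fraud role still gets the Social Security number its job requires.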
To the extent that you cannot, you need to have policy, and policy is just rules: "If you do this, we will fire you. Please do not do that." You will never fully get away from that. An observation: it makes sense to me that data is centered in technology fields, but it always needs the help of more traditional operational risk because you'll never get the data assets to a point where they can manage this themselves.
Elliott: Thank you very much [applause]. I've been a softie, so we're running a little bit behind. I will skip any questions from myself, but we've already had a number from the audience. Please do use Pigeonhole, if you'd like to provide more.
Let me start in almost random order here. Tara, there's a question for you, which was: It was said earlier—and I think I can even pull this up—it was said earlier that 90 percent of data was created in the last two years, so doesn't that create a recency bias?
Sinclair: Oh, absolutely, yes. That goes back to my point about history changing, definitions changing, what we're measuring. There is a huge recency bias, and that's fine for most business processes, which are themselves recency biased. Businesses want to make sure they're measuring the present and the future well. The past may matter occasionally for certain models, but in most cases they're absolutely willing to give it up in order to have the best metrics today. So, looking specifically at private sector data, there is also a recency bias in the incentives that shape what businesses store and process, and even how they structure their data.
It creates real challenges. I think these are challenges that statistical agencies in particular are prepared to face as they use this data as an input. But this is why it took agencies like the BLS almost three years to bring data from a private source into the CPI calculations: it takes a long time to figure out the statistical aspects of new data.
Elliott: And Alessandro, you had a comment?
Acquisti: Yes, I agree with her. I'll also offer an example, again from the world of online advertising. There is research suggesting that contextual ads are much more effective than behavioral display advertising. Contextual ads are, for instance, the kind you see when you search for a product on Bing or Google: in addition to the organic results, you may see sponsored search results at the top. Behavioral display advertising may be based on your clickstream behavior, accumulated by third-party companies over weeks or months. Contextual advertising is based on the need you have in that very moment. Perhaps not surprisingly, contextual ads have a much higher click-through rate than display ads.
So the point, going back to regulation: data retention regulation, meaning regulation that limits how long you can keep the data, is sometimes seen very negatively. But once again it is context-specific whether having data going back 10 years is useful or just creates an enormous risk to your company, or whether in fact you can rely on data that is very fresh. It changes, I believe, dramatically from scenario to scenario. I wouldn't give the same answer for health data that I would give for advertising click-through data.
Stoker: It seems like, just as a general rule, it's sort of horses for courses in data. Just make sure you understand whether the data actually captures the information that you want. If you're looking for something that only happens annually, having a huge number of points from one year doesn't do you a whole lot of good, because you really only have one data point. It might be terabytes of data, but it's really only one data point.
Elliott: This is such a natural follow-on that I will go with this one, which is, isn't there a danger of getting lost in this massive amount of data...losing the ability to form hypotheses, and all of that? I assume from what you just said, Jim, you probably would agree that's a risk?
Stoker: Honestly, I think this is about the most interesting topic that is going on right now in my space, which is that there are sort of two opposing approaches to solving problems. One is fundamentally the way I did it, which is hypothesis first, then data to test the hypothesis. The other is the way that people half my age seem to approach it, which is, "I'm just going to answer it. You give me the data, and I will tell you." And I think that... I'll give you the wishy-washy answer: probably the best place is somewhere in the middle. I don't think you ever want to lose sight... In the work we are doing now, we start with the "just answer it, massive amounts of data, no hypothesis" approach, because to me what that often does is sort of a fancy way of doing almost like "get the mean of a data set." It tells you what the data says. It doesn't do anything else, but it can do that very rapidly and incredibly well.
But you never want to stop there. It would be a mistake to take that and then just take the best, by some measure of best, and plug it in and run it. What you need to do is take that and then filter it through a secondary process that says, "What does this mean? Intuitively, what is reasonable?" This isn't wholly different from traditional modeling, but it's kind of "step two is step one, step one is step two now." But I think that the combination of that can give you very fast results, which I actually have comfort in.
Sinclair: There's a huge difference here between how economists think about things and how data scientists think about things. I think this is exactly the tension: data scientists, I agree with you, bring great tools and analysis of the data, but we also need economists to be thinking about causality. I do think a lot of these statistical issues are kind of in between, in that both parties are responsible for thinking about whether the data really represents the population we're talking about as the insights go out.
Stoker: Actually, if I could get another one in, and I apologize, it's just, it is such a good one. Data is... you pick data, you choose data. "Feature engineering" is, I guess, what we call it now, but that is a massively important step in a modeling process. We can kind of ignore that at some level if we think of modeling as being math, but it really isn't. The first step of modeling is data. Somebody has to decide what data they're feeding into the system, and the answer is never "all of it." It just doesn't work that way. Just for us, there's too much to process. A lot of the data is for...
Elliott: All right, this was an interesting question, because I think to some extent you directly answered it, but obviously there's a difference in view. They pointed out: it seems like your conclusions about data and privacy were different from the conclusions of the paper from the previous panel, which seemed to largely say, "Better to have more sharing."
Acquisti: So I would go back to the point... My belief, based on the extensive literature in this area, is that there is no single correct answer. I wouldn't see some of the results I presented (which, by the way, were not from my own research but from research by others) as contradicting the work Chris was presenting, if that's what the anonymous questioner was referring to. We certainly have cases where privacy can decrease aggregate welfare, and we have cases where the lack of privacy can decrease welfare. For instance, some of the scenarios I mentioned, which have been studied by others in the literature, are cases where, in the absence of regulation, there is an over-incentive to collect data; this data then imposes costs on individuals, and once you aggregate all these costs, they end up offsetting the societal benefit of the data itself.
As for the anonymizing part of the question, I find that very interesting. In my view, these are two distinct issues: there is economic literature showing that privacy protection can be welfare enhancing without necessarily relying on anonymization of data. But there is also work pointing out that "we may have our cake and eat it, too" by anonymizing certain data so that it can be shared with researchers and across companies. So maybe various autonomous car companies can all improve their research together without actually violating individuals' privacy, if by "violating privacy" we mean identifying the data of this particular driver, using this car, on this particular street.
Elliott: Okay, thank you. Another question for you—I'll pull this up—which is, how might policies nudge consumers toward the optimal privacy protection versus sharing?
Acquisti: That's a super good and super tricky, almost evil question, because it's kind of...
Elliott: This is what you get for running over time [laughter].
Acquisti: Oh, so you wrote the question [laughter]! Is it possible to show it on the screen?
Elliott: Yes, I'm sorry. I thought I had hit that. Let me try that again.
Acquisti: I think I get the general idea, to do a... So, it's an evil question. I say this with a smile, with my congratulations and thanks to the person who wrote it. It's a super-tricky question because it assumes a) that there is indeed an optimal amount of privacy protection, and b) that we (the corporation, the government, the policymakers, or the president) can know what that level is and then influence people in that direction. So there's a risk here, and we have to clarify the context of the discussion.
You can take a completely rational approach to privacy decision-making and believe that consumers have enough information about what happens with their data and are rational, so their revealed preferences in the marketplace accurately reflect their real privacy stance. Or you can take a view where asymmetric information and behavioral and cognitive biases make it so that revealed preferences don't always capture "true desires." The point of soft paternalistic approaches, or nudges, is to try to move individuals toward what they claim to be their preferences. So far, so good?
Now, my view on this is that when we do studies on nudges, we first try to see whether they are effective at all. We take a completely non-normative approach, just a positive approach ("positive" in the economic sense of objective), so we want to see whether they work. And the evidence is mixed. Some nudges work, others don't. And then we let people use these nudges as they want, if they want.
In some cases, you can see paternalistic governments using soft paternalism, or gamifying content, instead of strong paternalism such as regulation. Forcing companies to change interfaces, for instance changing the default visibility settings or the opt-in versus opt-out choices on a website, is a form of soft paternalism. You're changing the architecture of choice; you're changing the interface. You are not forcing individuals to make one decision or the other, but by changing, for instance, from opt-in to opt-out, you are definitely nudging people in a certain direction.
Let's say that even that is considered too aggressive, because who says that opt-in is better than opt-out? Well, there is some economic research suggesting that in some cases we know, though not always. Then another way to use nudges is to help people, for instance, self-commit to future behavior. Say you decide, "You know what? I really would like to stop revealing so much information about myself on Twitter." Well, we can create a technology so that when you are about to reveal something that you have decided, ex ante, is too sensitive, that little interface, your privacy agent, tells you, "Alessandro, are you really sure you want to share this?" And then you decide whether to continue or not.
In that case, the soft paternalistic approach is based on your own volition rather than being "imposed on you" by government, corporations, or whoever may believe that they know your actual preferences better than you do.
Elliott: OK, thank you. So one for Tara, which is, you stated that private data can be incorporated into traditional macroeconometric modeling to enhance monetary policy decisions. Can you give an example?
Sinclair: Oh. Let me be clear. I think that it has the promise of being able to do that. We are not there yet. This is the dream. This is not yet the reality.
Elliott: So what do you dream about?
Sinclair: Okay, so let me... Thank you for letting me share my dream [laughter]. There are some ways that central banks are incorporating various kinds of private data. Google Trends has been used in some production models of unemployment and inflation, with mixed results, depending on whose papers you read. Again, there are papers for, and there are papers against.
But it is being used, and I think using more specialized data... So the data that I'm looking at is labor market data, and I think it has the potential to provide more real-time information, though perhaps not about how tight the labor market is, because that's actually a hard one, again because of representativeness: with people coming to one particular website, it's very hard to separate shifts in market share from economic shifts.
There are other factors... So I'm looking at research on mismatch in the labor market, which might actually be a more representative metric to get from a single website, and I think that might still be relevant for thinking about where we are in terms of economic conditions, and therefore potentially for monetary policy. But there are lots of other projects going on using all sorts of different kinds of private data, whether credit card transactions or the Zillow housing data. So there is a lot out there being put together, and it's coming in all sorts of different forms: some comes in raw, directly to the Federal Reserve or to government statistical agencies, while other data is processed by the companies and then shared.
So going back to the anonymization point: this is where the verifiability of that data becomes a challenge once it's anonymized. Sometimes statistical agencies really only want the raw data because of verifiability. But companies may have less incentive to share that, because they think of it as more proprietary, more relevant for their business decisions, and they only want to share the aggregate data. And so those are some of the tensions in sharing the data.
Elliott: Alessandro, you had a comment?
Acquisti: Yes, one quick comment, extending what Tara was mentioning, because your reference to job market matching is very important, and it allows me to provide another example for the previous (next-to-previous, actually) question on how Chris's results and some of the results I was referring to can both be correct simultaneously.
Social media, in the context of job market search, can have a positive effect. It improves matching between employers and employees, and there is research showing that. It can also have a negative effect. Some of my own research shows how certain types of discrimination that are no longer possible, or are much harder to do, in the United States due to employment laws (discrimination based on things like religion, sexual orientation, et cetera) can still happen, because employers find on social media the personal information that people freely reveal. There is legislation protecting that information, but employers can find it because we ourselves have revealed it. In that case, we have discrimination, which is a bad effect.
So social media simultaneously creates this positive effect and this negative effect. How do you combine them to get... Do we really believe that we can set up an objective function in which we put the two together and try to estimate the optimum? Well, that is theoretically possible, but it would require a policy decision: how do we weight the importance of job market matching, and how do we weight, as a society, our need to fight discrimination based on protected traits?
That's a policy decision, not just an economics decision. Economics can help frame the problem, but I don't believe it can ultimately offer that answer. That's why results showing the positive and the negative effects of regulation can be simultaneously true.
Elliott: I think that's a great comment. I'm not sure it counts as "super quick," I might add [laughter].
Acquisti: Sorry. Super quick for a professor [laughter].
Elliott: But I'd like to slip in one question of my own. One of the things that's difficult for policymakers and politicians on this is the wide gap we've already talked about between what people say they value and how they actually operate. One, do you have a feeling as to how this will migrate? Because it's hard for me to believe that there will continue to be this big a gap. Eventually I think people will either start admitting how they actually operate, or they'll change how they operate.
Do you have a sense of how that might change? And related to that, do you have any suggestions for how policymakers should deal with the uncertainty of not knowing what their citizens actually want?
Stoker: One point is, I think a lot of the confusion there comes from the fact that people don't have any idea what they are doing. These are really complicated topics, and I don't know what information is getting lost in my computer when I use it—and I'm in the industry. In some sense I should know this. An obvious thought is, education could help here, finding some way to actually make the choices we are making more transparent to us so we have a better idea as to what it is we are doing. I think that might help bring things closer together.
Elliott: Either of you?
Sinclair: Go ahead.
Acquisti: Economists tend to think about this as a spectrum: on one side there are revealed preferences, what we do in the market, and on the other there are self-reported preferences. In my view, neither is truly indicative of underlying preferences, because preferences don't exist in a vacuum, fixed in some platonic world of preferences. Preferences are constructed in the act of deciding, in the environment in which we decide. They are affected by the architecture of choice, by the interface you are facing.
So I agree with James that the reason we have this huge dichotomy between the two is that there are problems with asymmetric information: not knowing what you're really doing, not knowing what the possible solutions or consequences are. Even if you know, you may sometimes feel helpless in terms of actually taking a stance or looking for protection. You may not know that there are solutions, and if you do know, there could be behavioral biases such as immediate gratification bias, or hyperbolic discounting, which push you to postpone protective action into the future because you want the immediate benefit of sharing now.
So there are many different factors that may explain this dichotomy. As for whether it will eventually be resolved, I'm personally not optimistic. I feel that it is something we'll just have to learn to live with.
Elliott: Tara, any comments? OK. We are exactly at our time. Thank you all, and thank you to the panelists [applause].