https://en.wikipedia.org/wiki/Philip_E._Tetlock
You can go to the original (textise.net) URL
https://www.textise.net/showText.aspx?strURL=https://medium.com/conversations-with-tyler/philip-tetlock-tyler-cowen-forecasting-sociology-30401464b6d9
or read my copy, cut and pasted below
Philip E. Tetlock on Forecasting and Foraging as a Fox (Ep. 93)
Mercatus Center
Apr 22, 2020 · 37 min read
•••• ••• ••••
IARPA's original forecasting tournaments dealt with geopolitical events, like how long the Syrian Civil War would last, or what Russia would do in Eastern Ukraine, or things of that sort
•••• ••• ••••
TETLOCK: I don’t think so. I don’t think I have the patience or the temperament for doing it. I did give it a try in the second year of the first set of forecasting tournaments back in 2012, and I monitored the aggregates. We had an aggregation algorithm that was performing very well at the time, and it was outperforming 99.8 percent of the forecasters from whom the composite was derived.
If I simply had predicted what the composite said at each point in time in that tournament, I would have been a super superforecaster. I would have been better than 99.8 percent of the superforecasters. So, even though I knew that it was unlikely that I could outperform the composite, I did research some questions where I thought the composite was excessively aggressive, and I tried to second guess it.
The net result of my efforts — instead of finishing in the top 0.2 percent or whatever, I think I finished in the middle of the superforecaster pack. That doesn’t mean I’m a superforecaster. It just means that when I tried to make a forecast better than the composite, I degraded the accuracy significantly.
•••• ••• ••••
TETLOCK: I think you’d want an interdisciplinary team. Diversity is one of these words that’s been reduced to a cliché, but we have found in our work that cognitive diversity helps, and it helps in certain quite well-defined ways.
If you want to create a composite that out-predicts the vast majority of the superforecasters, a good way to do it is not only to take the most recent forecast of the best forecasters in the domain, but it’s also to extremize that forecast to the degree that people who normally disagree, agree with each other. And when you have convergence among diverse observers, that’s a signal that the weighted average composite is probably too conservative and you should extremize.
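The extremizing step Tetlock describes can be sketched in a few lines. This is a minimal illustration, not the actual tournament aggregation algorithm; the particular transform and the `alpha` parameter are my assumptions for the sake of the example (you would raise `alpha` when forecasters who normally disagree converge).

```python
def extremize(probs, weights=None, alpha=2.0):
    """Aggregate probability forecasts, then push the result away from 0.5.

    alpha > 1 extremizes the weighted mean; alpha = 1 returns it unchanged.
    """
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    p = sum(w * x for w, x in zip(weights, probs)) / total
    # A standard extremizing transform: p^a / (p^a + (1-p)^a)
    return p ** alpha / (p ** alpha + (1 - p) ** alpha)
```

For instance, if several forecasters who normally disagree all land near 0.7, the extremized composite moves well above the raw weighted mean, reflecting Tetlock's point that convergence among diverse observers signals the average is probably too conservative.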
COWEN: Does the diverse team have a CEO, someone in charge?
TETLOCK: Not in this case, no. That’s done purely statistically.
•••• ••• ••••
TETLOCK: That’s a matter of managerial skill, I would say. Going back to one of my old dissertation advisors at Yale, 40-plus years ago, Irv Janis on Groupthink, I would pick a leader who knows how to shut up, and not reveal opinions at the beginning of the meeting, and knows how to listen.
•••• ••• ••••
TETLOCK: I think a lot of good executives have the intuition that you get more out of a team of forecasters or problem solvers if you elicit independent judgments initially that are uncontaminated by conformity pressure, and then you create an environment in which ideas can be freely critiqued before lifting the veil of anonymity and letting people see who’s taking which positions.
•••• ••• ••••
TETLOCK: IARPA just ran a forecasting tournament called Hybrid Forecasting Competition in which they pitted algorithmic approaches and human approaches and hybrid approaches against each other. I should let IARPA speak for itself about how well its programs work or don’t work.
I don’t think there’s a lot of evidence to support the claim that machine intelligence is well equipped to take on the sorts of problems that the intelligence community wanted to have answered when it runs the forecasting tournaments it’s been running with our research team.
The things like the Syrian Civil War, Russia/Ukraine, settlement on Mars — these are events for which base rates are elusive. It’s not like you’re screening credit card applicants for Visa, where machine intelligence is going to dominate human intelligence totally. Machine intelligence dominates humans in Go and Chess. It may not dominate humans in poker.
I don’t know that the state of the art is quite there yet, but it’s getting there in StarCraft or whatever the next thing is that Demis Hassabis is going to conquer. No, I don’t see evidence that those approaches work in the domains that we study with the intelligence community.
•••• ••• ••••
COWEN: What if I say — again, current events aside — but it seems easier to predict. There hasn’t been a world war since 1945. There’s been steady economic growth in most parts of the world. More peace. Isn’t that easier to predict? You just predict 2 to 4 percent global economic growth and you pick up a fair amount of what’s happened since 1950.
TETLOCK: Indeed. So, simple extrapolation algorithms, historically, are hard to beat. We’re running a COVID-19 mini forecasting tournament right now, and they’re proving to be hard to beat. The skill, of course, is when to alter the trend, whether to accelerate it or decelerate it, or to change direction.
•••• ••• ••••
TETLOCK: I think humans have been repeatedly humbled in competitions against simple statistical algorithms, going back to Paul Meehl’s famous little book on clinical versus actuarial approaches to predicting in medicine and psychiatry. So, I would say, “Be humble.”
COWEN: There’s some of your early research that, if I read it properly, suggests that making people accountable leads to more evasion and self-deception on their part. Are you worried that your work with pundits, by trying to make them more accountable, will lead to more evasion and self-deception from them? How do you square early Tetlock and mid-period-to-late Tetlock?
TETLOCK: [laughs] It’s actually not too difficult in that particular case. It really depends on the type of accountability. Tournaments create a very stark monistic type of accountability in which one thing and only one thing matters, and that is accuracy. You get no points for being an ideological cheerleader and pumping up the probabilities of things that your team wants to be true or downplaying the probabilities of the things your team doesn’t want to be true. You take a reputational hit. So the incentives are very unusually tightly aligned to favor accuracy.
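The "one thing and only one thing matters" incentive is operationalized with a proper scoring rule; the Good Judgment tournaments used the Brier score. A minimal sketch of that rule:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes.

    0 is perfect; 0.25 is what a constant 0.5 "hedge" earns; 1 is worst.
    Pumping up probabilities of things you want to be true raises your score.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
```

Because the rule is proper, the only way to minimize your expected score is to report what you actually believe, which is exactly the alignment toward accuracy Tetlock describes.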
That’s extremely unusual in the social world. Most forms of accountability occur in organizational settings in which there are lots of distortions at work. And the rational political response for a decision maker located in most accountability matrices in organizations is to engage in some mixture of strategic attitude shifting toward the views of important others or, as you put it, evasion, procrastination, and so forth.
•••• ••• ••••
TETLOCK: I don’t know if we can make them run much further away from objective standards than they already are. [laughs] But that’s a very interesting point about forecasting tournaments. I look at the kinds of people who are attracted to participate in them. At the very outset, I invited lots of big shots to participate in forecasting tournaments, and they’ve repeatedly turned me down.
I had a very interesting correspondence with William Safire in the 1980s about forecasting tournaments. We could talk a little about it later. The upshot of this is that young people who are upwardly mobile see forecasting tournaments as an opportunity to rise. Old people like me and aging baby-boomer types who occupy relatively high status inside organizations see forecasting tournaments as a way to lose.
If I’m a senior analyst inside an intelligence agency, and say I’m on the National Intelligence Council, and I’m an expert on China and the go-to guy for the president on China, and some upstart R&D operation called IARPA says, “Hey, we’re going to run these forecasting tournaments in which we assess how well the analytic community can put probabilities on what Xi Jinping is going to do next.”
And I’ll be on a level playing field, competing against 25-year-olds, and I’m a 65-year-old, how am I likely to react to this proposal, to this new method of doing business? It doesn’t take a lot of empathy or bureaucratic imagination to suppose I’m going to try to nix this thing.
•••• ••• ••••
TETLOCK: Well, I think that particular measure of integrative complexity does have some correlation with forecasting accuracy.
But I think you’re picking up something more than just forecasting accuracy. You’re picking up what I call value pluralism. You’re picking up a tendency to endorse values that are often in conflict with each other. So the more frequently that you, as a political thinker, confront cognitive dissonance between your values, your value orientations, the more pressured you are to engage in integratively complex synthetic thinking.
When your values are more lopsided, it’s easier to engage in what we call simpler modes of cognitive dissonance reduction, like denial or bolstering and spreading of the alternatives. So downplay one value, push up the other value, and make your life stress-free.
But some ideological positions at some points in history require more tolerance for dissonance and some people are more inclined to fill those roles.
•••• ••• ••••
TETLOCK: We’re not talking about huge effect sizes here for integrative complexity. I think that fluid intelligence is probably a more powerful predictor. A combination of fluid intelligence and integrative complexity, I think, does boost up forecasting accuracy, yes.
•••• ••• ••••
TETLOCK: Well, that is supposed to be the division of labor here. I think one reason they may be interested in forecasting tournaments is because forecasting tournaments incentivize people to do one thing and only one thing, and that’s accuracy. You don’t get points for skewing your judgments toward your favorite cause. You take a hit in the long term by doing that.
•••• ••• ••••
TETLOCK: Whoa, boy! [laughs] Well, there’s a long history to efforts to reform the CIA. It was founded in 1947, and there have been various efforts since then. People have been unhappy with the CIA for many reasons over time, Vietnam being a big one. But there are lots of other reasons that people have expressed unhappiness. Both liberals and conservatives have, at various points, been unhappy with the performance of the intelligence community.
In 2001, it came to a crisis point, I think, and then there was a commission to reform intelligence analysis. After the WMD fiasco in Iraq, the pressure grew even more. One reason why we’re even talking right now today is that the intelligence community was forced, essentially by the recommendations of the reform commission, to take keeping score more seriously, to take training for accuracy and monitoring of accuracy more seriously.
That’s when they created the Intelligence Advanced Research Projects Activity, which is the R&D branch housed within the Office of the Director of National Intelligence. Its job is to support innovative research that will improve the quality of intelligence analysis. That can be defined in various ways, but accuracy is certainly one very important component of that. But it’s not accuracy with a liberal skew or a conservative skew. It’s supposed to be just plain, “just the facts, ma’am” accuracy.
•••• ••• ••••
TETLOCK: That’s a very difficult question because of the tail risk aspect to it. If you look at the number of people who’ve died from terrorism versus other causes, it would seem that the amount of money we spend on suppressing terrorism would be disproportionate. But the tail risk complicates that a lot.
•••• ••• ••••
COWEN: On historical counterfactuals, by what year do you think the ascent of the West was more or less inevitable?
TETLOCK: Well, I have inside information here. We did a survey of some very prominent historians. We reported it in that book, Unmaking the West. We put part of it in an article in the American Political Science Review as well. If you look at the unweighted average of judgments, the median, I think, was probably around 1730, 1740.
•••• ••• ••••
TETLOCK: That’s a good example actually, of why I’m not a superforecaster. I don’t play Civilization V. But Civilization V was one of the simulations that IARPA chose to feature in its counterfactual forecasting tournaments under the rubric of FOCUS. If any of your listeners are interested in signing up to be forecasters for FOCUS, we still have one more round, round 5, and we will be recruiting people.
But again, you see, it reflects my temperament. I don’t have the patience for a game like Civ V. Forecasting is inevitably a mixture of fluid and crystallized intelligence, and you have to invest a lot of energy into mastering a game like Civilization V. And I suppose, as people get older, they may become less likely to make those kinds of cognitive investments. It becomes more and more essential as I get older, I think, to focus on the things where I have a real comparative advantage.
•••• ••• ••••
COWEN: The best chess players are all young, right? This we know. There’s clear data.
TETLOCK: Well, yeah. It’s interesting the domains in which child prodigies emerge: music and chess and math.
•••• ••• ••••
But you can’t rerun history. So that makes counterfactuals a place where ideologues can retreat. They can make up the data, make up whatever facts they want to justify pretty much whatever. No matter how bad the war in Iraq went, you can always argue that things would have been worse if Saddam Hussein had remained in power. So you have these factual and counterfactual reference points that people use in debates implicitly to make rhetorical points.
Part of what attracted me to counterfactuals was (A) how important they are in drawing any lessons from history, (B) how important they are in policy arguments, and (C) how unresolvable they are.
One of the things we’re hoping to do in the FOCUS program is to develop some objective metrics for identifying people and methods of generating probabilities that produce superior counterfactual forecasts in simulated worlds in which you can rerun history and assess what the probability distributions of possible wars are. Well, it turns out, you get World War I 37 percent of the time even if you undo the assassination of the archduke. And you get something like World War II . . . You see where we’re going.
So we’re hoping that one result of FOCUS will be to help us identify people and methods that generate superior counterfactual forecasts in domains where there is a ground truth. The next task will be to connect superior performance in simulated worlds to superior performance in the actual world.
And this is where things get tricky, of course, because in the actual world, we don’t have the ground truth. So if I ask you a counterfactual question of the form: if NATO hadn’t expanded eastward as far as it did in 2004 into the Baltics, NATO-Russia relations would be considerably friendlier than they are now.
•••• ••• ••••
TETLOCK: But you can measure what people’s beliefs are in the counterfactual. And then you can measure people’s beliefs about conditional forecasts that are logically connected to the counterfactuals in a kind of a Bayesian inference network. If I know that you think that the Russians would be every bit as nasty and snarly, even if we hadn’t moved into the Baltics — they might have even been nastier — it’s probably a fair bet that you’re also likely to think it’s a good idea to increase arms sales to Ukraine, ratchet up sanctions on Putin’s cronies, and so forth.
So we can identify the counterfactual belief correlates of more or less accurate conditional forecasting. And in that sense, you can indirectly validate or invalidate, you can render more or less plausible, certain counterfactual beliefs. That’s the longer-term objective of this research program. We’re not playing Civilization V for the sake of getting better at Civilization V. The ultimate goal is to link the sophistication of counterfactual reasoning about the past to the subtlety and accuracy of conditional forecasts going into the future.
Another thing you should observe, by the way — if people are becoming better counterfactual reasoners, you should observe less ideological polarization in their counterfactual beliefs. Counterfactual beliefs should become as ideologically depolarized as conditional forecasts are.
•••• ••• ••••
TETLOCK: Right, the question is, are they doing better because they have better social networks? Or are they doing better because they have better judgment?
•••• ••• ••••
TETLOCK: It ties into accountability. Accountability to conflicting audiences and value pluralism. You have a richer internal dialogue. You learn to balance conflicting perspectives more. You have to be a better perspective taker. Perspective taking’s a very important part of superforecasting, too.
•••• ••• ••••
Amy Gutmann’s PIK — Penn Integrates Knowledge — kind of program: you don’t see too many of them yet. It runs against the grain. I’m not even sure it’ll survive at Penn beyond Amy. The natural tendency will be for departments to want to claw back the resources.
•••• ••• ••••
TETLOCK: Kind of obvious things, like read a little bit more outside your field. If you’re a liberal, read the Wall Street Journal. If you’re a conservative, read the New York Times. Expose yourself to dissonant points of view. Try to cultivate some interests outside your field, and try to connect them together. I think there’s an optimal distance.
For history, for example — sounds quite different from what I did. I started as an experimental psychologist, and history looks very different, but they can be connected because historical judgment is something that psychologists study to some degree. Psychologists are interested in hindsight and counterfactuals and so forth. So you can link the two.
I think there’s an optimal distance, and when you go foraging as a fox, you probably don’t want to forage way, way far away. You want to forage far enough away that it’ll be stimulating but still possible to reconnect.
•••• ••• ••••
TETLOCK: Gosh, that’s a hard one. Danny Kahneman, I think, coined the term when he was dealing with his various critics over time. My wife, Barb, actually was involved in an adversarial collaboration between the Kahneman camp and the Gigerenzer camp on the conjunction fallacy. It took at least two or three years of her life.
It’s very hard to get people to . . . Well, lots of us, in principle, are Popperians. We believe in stating our beliefs as falsifiable hypotheses. Most of us believe our beliefs are probabilistic, so we’re somewhat Bayesian. But that’s lip service. There’s what we believe. There’s an epistemological formal self-concept, which is kind of noble, falsificationist, and probabilistic. And then there’s how we actually behave when our egos are at stake in particular controversies.
And those things are quite different. I think Danny Kahneman who’s not known as an optimist, proposed it. It sounds like an optimistic idea, but I think he’s not all that optimistic about what it can achieve on close inspection. My efforts at adversarial collaboration have not been all that successful. I’d like to jumpstart a few of them again, but it’s very hard to find the right dance partners.
•••• ••• ••••
COWEN: Let’s try a question from the realm of the everyday and the mundane. If I go around, and I look at Mexican restaurants, I’m very good at predicting which ones have excellent tacos. What are you good at predicting?
TETLOCK: What am I good at predicting? I think I was pretty good at anticipating the fragility of a lot of microsocial science knowledge prior to the replication crisis erupting.
•••• ••• ••••
TETLOCK: Well, I think the next project is the one that I mentioned earlier. It’s linking historical counterfactual reasoning with conditional forecasting. I think it’ll be the second phase of the FOCUS research tournaments. I think that counterfactual reasoning has, for too long, been the last refuge of ideological scoundrels. Insofar as we can improve the standards of evidence of proof in judging counterfactual claims as well as conditional forecasts and linking the two, I think there’s a potential for improving the quality of debates among interested parties.
•••• ••• ••••
TETLOCK: Oh, to be very cautious because we know we’re running against the grain. We’re running against a psychological grain. We’re running against human nature. We’re running against a sociological grain.
COWEN: But see, enlightenment is stable, you tell us, right? If it keeps on cumulating and growing, your influence should be enormous given your other presuppositions.
TETLOCK: Well, there’s a lot of cognitive resistance to treating one’s beliefs as falsifiable, probabilistic propositions. People naturally gravitate toward thinking of their beliefs as ego-defining, quasi-sacred possessions. That’s one major obstacle.
Then you have the existing status hierarchies. You have subject matter experts who are entrenched, have influence. Why would they want to participate in exercises in which the best possible outcome is a tie? They reaffirm that they deserve the status that they already have.
So psychological, sociological resistance. It’s interesting, sociologists and economists have different reactions to forecasting tournaments. Sociologists — they react, “Why would anyone be naive enough to think that anyone would want to have a forecasting tournament in their organization?” They’re status disruptive, right?
•••• ••• ••••
source:
https://www.textise.net/showText.aspx?strURL=https://medium.com/conversations-with-tyler/philip-tetlock-tyler-cowen-forecasting-sociology-30401464b6d9
<---------------------------------------------------------------------------->
[personal note]
How I got here:
youtube.com
The YouTube auto-recommend algorithm (videos are no longer labeled as "recommended") simply made the video show up in my queue (feed).
([ obviously it was something I watched, or a sequence of things I watched, on YouTube that caused the recommendation algorithm to put the video in my viewing queue; unless there was human intervention, which is unlikely; the algorithm is made (programmed and coded) by humans ])
So I watched Eric Schmidt being interviewed by Reid Hoffman, and one of the things mentioned was the book How Google Works, by Eric Schmidt.
I got How Google Works from the local library; fortunately, the book has been out long enough that the library had a copy and no one was using it.
I read it and noticed the name Philip Tetlock; somehow the name felt familiar.
I did a Bing search for Philip Tetlock.
Philip E. Tetlock on forecasting and foraging as a fox (Ep. 93)
Apr 22, 2020
https://medium.com/conversations-with-tyler/philip-tetlock-tyler-cowen-forecasting-sociology-30401464b6d9
<---------------------------------------------------------------------------->
[skip]
In a 1985 essay, Tetlock proposed that accountability is a key concept for linking the individual levels of analysis to the social-system levels of analysis.[12] Accountability binds people to collectivities by specifying who must answer to whom, for what, and under what ground rules.[12][13] In his earlier work in this area, he showed that some forms of accountability can make humans more thoughtful and constructively self-critical (reducing the likelihood of biases or errors), whereas other forms of accountability can make us more rigid and defensive (mobilizing mental effort to defend previous positions and to criticize critics).[14] In a 2009 essay, Tetlock argues that much is still unknown about how psychologically deep the effects of accountability run—for instance, whether it is or is not possible to check automatic or implicit association-based biases,[15] a topic with legal implications for companies in employment discrimination class actions.[16]
In addition to his work on the bias-attenuating versus bias-amplifying effects of accountability, Tetlock has explored the political dimensions of accountability. When, for instance, do liberals and conservatives diverge in the preferences for "process accountability" that holds people responsible for respecting rules versus "outcome accountability" that holds people accountable for bottom-line results?[17][18]
Taboo cognition and sacred values
Rather, humans prefer to believe that they have sacred values that provide firm foundations for their moral-political opinions. People can become very punitive "intuitive prosecutors" when they feel sacred values have been seriously violated, going well beyond the range of socially acceptable forms of punishment when given chances to do so covertly.[28]
Experimental political philosophy
Tetlock argues it is virtually impossible to disentangle the factual assumptions that people are making about human beings from the value judgments people are making about end-state goals, such as equality and efficiency.[43][44][45][46][47] Hypothetical society studies make it possible for social scientists to disentangle these otherwise hopelessly confounded influences on public policy preferences.
source:
https://en.wikipedia.org/wiki/Philip_E._Tetlock
<---------------------------------------------------------------------------->