Monday, December 15, 2014

Causal Modeling Update

In an earlier post on predictive modeling and causal inference, I mentioned my summer "reading list" for causal modeling:
Re-read Pearl, and read the Heckman-Pinto critique.
Re-read White et al. on settable systems and testing conditional independence.
Read Angrist-Pischke.
Read Wolpin, and Rust's review.
Read Dawid 2007 and Dawid 2014.
Get and read Imbens-Rubin (available early 2015).
I'm sure I'll blog on some of the above in due course. Meanwhile I want to add something to the list: Pearl's response to Heckman-Pinto. (I just learned of it.)

One of its themes resonates with me: Econometrics needs to examine more thoroughly the statistical causal modeling literature vis-à-vis standard econometric approaches. (The irreplaceable Hal White started building some magnificent bridges, but alas, he was taken from us much too soon.) Reasonable people will have sharply different views as to what's of value there, and the discussion is far from over, but I'm grateful to the likes of Pearl, White, Heckman, and Pinto for starting it.

Monday, December 8, 2014

A Tennis Match Graphic

I know you're not thinking about tennis in December (at least those of you north of the equator). I'm generally not either. But this post is really about graphics, and I may have something that will interest you. And remember, the Australian Open and the 2015 season will soon be here.

Tennis scoring is tricky, and different from that of other sports. A 2008 New York Times piece, "In Tennis, the Numbers Sometimes Don't Add Up," is apt:
If you were told that in a particular match, Player A won more points and more games and had a higher first-serve percentage, fewer unforced errors and a higher winning percentage at the net, you would deduce that Player A was the winner. But leaping to that conclusion would be a mistake. ... In tennis, it is not the events that constitute a match, but the timing of those events. In team sports like baseball, basketball and football, and even in boxing, the competitor who scores first or last may have little bearing on the outcome. In tennis, the player who scores last is always the winner.
Tricky tennis scoring makes for tricky match summarization, whether graphically or otherwise. Not that people haven't tried, with all sorts of devices in use. See, for example, another good 2014 New York Times piece, "How to Keep Score: However You Like," and the fascinating GameSetMap.com, "A blog devoted to maps about tennis," emphasizing spatial aspects but going farther on occasion.

Glenn Rudebusch and I have been working on a graphic for tennis match summarization. We have a great team of Penn undergraduate research assistants, including Bas Bergmans, Joonyup Park, Hong Teoh, and Han Tian. We don't want a graphic that keeps score per se, or a graphic that emphasizes spatial aspects. Rather, we simply want a graphic that summarizes a match's evolution, drama, and outcome. We want it to convey a wealth of information, instantaneously and intuitively, yet also to repay longer study. Hopefully we're getting close.

Here's an example, for the classic Federer-Monfils 2014 U.S. Open match. I'm not going to explain it, because it should be self-explanatory -- if it's not, we're off track. (But of course see the notes below the graph. Sorry if they're hard to read; we had to reduce the graphic to fit the blog layout.)



Does it resonate with you? How might we improve it? This is version 1; we hope to post version 2, already in the works, during the Australian Open in early 2015. Again, interim suggestions are most welcome.
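In the meantime, for anyone who wants something concrete to tinker with, here's a toy sketch (in Python, with invented data, and emphatically not our graphic) of perhaps the simplest possible match-evolution plot: cumulative point differential, so that swings of momentum show up as changes of direction.

```python
# Toy match-evolution plot: cumulative point differential for a simulated match.
# Invented data -- nothing here is from the Federer-Monfils match or our graphic.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# +1 if Player A wins the point, -1 if Player B does (toy simulated point sequence)
points = rng.choice([1, -1], size=250, p=[0.52, 0.48])
momentum = np.cumsum(points)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(momentum, lw=1.5)
ax.axhline(0, color="gray", lw=0.5)
ax.set_xlabel("Point number")
ax.set_ylabel("Cumulative point differential (A minus B)")
ax.set_title("Toy match-evolution plot (simulated points)")
plt.tight_layout()
plt.show()
```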

Monday, December 1, 2014

Quantum Computing and Annealing



My head is spinning. The quantum computing thing is really happening. Or not. Or most likely it's happening in small but significant part and continuing to advance slowly but surely. The Slate piece from last May still seems about right (but read on). 

Optimization by simulated annealing cum quantum computing is amazing. It turns out that the large and important class of problems that map into global optimization by simulated annealing is marvelously well-suited to quantum computing, so much so that the D-Wave machine is explicitly and almost exclusively designed for solving "quantum annealing" problems. We're used to doing simulated annealing on deterministic "classical" computers, where the simulation is done in software, and the randomness is fake (deterministic pseudo-random deviates). In quantum annealing the randomization is in the hardware, and it's real.
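For concreteness, here's a minimal classical simulated-annealing sketch in Python (my toy illustration, not D-Wave's algorithm), with the "randomization" supplied by software pseudo-random deviates and the usual Metropolis accept/reject rule under a falling temperature.

```python
# Minimal classical simulated annealing for a toy 1-D global optimization problem.
# Illustrative only; pseudo-random deviates stand in for the "real" randomness
# that quantum annealing hardware would provide.
import math
import random

def energy(x):
    # Toy multimodal objective with many local minima
    return 0.1 * x**2 + math.sin(3.0 * x)

def simulated_annealing(x0=8.0, n_iter=20000, t0=2.0, cooling=0.9995):
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(n_iter):
        x_new = x + random.gauss(0.0, 0.5)   # random proposal (software, "fake" randomness)
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill moves, sometimes accept uphill moves
        if e_new < e or random.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling                          # gradually lower the temperature
    return best_x, best_e

print(simulated_annealing())  # should land near the global minimum around x = -0.5
```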

From the D-Wave site:
Quantum computing uses an entirely different approach than classical computing. A useful analogy is to think of a landscape with mountains and valleys. Solving optimization problems can be thought of as trying to find the lowest point on this landscape. Every possible solution is mapped to coordinates on the landscape, and the altitude of the landscape is the "energy" or "cost" of the solution at that point. The aim is to find the lowest point on the map and read the coordinates, as this gives the lowest energy, or optimal solution to the problem. Classical computers running classical algorithms can only "walk over this landscape". Quantum computers can tunnel through the landscape making it faster to find the lowest point.
Remember the old days of "math co-processors"? Soon you may have a "quantum co-processor" for those really tough optimization problems! And you thought you were cool if you had a GPU or two.

Except that your quantum co-processor may not work. Or it may not work well. Or at any rate today's version (the D-Wave machine; never mind that it occupies a large room) may not work, or work well. And it's annoyingly hard to tell. In any event, even if it works, the workings are subtle and still poorly understood -- the D-Wave tunneling description above is not only simplistic, but also potentially incorrect.

Here's the latest, an abstract of a lecture to be given at Penn on 4 December 2014 by one of the world's leading quantum computing researchers, Umesh Vazirani of UC Berkeley, titled "How 'Quantum' is the D-Wave Machine?":
A special purpose "quantum computer" manufactured by the Canadian company D-Wave has led to intense excitement in the mainstream media (including a Time magazine cover dubbing it "the infinity machine") and the computer industry, and a lively debate in the academic community. Scientifically it leads to the interesting question of whether it is possible to obtain quantum effects on a large scale with qubits that are not individually well protected from decoherence.

We propose a simple and natural classical model for the D-Wave machine - replacing their superconducting qubits with classical magnets, coupled with nearest neighbor interactions whose strength is taken from D-Wave's specifications. The behavior of this classical model agrees remarkably well with posted experimental data about the input-output behavior of the D-Wave machine.

Further investigation of our classical model shows that despite its simplicity, it exhibits novel algorithmic properties. Its behavior is fundamentally different from that of its close cousin, classical heuristic simulated annealing. In particular, the major motivation behind the D-Wave machine was the hope that it would tunnel through local minima in the energy landscape, minima that simulated annealing got stuck in. The reproduction of D-Wave's behavior by our classical model demonstrates that tunneling on a large scale may be a more subtle phenomenon than was previously understood...
Wow. I'm there.

All this raises the issue of how to test untrusted quantum devices, which brings us to the very latest, Vazirani's second lecture on 5 December, "Science in the Limit of Exponential Complexity." Here's the abstract:
One of the remarkable discoveries of the last quarter century is that quantum many-body systems violate the extended Church-Turing thesis and exhibit exponential complexity -- describing the state of such a system of even a few hundred particles would require a classical memory larger than the size of the Universe. This means that a test of quantum mechanics in the limit of high complexity would require scientists to experimentally study systems that are inherently exponentially more powerful than human thought itself! 
A little reflection shows that the standard scientific method of "predict and verify" is no longer viable in this regime, since a calculation of the theory's prediction is rendered computationally infeasible. Does this mean that it is impossible to do science in this regime? A remarkable connection with the theory of interactive proof systems (the crown jewel of computational complexity theory) suggests a potential way around this impasse: interactive experiments. Rather than carry out a single experiment, the experimentalist performs a sequence of experiments, and rather than predicting the outcome of each experiment, the experimentalist checks a posteriori that the outcomes of the experiments are consistent with each other and the theory to be tested. Whether this approach will formally work is intimately related to the power of a certain kind of interactive proof system; a question that is currently wide open. Two natural variants of this question have been recently answered in the affirmative, and have resulted in a breakthrough in the closely related area of testing untrusted quantum devices. 
Wow. Now my head is really spinning.  I'm there too, for sure.

Monday, November 24, 2014

More on Big Data

An earlier post, "Big Data the Big Hassle," waxed negative. So let me now give credit where credit is due.

What's true in time-series econometrics is that it's very hard to list the third-most-important, or even second-most-important, contribution of Big Data. Which makes all the more remarkable the mind-boggling -- I mean completely off-the-charts -- success of the first-most-important contribution: volatility estimation from high-frequency trading data. Yacine Ait-Sahalia and Jean Jacod give a masterful overview in their new book, High-Frequency Financial Econometrics.

What do financial econometricians learn from high-frequency data? Although largely uninformative for some purposes (e.g., trend estimation), high-frequency data are highly informative for others (volatility estimation), an insight that traces at least to Merton's early work. Roughly put: as we sample returns arbitrarily finely, we can infer underlying volatility arbitrarily well. Accurate volatility estimation and forecasting, in turn, are crucial for financial risk management, asset pricing, and portfolio allocation. And it's all facilitated by the trade-by-trade data captured in modern electronic markets.
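For intuition, here's a toy sketch (mine, with simulated data, and ignoring microstructure noise) of the simplest realized-volatility idea: sum squared intraday returns, and the sum approximates the day's variance ever better as sampling gets finer.

```python
# Toy illustration: realized volatility from finely-sampled intraday returns.
# Simulated data with constant volatility; real high-frequency work must also
# confront microstructure noise, jumps, and the like.
import numpy as np

rng = np.random.default_rng(42)
true_daily_vol = 0.015   # assumed daily return volatility (illustrative number)
n_obs = 390              # e.g., one return per minute over a 6.5-hour trading session

# Simulate intraday returns that aggregate to one day's return
intraday_returns = rng.normal(0.0, true_daily_vol / np.sqrt(n_obs), size=n_obs)

# Realized variance: sum of squared intraday returns
realized_variance = np.sum(intraday_returns ** 2)
realized_vol = np.sqrt(realized_variance)

print(f"true daily vol:     {true_daily_vol:.4f}")
print(f"realized daily vol: {realized_vol:.4f}")  # close, and closer still as n_obs grows
```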

In stressing "high frequency" financial data, I have thus far implicitly stressed only the massive time-series dimension, with its now nearly-continuous record. But of course we're ultimately concerned with covariance matrices, not just scalar variances, for tens of thousands of assets, so the cross-section dimension is huge as well. (A new term: "Big Big Data"? No, please, no.) Indeed the multivariate case now defines significant parts of both the theoretical and applied research frontiers; see Andersen et al. (2013).
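In the same toy spirit, the multivariate analogue sums outer products of intraday return vectors. The sketch below is again mine and purely illustrative; the genuinely hard issues (asynchronous trading, microstructure noise, cross-sections in the tens of thousands) are exactly where the research frontier sits.

```python
# Toy illustration: a realized covariance matrix from synchronous intraday return vectors.
# The "true" covariance matrix below is invented for illustration only.
import numpy as np

rng = np.random.default_rng(7)
n_assets, n_obs = 4, 390

# Assumed daily return covariance matrix (illustrative numbers only)
daily_cov = 1e-4 * np.array([[2.0, 0.8, 0.5, 0.3],
                             [0.8, 1.5, 0.6, 0.2],
                             [0.5, 0.6, 1.0, 0.4],
                             [0.3, 0.2, 0.4, 1.2]])

# Simulate synchronous intraday return vectors that aggregate to one day
intraday = rng.multivariate_normal(np.zeros(n_assets), daily_cov / n_obs, size=n_obs)

# Realized covariance: sum of outer products of intraday return vectors
realized_cov = intraday.T @ intraday

print(np.round(daily_cov, 6))
print(np.round(realized_cov, 6))  # element by element, close to daily_cov
```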

Monday, November 17, 2014

Quantitative Tools for Macro Policy Analysis

Penn's First Annual PIER Workshop on Quantitative Tools for Macroeconomic Policy Analysis will take place in May 2015.  The poster appears below (and here if the one below is a bit too small), and the website is here. We are interested in contacting anyone who might benefit from attending. Research staff at central banks and related organizations are an obvious focal point, but all are welcome. Please help spread the word, and of course, please consider attending. We hope to see you there!


Monday, November 10, 2014

Penn Econometrics Reading Group Materials Online

Locals who come to the Friday research/reading group will obviously be interested in this post, but others may also be interested in following and influencing the group's path.

The schedule has been online here for a while. Starting now, it will contain not only paper titles but also links to papers when available. (Five are there now.) We'll leave the titles and papers up, instead of deleting them as was the earlier custom. We'll also try to post presenters' slides as we move forward.

Don't hesitate to suggest new papers that would be good for the Group.

Sunday, November 2, 2014

A Tribute to Lawrence R. Klein


(Remarks given at the Klein Legacy Dinner, October 24, 2014, Lower Egyptian Gallery, University of Pennsylvania Museum of Archaeology and Anthropology.)

I owe an immense debt of gratitude to Larry Klein, who helped guide, support and inspire my career for more than three decades. Let me offer just a few vignettes.

Circa 1979 I was an undergraduate studying finance and economics at Penn's Wharton School, where I had my first economics job, as a research assistant at Larry's firm, Wharton Econometric Forecasting Associates (WEFA). I didn't know Larry at the time; I got the job via a professor whose course I had taken, who was a friend of a friend of Larry's. I worked for a year or so, perhaps ten or fifteen hours per week, on regional electricity demand modeling and forecasting. Down the hall were the U.S. quarterly and annual modeling groups, where I eventually moved and spent another year. Lots of fascinating people roamed the maze of cubicles, from eccentric genius-at-large Mike McCarthy, to Larry and Sonia Klein themselves, widely revered within WEFA as god and goddess. During fall of 1980 I took Larry's Wharton graduate macro-econometrics course and got to know him. He won the Nobel Prize that semester, on a class day, resulting in a classroom filled with television cameras. What a heady mix!

I stayed at Penn for graduate studies, moving in 1981 from Wharton to Arts and Sciences, home of the Department of Economics and Larry Klein. I have no doubt that my decision to stay at Penn, and to move to the Economics Department, was heavily dependent on Larry's presence there. During the summer following my first year of the Ph.D. program, I worked on a variety of country models for Project LINK, under the supervision of Larry and another leading modeler in the Klein tradition, Peter Pauly. It turned out that the LINK summer job pushed me over the annual salary cap for a graduate student -- $6,000 or so in 1982 dollars, if I remember correctly -- so Larry and Peter paid me the balance in kind, taking me to the Project LINK annual meeting in Wiesbaden, Germany. More excitement, and also my first trip abroad.

Both Larry and Peter helped supervise my 1986 Penn Ph.D. dissertation, on ARCH modeling of asset return volatility. I couldn't imagine a better trio of advisors: Marc Nerlove as main advisor, with committee members Larry and Peter (who introduced me to ARCH). I took a job at the Federal Reserve Board, with the Special Studies Section led by Peter Tinsley, a pioneer in optimal control of macro-econometric models. Circa 1986 Larry had more Ph.D. students at the Board than anyone else, by a wide margin. Surely that helped me land the Special Studies job. Another Klein student, Glenn Rudebusch, also went from Penn to the Board that year, and we wound up co-authoring a dozen articles and two books over nearly thirty years. My work and lasting friendship with Glenn trace in significant part to our melding in the Klein crucible.

I returned to Penn in 1989 as an assistant professor. Although I have no behind-the-scenes knowledge, it's hard to imagine that Larry's input didn't contribute to my invitation to return. Those early years were memorable for many things, including econometric socializing. During the 1990's my wife Susan and I had lots of parties at our home for faculty and students. The Kleins were often part of the group, as were Bob and Anita Summers, Herb and Helene Levine, Bobby and Julie Mariano, Jere Behrman and Barbara Ventresco, Jerry Adams, and many more. I recall a big party on one of Penn's annual Economics Days, which that year celebrated The Keynesian Revolution, Larry's landmark 1947 monograph.

The story continues, but I'll mention just one more thing. I was honored and humbled to deliver the Lawrence R. Klein Lecture at the 2005 Project LINK annual meeting in Mexico City, some 25 years after Larry invited a green 22-year-old to observe the 1982 meeting in Wiesbaden.

I have stressed guidance and support, but in closing let me not forget inspiration, which Larry also provided for three decades, in spades. He was the consummate scholar, focused and steady, and the consummate gentleman, remarkably gracious under pressure.

A key point, of course, is that it's not about what Larry provided me, whether guidance, support or inspiration -- I'm just one member of this large group. Larry generously provided for all of us, and for thousands of others who couldn't be here tonight, enriching all our lives. Thanks Larry. We look forward to working daily to honor and advance your legacy.

---

(For more, see the materials here.)

Tuesday, October 21, 2014

Rant: Academic "Letterhead" Requirements

(All rants, including this one, are here.)

Countless times, from me to Chair/Dean xxx at Some Other University: 

I am happy to help with your evaluation of Professor zzz. This email will serve as my letter. [email here]...
Countless times, from Chair/Dean xxx to me: 
Thanks very much for your thoughtful evaluation. Can you please put it on your university letterhead and re-send?
Fantasy response from me to Chair/Dean xxx:
Sure, no problem at all. My time is completely worthless, so I'm happy to oblige, despite the fact that email conveys precisely the same information and is every bit as legally binding (whatever that even means in this context) as a "signed" "letter" on "letterhead." So now I’ll copy my email, try to find some dusty old Word doc letterhead on my hard drive, paste the email into the Word doc, try to beat it into submission depending on how poor the formatting / font / color / blocking looks when first pasted, print from Word to pdf, attach the pdf to a new email, and re-send it to you. How 1990’s.
Actually last week I did send something approximating the fantasy email to a dean at a leading institution. I suspect that he didn't find it amusing. (I never heard back.) But as I also said at the end of that email,
"Please don’t be annoyed. I...know that these sorts of 'requirements' have nothing to do with you per se. Instead I’m just trying to push us both forward in our joint battle with red tape."

Monday, October 13, 2014

Lawrence R. Klein Legacy Colloquium


In Memoriam


The Department of Economics of the University of Pennsylvania, with kind support from the School of Arts and Sciences, the Wharton School, PIER and IER, is pleased to host a colloquium, "The Legacy of Lawrence R. Klein: Macroeconomic Measurement, Theory, Prediction and Policy," on Penn’s campus, Saturday, October 25, 2014. The full program and related information are here. We look forward to honoring Larry’s legacy throughout the day. Please join us if you can.

Featuring:
  • Olav Bjerkholt, Professor of Economics, University of Oslo
  • Harold L. Cole, Professor of Economics and Editor of International Economic Review, University of Pennsylvania
  • Thomas F. Cooley, Paganelli-Bull Professor of Economics, New York University 
  • Francis X. Diebold, Paul F. Miller, Jr. and E. Warren Shafer Miller Professor of Economics, University of Pennsylvania
  • Jesus Fernandez-Villaverde, Professor of Economics, University of Pennsylvania
  • Dirk Krueger, Professor and Chair of the Department of Economics, University of Pennsylvania
  • Enrique G. Mendoza, Presidential Professor of Economics and Director of Penn Institute for Economic Research, University of Pennsylvania
  • Glenn D. Rudebusch, Executive Vice President and Director of Research, Federal Reserve Bank of San Francisco
  • Frank Schorfheide, Professor of Economics, University of Pennsylvania
  • Christopher A. Sims, John F. Sherrerd ’52 University Professor of Economics, Princeton University
  • Ignazio Visco, Governor of the Bank of Italy

Monday, October 6, 2014

Intuition for Prediction Under Bregman Loss

Elements of the Bregman family of loss functions, denoted \(B(y, \hat{y})\), take the form
$$
B(y, \hat{y}) = \phi(y) - \phi(\hat{y}) - \phi'(\hat{y}) (y-\hat{y}),
$$
where \(\phi: \mathcal{Y} \rightarrow \mathbb{R}\) is any strictly convex function and \(\mathcal{Y}\) is the support of \(Y\).
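As a quick check on the definition, note that taking \(\phi(y) = y^2\) recovers ordinary squared-error loss:
$$
B(y, \hat{y}) = y^2 - \hat{y}^2 - 2 \hat{y} (y - \hat{y}) = (y - \hat{y})^2.
$$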

Several readers have asked for intuition for the equivalence between Bregman loss \(B(y, \hat{y})\) and the predictive optimality of the conditional mean, \( E[y|\mathcal{F}]\). The simplest answers come from the proof itself, which is straightforward.

First consider \(B(y, \hat{y}) \Rightarrow E[y|\mathcal{F}]\).  The derivative of expected Bregman loss with respect to \(\hat{y}\) is
$$
\frac{\partial}{\partial \hat{y}} E[B(y, \hat{y})] = \frac{\partial}{\partial \hat{y}} \int B(y,\hat{y}) \;f(y|\mathcal{F}) \; dy
$$
$$
=  \int \frac{\partial}{\partial \hat{y}} \left ( \phi(y) - \phi(\hat{y}) - \phi'(\hat{y}) (y-\hat{y}) \right ) \; f(y|\mathcal{F}) \; dy
$$
$$
=  \int (-\phi'(\hat{y}) - \phi''(\hat{y}) (y-\hat{y}) + \phi'(\hat{y})) \; f(y|\mathcal{F}) \; dy
$$
$$
= -\phi''(\hat{y}) \left( E[y|\mathcal{F}] - \hat{y} \right).
$$
Hence the first order condition is
$$
-\phi''(\hat{y}) \left(E[y|\mathcal{F}] - \hat{y} \right) = 0,
$$
so, because strict convexity of \(\phi\) implies \(\phi''(\hat{y}) > 0\), the optimal forecast is the conditional mean, \( E[y|\mathcal{F}] \).

Now consider \( E[y|\mathcal{F}] \Rightarrow B(y, \hat{y}) \). It's a simple task of reverse-engineering. We need the f.o.c. to be of the form
$$
const \times \left(E[y|\mathcal{F}] - \hat{y} \right) = 0,
$$
so that the optimal forecast is the conditional mean, \( E[y|\mathcal{F}] \). Inspection reveals that \( B(y, \hat{y}) \) (and only \( B(y, \hat{y}) \)) does the trick.
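For readers who prefer simulation to calculus, here's a small numerical check (my own sketch, not from any of the papers discussed): build a visibly asymmetric Bregman loss from \(\phi(y) = e^{ay}\), draw \(y\) from a skewed distribution, and the candidate forecast that minimizes average loss over a grid is (approximately) the sample mean, not the median.

```python
# Numerical check: under Bregman loss, the optimal point forecast is the conditional mean.
# Toy sketch with phi(z) = exp(a*z), which yields a visibly asymmetric Bregman loss.
import numpy as np

rng = np.random.default_rng(0)
a = 0.5  # curvature parameter for phi (any nonzero value gives a strictly convex phi)

def bregman_loss(y, yhat):
    # B(y, yhat) = phi(y) - phi(yhat) - phi'(yhat) * (y - yhat), with phi(z) = exp(a*z)
    return np.exp(a * y) - np.exp(a * yhat) - a * np.exp(a * yhat) * (y - yhat)

# Skewed "conditional" distribution for y (gamma with mean 2, median about 1.68)
y = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# Average loss over a grid of candidate point forecasts
grid = np.linspace(0.5, 3.5, 301)
avg_loss = np.array([bregman_loss(y, yhat).mean() for yhat in grid])

print("sample mean of y:        ", round(float(y.mean()), 3))
print("loss-minimizing forecast:", round(float(grid[avg_loss.argmin()]), 3))  # near the mean, not the median
```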

One might still want more intuition for the optimality of the conditional mean under Bregman loss, despite its asymmetry. The answer, I conjecture, is that the Bregman family is not asymmetric! At least not for an appropriate definition of asymmetry in the general \(L(y, \hat{y})\) case, which is more complicated and subtle than the \(L(e)\) case. Asymmetric loss plots like those in Patton (2014), on which I reported last week, are for fixed \(y\) (in Patton's case, \(y=2\)), whereas for a complete treatment we need to look across all \(y\). More on that soon.

[I would like to thank -- without implicating -- Minchul Shin for helpful discussions.]