Monday, July 18, 2016

The HAC Emperor has no Clothes: Part 2

The time-series kernel-HAC literature seems to have forgotten about pre-whitening. But most of the action is in the pre-whitening, as stressed in my earlier post. In time-series contexts, parametric allowance for good-old ARMA-GARCH disturbances (with AIC order selection, say) is likely to be all that's needed, cleaning out whatever conditional-mean and conditional-variance dynamics are operative, after which there's little/no need for anything else. (And although I say "parametric" ARMA/GARCH, it's actually fully non-parametric from a sieve perspective.)

Instead, people focus on kernel-HAC sans pre-whitening, and obsess over truncation lag selection. Truncation lag selection is indeed very important when pre-whitening is forgotten, as too short a lag can lead to seriously distorted inference, as emphasized in the brilliant early work of Kiefer-Vogelsang and in important recent work by Lewis, Lazarus, Stock and Watson. But all of that becomes much less important when pre-whitening is successfully implemented.
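To make the point concrete, here is a minimal sketch comparing a short-lag Bartlett-kernel long-run variance estimate with an AR(1) pre-whitened one on simulated AR(1) disturbances. It is an illustration under simplifying assumptions, not anyone's recommended procedure: the pre-whitening filter is a plain AR(1) fit by OLS (rather than ARMA-GARCH with AIC selection), and the function names `bartlett_lrv` and `ar1_prewhitened_lrv` are made up for the example.

```python
import numpy as np

def bartlett_lrv(u, lag):
    """Bartlett-kernel long-run variance of u with a given truncation lag."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    n = len(u)
    lrv = u @ u / n
    for j in range(1, lag + 1):
        weight = 1.0 - j / (lag + 1.0)
        lrv += 2.0 * weight * (u[j:] @ u[:-j]) / n
    return lrv

def ar1_prewhitened_lrv(u):
    """Fit an AR(1) by OLS, then recolor: sigma^2 / (1 - rho)^2."""
    u = np.asarray(u, dtype=float) - np.mean(u)
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    e = u[1:] - rho * u[:-1]          # whitened residuals
    return (e @ e / len(e)) / (1.0 - rho) ** 2, rho

# Simulated AR(1) disturbances (assumed design: rho = 0.8, unit innovations)
rng = np.random.default_rng(0)
n, rho_true = 2000, 0.8
eps = rng.standard_normal(n)
u = np.empty(n)
u[0] = eps[0]
for t in range(1, n):
    u[t] = rho_true * u[t - 1] + eps[t]

true_lrv = 1.0 / (1.0 - rho_true) ** 2      # population long-run variance
short_kernel = bartlett_lrv(u, lag=2)       # kernel estimate, lag too short
prewhitened, rho_hat = ar1_prewhitened_lrv(u)
print(true_lrv, short_kernel, prewhitened)
```

In this design the short-lag kernel estimate should fall well below the population value, while the pre-whitened estimate needs no truncation-lag choice at all.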

[Of course spectra need not be rational, so ARMA is just an approximation to a more general Wold representation (and remember, GARCH(1,1) is just an ARMA(1,1) in squares). But is that really a problem? In econometrics don't we feel comfortable with ARMA approximations 99.9 percent of the time? The only econometrically-interesting process I can think of that doesn't admit a finite-ordered ARMA representation is long memory (fractional integration). But that too can be handled parametrically by introducing just one more parameter, moving from ARMA(p,q) to ARFIMA(p,d,q).]
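For the record, the GARCH-as-ARMA claim takes two lines. The notation here is assumed for illustration: \( \varepsilon_t \) is the disturbance, \( \sigma_t^2 \) its conditional variance, and \( \nu_t = \varepsilon_t^2 - \sigma_t^2 \) is the (martingale-difference) innovation to the squares. Start from GARCH(1,1),

\[ \sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2 , \]

and substitute \( \sigma_t^2 = \varepsilon_t^2 - \nu_t \) on both sides to get

\[ \varepsilon_t^2 = \omega + (\alpha + \beta) \varepsilon_{t-1}^2 + \nu_t - \beta \nu_{t-1} , \]

which is an ARMA(1,1) in \( \varepsilon_t^2 \) with AR coefficient \( \alpha + \beta \) and MA coefficient \( -\beta \).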

My earlier post linked to the key early work of Den Haan and Levin, which remains unpublished. I am confident that their basic message remains intact. Indeed recent work revisits and amplifies it in important ways; see Kapetanios (2016) and new work in progress by Richard Baillie to be presented at the September 2016 NBER/NSF time-series meeting at Columbia ("Is Robust Inference with OLS Sensible in Time Series Regressions?").

Sunday, July 10, 2016

Contemporaneous, Independent, and Complementary

You've probably been in a situation where you and someone else discovered something "contemporaneously and independently". Despite the initial sinking feeling, I've come to realize that there's usually nothing to worry about. 

First, normal-time science has a certain internal momentum -- it simply must evolve in certain ways -- so people often identify and pluck the low-hanging fruit more-or-less simultaneously. 

Second, and crucially, such incidents are usually not just the same discovery made twice. Rather, although intimately-related, the two contributions usually differ in subtle but important ways, rendering them complements, not substitutes.

Here's a good recent example in financial econometrics, working out asymptotics for high-frequency high-dimensional factor models. On the one hand, consider Pelger, and on the other hand consider Ait-Sahalia and Xiu.  There's plenty of room in the world for both, and the whole is even greater than the sum of the (individually-impressive) parts.

Sunday, July 3, 2016

DAG Software

Some time ago I mentioned the DAG (directed acyclic graph) primer by Judea Pearl et al.  As noted in Pearl's recent blog post, a manual will be available with software solutions based on a DAGitty R package.  See

More generally -- that is, quite apart from the Pearl et al. primer -- check out DAGitty at  Click on "launch" and play around for a few minutes. Very cool. 

Sunday, June 26, 2016

Regularization for Long Memory

Two earlier regularization posts focused on panel data and generic time series contexts. Now consider a specific time-series context: long memory. For exposition consider the simplest case of a pure long memory DGP,  \( (1-L)^d y_t = \varepsilon_t \) with  \( |d| < 1/2  \).  This \( ARFIMA(0,d,0) \) process is \( AR(\infty) \) with very slowly decaying coefficients due to the long memory. If you KNEW the world was \(ARFIMA(0,d,0)\) you'd just fit \(d\) using GPH or Whittle or whatever, but you're not sure, so you'd like to stay flexible and fit a very long \(AR\) (an \(AR(100) \), say). But such a profligate parameterization is infeasible or at least very wasteful. A solution is to fit the \(AR(100) \) but regularize by estimating with ridge or a LASSO variant, say.
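Here is a rough sketch of the idea, assuming NumPy only. It computes the \(AR(\infty)\) coefficients implied by \( (1-L)^d \) via the standard binomial recursion, simulates an \(ARFIMA(0,d,0)\) series from a truncated MA representation, and fits an \(AR(100)\) by closed-form ridge. The function names, the truncation length, and the penalty value are all illustrative assumptions, and ridge stands in for whichever regularizer you prefer.

```python
import numpy as np

def frac_diff_ar_coefs(d, p):
    """AR coefficients phi_1..phi_p implied by (1-L)^d y_t = eps_t."""
    pi = np.empty(p + 1)
    pi[0] = 1.0
    for k in range(1, p + 1):
        pi[k] = pi[k - 1] * (k - 1 - d) / k   # coefficients of (1-L)^d
    return -pi[1:]                            # y_t = sum_k phi_k y_{t-k} + eps_t

def frac_int_ma_coefs(d, m):
    """MA coefficients psi_0..psi_m of (1-L)^(-d), truncated at lag m."""
    psi = np.empty(m + 1)
    psi[0] = 1.0
    for k in range(1, m + 1):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def ridge_ar(y, p, lam):
    """Ridge estimate of an AR(p); lam = 0 gives plain OLS."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    X = np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y[p:])

d, n, m, p = 0.4, 3000, 1000, 100             # assumed design values
rng = np.random.default_rng(0)
eps = rng.standard_normal(n + m)
y = np.convolve(eps, frac_int_ma_coefs(d, m), mode="full")[m:m + n]

phi_true = frac_diff_ar_coefs(d, p)
phi_ols = ridge_ar(y, p, lam=0.0)
phi_ridge = ridge_ar(y, p, lam=50.0)          # penalty chosen arbitrarily
print(phi_true[:3], phi_ridge[:3])
```

The slow decay is visible in `phi_true`: the coefficients start at \(d\) and then die off hyperbolically rather than geometrically, which is why a long lag length is tempting in the first place.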

Related, recall the Corsi "HAR" approximation to long memory. It's just a long autoregression subject to coefficient restrictions. So you could do a LASSO estimation, as in Audrino and Knaus (2013). (Related analysis and references are in a Humboldt University 2015 master's thesis.)
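To spell out the restrictions, take the usual daily/weekly/monthly HAR for a series \( y_t \) (realized volatility, say), with the 1-, 5- and 22-day horizons assumed here as the standard choice:

\[ y_t = \beta_0 + \beta_d \, y_{t-1} + \beta_w \, \frac{1}{5} \sum_{j=1}^{5} y_{t-j} + \beta_m \, \frac{1}{22} \sum_{j=1}^{22} y_{t-j} + \varepsilon_t . \]

Collecting terms lag by lag gives an \(AR(22)\) with step-function coefficients,

\[ \phi_1 = \beta_d + \frac{\beta_w}{5} + \frac{\beta_m}{22}, \qquad \phi_j = \frac{\beta_w}{5} + \frac{\beta_m}{22} \ \ (j = 2, \dots, 5), \qquad \phi_j = \frac{\beta_m}{22} \ \ (j = 6, \dots, 22), \]

so 22 autoregressive coefficients are governed by just three free parameters.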

Finally, note that in all of the above it might be desirable to change the LASSO centering point for shrinkage/selection to match the long-memory restriction. (In standard LASSO it's just 0.)
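One way to write that down (a sketch, not the only possibility) is to penalize deviations from a target vector \( \phi^0 \) rather than from zero:

\[ \hat{\phi} = \arg\min_{\phi} \ \sum_t \Big( y_t - \sum_{k=1}^{p} \phi_k \, y_{t-k} \Big)^2 + \lambda \sum_{k=1}^{p} \left| \phi_k - \phi_k^0 \right| . \]

Here \( \phi_k^0 \) could be, for example, the \(k\)-th autoregressive coefficient implied by \( (1-L)^{d_0} \) for some pilot value \( d_0 \); that particular choice is an assumption made for illustration. Setting \( \phi^0 = 0 \) recovers standard LASSO.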

Wednesday, June 22, 2016

Observed Info vs. Estimated Expected Info

All told, after decades of research, it seems that Efron-Hinkley holds up -- observed information dominates estimated expected information for MLE standard errors. It's both easier to calculate and more accurate. Let me know if you disagree.
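To see the two calculations side by side, here is a small sketch for the Cauchy location model with known unit scale, the kind of example where the two can differ noticeably in small samples. The sample size, seed, and the grid-plus-Newton search are assumptions made for illustration. In this model the expected information is \( n/2 \) regardless of the location parameter, so the "estimated expected" version is just a constant.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20
x = rng.standard_cauchy(n)            # true location 0, scale 1

def loglik(theta):
    return -np.sum(np.log1p((x - theta) ** 2))

def score(theta):
    u = x - theta
    return np.sum(2.0 * u / (1.0 + u ** 2))

def observed_info(theta):
    """Minus the second derivative of the log-likelihood at theta."""
    u = x - theta
    return np.sum(2.0 * (1.0 - u ** 2) / (1.0 + u ** 2) ** 2)

# Crude global search near the median, then Newton steps to polish the MLE
grid = np.median(x) + np.linspace(-2.0, 2.0, 4001)
theta_hat = grid[np.argmax([loglik(t) for t in grid])]
for _ in range(20):
    theta_hat += score(theta_hat) / observed_info(theta_hat)

obs_info = observed_info(theta_hat)
expected_info = n / 2.0               # Fisher information, free of theta here

se_observed = 1.0 / np.sqrt(obs_info)
se_expected = 1.0 / np.sqrt(expected_info)
print(theta_hat, se_observed, se_expected)
```

The expected-information standard error is the same for every sample of size \(n\), whereas the observed-information one responds to how sharply peaked this particular likelihood is.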

[Efron, B. and Hinkley, D.V. (1978), "Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information", Biometrika, 65, 457–487.]

Tuesday, June 21, 2016

Mixed-Frequency High-Dimensional Time Series

Notice that high dimensions and mixed frequencies go together in time series. (If you're looking at a huge number of series, it's highly unlikely that all will be measured at the same frequency, unless you arbitrarily exclude all frequencies but one.) So high-dim MIDAS vector autoregression (VAR) will play a big role moving forward. The MIDAS literature is starting to go multivariate, with MIDAS VARs appearing; see Ghysels (2015, in press) and Mikosch and Neuwirth (2016 w.p.).

But the multivariate MIDAS literature is still low-dim rather than high-dim. Next steps will be: 

(1) move to high-dim VAR estimation by using regularization methods (e.g. LASSO variants), 

(2) allow for many observational frequencies (five or six, say), 

(3) allow for the "rough edges" that will invariably arise at the beginning and end of the sample, and 

(4) visualize results using network graphics.

Conditional Dependence and Partial Correlation

In the multivariate normal case, conditional independence is the same as zero partial correlation.  (See below.) That makes a lot of things a lot simpler.  In particular, determining ordering in a DAG is just a matter of assessing partial correlations. Of course in many applications normality may not hold, but still...

Aust. N.Z. J. Stat. 46(4), 2004, 657–664
Kunihiro Baba, Ritei Shibata and Masaaki Sibuya
Keio University and Takachiho University
This paper investigates the roles of partial correlation and conditional correlation as measures of the conditional independence of two random variables. It first establishes a sufficient condition for the coincidence of the partial correlation with the conditional correlation. The condition is satisfied not only for multivariate normal but also for elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial and Dirichlet distributions. Such families of distributions are characterized by a semigroup property as a parametric family of distributions. A necessary and sufficient condition for the coincidence of the partial covariance with the conditional covariance is also derived. However, a known family of multivariate distributions which satisfies this condition cannot be found, except for the multivariate normal. The paper also shows that conditional independence has no close ties with zero partial correlation except in the case of the multivariate normal distribution; it has rather close ties to the zero conditional correlation. It shows that the equivalence between zero conditional covariance and conditional independence for normal variables is retained by any monotone transformation of each variable. The results suggest that care must be taken when using such correlations as measures of conditional independence unless the joint distribution is known to be normal. Otherwise a new concept of conditional independence may need to be introduced in place of conditional independence through zero conditional correlation or other statistics.
Keywords: elliptical distribution; exchangeability; graphical modelling; monotone transformation.
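A quick numerical illustration of the normal case, assuming NumPy only. Partial correlations can be read off the inverse covariance (precision) matrix \( \Omega \) as \( -\Omega_{ij} / \sqrt{\Omega_{ii} \Omega_{jj}} \). The three-variable Gaussian chain below (X drives Y, Y drives Z) and the helper name `partial_corr` are made up for the example.

```python
import numpy as np

def partial_corr(cov):
    """Partial correlations of each pair given all other variables."""
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pc = -prec / np.outer(scale, scale)
    np.fill_diagonal(pc, 1.0)
    return pc

# Population covariance of a Gaussian chain with unit-variance shocks:
#   X = e0,  Y = 0.8 X + e1,  Z = 0.8 Y + e2
b = 0.8
var_x = 1.0
var_y = b ** 2 * var_x + 1.0
var_z = b ** 2 * var_y + 1.0
cov = np.array([
    [var_x,          b * var_x,  b ** 2 * var_x],
    [b * var_x,      var_y,      b * var_y],
    [b ** 2 * var_x, b * var_y,  var_z],
])

marginal_xz = cov[0, 2] / np.sqrt(cov[0, 0] * cov[2, 2])
pc = partial_corr(cov)
print(marginal_xz, pc[0, 2])
```

X and Z are clearly correlated marginally, but their partial correlation given Y is zero (up to rounding), matching the fact that Z depends on X only through Y.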

Saturday, June 18, 2016

A Little Bit More on Dave Backus

In the days since his passing, lots of wonderful things have been said about Dave Backus. (See, for example, the obituary by Tom Cooley, posted on David Levine's page.) They're all true. But none sufficiently stress what was for me his essence: complete selflessness. We've all had a few good colleagues, even great colleagues, but Dave took it to an entirely different level.

The "Teaching" section of his web page begins, "I have an open-source attitude toward teaching materials". Dave had an open-source attitude toward everything. He lived for team building, cross-fertilization, mentoring, and on and on. A lesser person would have traded the selflessness for a longer c.v., but not Dave. And we're all better off for it.

SoFiE 2016 Hong Kong (and 2017 New York)

Hats off to all those who helped make the Hong Kong SoFiE meeting such a success. Special thanks (in alphabetical order) to Charlotte Chen, Yin-Wong Cheung, Jianqing Fan, Eric Ghysels, Ravi Jagannathan, Yingying Li, Daniel Preve, and Giorgio Valente. The conference web site is here

Mark your calendars now for what promises to be a very special tenth-anniversary meeting next year in New York, hosted by Rob Engle at NYU's Stern School. The dates are June 20-23, 2017.

Tuesday, June 14, 2016

Indicator Saturation Estimation

In an earlier post, "Fixed Effects Without Panel Data",  I argued that you could allow for (and indeed estimate) fixed effects in pure cross sections (i.e., no need for panel data) by using regularization estimators like LASSO. The idea is to fit a profligately-parameterized model but then to recover d.f. by regularization.

Note that you can use the same idea in time-series contexts.  Even in a pure time series, you can allow for period-by-period time effects, broken polynomial trend with an arbitrary number of breakpoints, etc., via regularization.  It turns out that a fascinating small literature on so-called "indicator saturation estimation" pursues this idea.  The "indicators" are things like period-by-period time dummies, break-date location dummies, etc., and "saturation" refers to the profligate parameterization.  Prominent contributors include David Hendry and Søren Johansen; see this new paper and those that it cites.  (Very cool application, by the way, to detecting historical volcanic eruptions.)
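Here is a toy version of the regularization idea, assuming NumPy only: a constant mean plus one impulse dummy for every period, with an L1 penalty on the dummy coefficients. Because each dummy touches a single observation, the LASSO update for the dummies is just soft-thresholding of the demeaned data, so simple alternating updates suffice. This is a LASSO sketch in the spirit of the post, not the selection algorithm used in the indicator-saturation literature, and the function names, penalty, and simulated design are all assumptions.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def impulse_saturation_lasso(y, lam, n_iter=200):
    """Minimize 0.5 * sum((y_t - mu - delta_t)^2) + lam * sum(|delta_t|)."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)                 # robust starting value for the mean
    delta = np.zeros_like(y)
    for _ in range(n_iter):
        delta = soft_threshold(y - mu, lam)   # one dummy per period
        mu = np.mean(y - delta)               # unpenalized mean
    return mu, delta

# Simulated series: mean 1, unit-variance noise, three large one-off shocks
rng = np.random.default_rng(1)
n = 200
shock_dates = [40, 120, 170]
y = 1.0 + rng.standard_normal(n)
y[shock_dates] += 8.0

mu_hat, delta_hat = impulse_saturation_lasso(y, lam=3.0)
flagged = set(np.flatnonzero(delta_hat).tolist())
print(mu_hat, sorted(flagged))
```

There are as many dummies as observations, so the unpenalized model is hopelessly saturated; the penalty is what recovers the degrees of freedom, leaving nonzero dummies only at dates that look like genuine shocks.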