Climate Reading Group Meeting Notes 10/4

Here are the main issues that arose in the meeting and a few points that were made regarding each.

1) Assumption of stationarity
- Using samples over time, we assume that the EOFs do not vary in time. Is this a reasonable assumption?
- Serge gave an example of analyzing 20th and 21st century output from a GCM with time-dependent forcing. Separate analyses of each century gave similar EOFs.

2) Degeneracy
- What do the authors mean when they say that degeneracy can be a good thing? What types of real-world signals have spatially propagating structure?
- We weren't sure about this, but we thought that, given a subspace (say 2-D) spanned by the eigenvectors, one could perhaps best represent the signals by choosing the EOFs such that the time coefficients (principal components) are maximally out of phase.
- How do we recognize degeneracy in the eigenvalues of the empirical covariance matrix? How close do they need to be, and how fast do the eigenvalues typically decay in the types of examples we're considering, say sea surface temperatures?
- Assuming exchangeability of the rows (but see item 4), Iain suggested that one could bootstrap.
- Jonty says degeneracy has not been evident in data he has considered: there are a few large, distinct eigenvalues.

3) Rotation & interpretability
- Lenny sent some notes pointing out that orthogonality can be a downside in terms of interpretability, as EOFs focus on variance, not information.
- One of the stated advantages of performing a rotation is that the rotated EOFs are less sensitive to the distribution of the observing locations. (Was the reason for this resolved? I don't remember.)
- Jonty suggested that it is better to perform a rotation concurrently with PCA, not after choosing some number of factors, as the rotation also gives some dimension reduction.
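As a rough illustration of the bootstrap idea raised under item 2 (this is a sketch, not something worked out at the meeting), one could resample the rows of the data matrix with replacement and check whether the percentile intervals of adjacent eigenvalues overlap. It assumes the rows are exchangeable, which item 4 calls into question. The function names, the 95% level, and the "overlapping intervals" rule are all illustrative assumptions, and sorting the eigenvalues within each replicate is known to bias them apart, so the output is only a heuristic.

```python
import numpy as np


def bootstrap_eigenvalues(X, n_boot=500, seed=0):
    """Bootstrap the eigenvalues of the empirical covariance of X.

    X is an (n, p) data matrix whose rows are ASSUMED exchangeable.
    Returns the sample eigenvalues (length p) and an (n_boot, p) array of
    bootstrap eigenvalues, both sorted from largest to smallest.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def eigvals(A):
        # eigvalsh returns ascending order, so reverse to get largest first.
        return np.linalg.eigvalsh(np.cov(A, rowvar=False))[::-1]

    boot = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boot[b] = eigvals(X[idx])
    return eigvals(X), boot


def overlapping_pairs(boot, level=0.95):
    """Return index pairs (k, k+1) whose percentile intervals overlap.

    Overlap is used here as a hypothetical indicator that eigenvalues k and
    k+1 may be effectively degenerate; it is not a formal test.
    """
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(boot, alpha, axis=0)
    hi = np.quantile(boot, 1.0 - alpha, axis=0)
    # Eigenvalue k+1 is the smaller one, so compare its upper end
    # with the lower end of eigenvalue k.
    return [(k, k + 1) for k in range(boot.shape[1] - 1) if hi[k + 1] >= lo[k]]
```

With an (n, p) anomaly matrix X, calling `overlapping_pairs(bootstrap_eigenvalues(X)[1])` lists the adjacent eigenvalue pairs that this heuristic cannot separate.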
4) Non-exchangeable rows
- The usual statistical tests and asymptotic distributions assume that the rows of the data matrix are exchangeable. Jonty pointed out that this almost never holds in these types of problems. For example, the rows might represent various ensemble members, or they might be realizations of some time-dependent stochastic process.
- Iain suggested that if the correlation had product structure, then one could decorrelate the rows first and then perform PCA on the transformed matrix. But it's not clear what to do with more complicated dependence structures.
- Jonty will email the group some more thoughts, and several of us (Debashis, Cari, anyone else?) will look for anything in the literature addressing this problem.

Next time we'll continue talking about strategies when the rows are dependent. We'll also look at the Levine & Berliner and Allen & Tett articles on fingerprinting.
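To make the product-structure suggestion under item 4 concrete, here is a minimal sketch under strong assumptions: the covariance of the data matrix is taken to be separable (a row correlation matrix R times a column covariance), R is treated as known, and the data are anomalies with the mean already removed. The AR(1) form of R and all function names are hypothetical choices for illustration only; in practice R would have to be modeled or estimated.

```python
import numpy as np


def ar1_correlation(n, rho):
    """Hypothetical row correlation: AR(1) with lag-one correlation rho."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def whiten_rows(X, R):
    """Decorrelate the rows of X given the row correlation matrix R.

    With R = L L^T (Cholesky), L^{-1} X has uncorrelated rows when the
    covariance has the assumed product structure.
    """
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, X)


def pca(X):
    """PCA via SVD of an (n, p) anomaly matrix (no centering is done here).

    Returns the eigenvalues of X^T X / n (largest first) and the EOFs as
    the rows of the second return value.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    return s**2 / X.shape[0], Vt


# Synthetic example: rows correlated in time, then whitened before PCA.
rng = np.random.default_rng(0)
n, p = 50, 4
R = ar1_correlation(n, 0.8)
X = np.linalg.cholesky(R) @ rng.normal(size=(n, p))
eigenvalues, eofs = pca(whiten_rows(X, R))
```

The sketch says nothing about the harder case noted above, where the dependence between rows does not factor this way.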