Sociedad de Estadística e Investigación Operativa, Test (2002) Vol. 11, No. 1, pp. 143-165
A biplot method for multivariate normal populations with unequal covariance matrices

Miquel Calvo*, Angel Villarroya and Josep M. Oller
Department of Statistics, University of Barcelona, Spain.
Abstract

Some previous ideas about non-linear biplots to achieve a joint representation of multivariate normal populations and any parametric function without assumptions about the covariance matrices are extended. Usual restrictions on the covariance matrices (such as homogeneity) are avoided. Variables are represented as curves corresponding to the directions of maximum means variation. To demonstrate the versatility of the method, the representation of variances and covariances, as an example of further possible interesting parametric functions, has been developed. This method is illustrated with two different data sets, and these results are compared with those obtained using two other distances for the normal multivariate case: the Mahalanobis distance (assuming a common covariance matrix for all populations) and Rao's distance, assuming a common eigenvector structure for all the covariance matrices.
Key Words: Multivariate normal distribution, nonlinear biplots, Siegel distance, Rao distance.
AMS subject classification: 62H99, 52-07, 62-09.

1 Introduction
The biplot method is a widely used plotting technique in applied multivariate data analysis. The biplot enables k multivariate samples to be plotted together with the set of coordinate axes corresponding to the original variables, projecting the two classes of objects into a low-dimensional Euclidean space. This double representation, usually done in $\mathbb{R}^2$, has led to improved data interpretation in applied studies and, sometimes, may be complemented with other analyses based on hierarchical methods, as in Capdevila and Arcas (1995).

This work is supported by DGICYT grant (Spain), BFM2000-0801 and also 1999SGR00059.
* Correspondence to: Miquel Calvo Llorca, Departament d'Estadística, Universitat de Barcelona, Avgda. Diagonal 645, 08028 Barcelona, Spain. E-mail: calvo@bio.ub.es
Received: February 2000; Accepted: December 2001
Gower and Harding (1988) generalized the classic biplot method by including embeddable metrics in a Euclidean space, and proposed the extension of this idea to any kind of metric. This last technique is known as the non-linear biplot. In the same paper, Gower and Harding also proposed extending the biplot to cover more structured sample data; see also Gower (1993) and Cuadras et al. (1997) for further details and related topics.

Canonical discriminant analysis (CDA) is a classic representation method introduced by Rao (1948). It enables the samples from p different populations, each of them associated with a multivariate normal model, to be plotted in a low-dimensional space. The underlying metric is the one induced by the Mahalanobis distance. This implies an important additional assumption: a common covariance matrix for the p populations is required. In most applied situations this hypothesis of homogeneity of covariance matrices is not satisfied, e.g., Fisher's Iris data. The need for a more general distance for the multivariate normal model has been raised in several papers. More recently, Krzanowski (1996) proposed the Rao distance for multivariate normal densities, but his technique requires a common structure of the eigenvectors in the covariance matrices; see also the comprehensive paper of Burbea (1986). Unfortunately, it is not possible to extend this result to the full family, i.e., without any condition on the covariance structure, because the explicit form of the Rao distance has not yet been obtained for all cases; see, for instance, Calvo and Oller (1991).

In this paper we extend some results previously obtained by Calvo et al. (1998). First, we looked for a graphical representation of multivariate normal populations without the assumption of covariance matrix homogeneity.
The n-variate normal populations $N_n(\mu, \Sigma)$ are identified with symmetric $(n+1) \times (n+1)$ positive definite matrices, and the Mahalanobis distance is then replaced by the Siegel distance, see Calvo and Oller (1990). We have not used the Rao distance between multivariate normal distributions since, until now, it has not been obtained explicitly, as we pointed out above. Furthermore, three important properties of the Siegel distance are the reasons that we have preferred it over other more usual general distances, such as the Hellinger or Bhattacharyya distances. These three properties are: a) the Siegel distance is not upper bounded, as the other two are; b) it is invariant under affine transformations over the random variables and c) it is a quite sharp lower bound of the Rao distance. See Calvo and Oller (1990) and Subsection 2.2 for more details. Once the interdistance population matrix is computed, the samples are represented in a low-dimensional
space, following standard Principal Coordinates Analysis (PCA). The newest aspect of our proposed method for the non-linear biplot is how the representation of the variables is obtained. The Siegel distance does not permit plotting the set of axes in a simple way, as is done in the standard biplot method (e.g. Gower and Harding (1988)). In Section 2 we suggest the use of the gradient of the random variables' mean in the Siegel space, where the populations are embedded. By integrating the gradient, a bundle of curves is obtained. Each curve, associated with one of the original variables, provides information on the direction of the maximum variation of the corresponding mean value. Once the bundle is computed, the variable representation is obtained by using the same projection of the populations, based on PCA. Therefore, we can obtain a set of coordinate axes, analogous to the non-linear biplot axes, choosing a nominated point as the origin of the n variable curves. This representation, based on the first moment $\mu$, can be extended to any smooth function of $\mu$ and $\Sigma$, and in particular to higher order moments. We illustrate here its potential usefulness by representing the variances and the covariances in this way. The technical details are discussed in the following sections.
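The projection step mentioned above can be illustrated with a minimal sketch of classical principal coordinates analysis applied to an arbitrary distance matrix. The function name `principal_coordinates` and the toy data (four points on a unit square) are assumptions made for illustration only; they do not come from the paper. For a non-Euclidean distance, such as the Siegel distance, some eigenvalues of the doubly centred matrix may be negative, so the sketch clips them at zero.

```python
import numpy as np

def principal_coordinates(D, k=2):
    """Classical principal coordinates (metric scaling) of a distance matrix D."""
    D = np.asarray(D, dtype=float)
    p = D.shape[0]
    J = np.eye(p) - np.ones((p, p)) / p      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # doubly centred inner products
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]          # indices of the k largest
    w_k = np.clip(w[order], 0.0, None)       # guard against negative eigenvalues
    return V[:, order] * np.sqrt(w_k)        # p x k coordinate matrix

# Hypothetical toy data: the four corners of a unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

Y = principal_coordinates(D, k=2)
# Distances between the recovered coordinates, for comparison with D.
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

In this Euclidean toy case two coordinates reproduce the distances; with a matrix of Siegel distances between populations the two-dimensional plot would generally be an approximation.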
2 Representation of populations and variables

2.1 The embedding in the Siegel group
Let us assume that the populations $\Omega_1, \ldots, \Omega_p$ have associated the multivariate normal model $N_n(\mu, \Sigma)$. For each population, its density function is univocally determined by the proper parametric representation. From now on, we represent $\Omega_i$ by $(\mu_i, \Sigma_i)$. Let us consider the set of the symmetric positive-definite matrices, $P_{n+1}$, and the differential metric defined as:

$$ds^2 = \frac{1}{2}\,\mathrm{tr}\left\{(\Psi^{-1}\, d\Psi)^2\right\}. \tag{2.1}$$
The structure of $P_{n+1}$ becomes a Riemannian manifold usually known as the Siegel group, see Siegel (1964). The multivariate normal embedding in the Siegel group is proved in Calvo and Oller (1990), but we prefer to
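summarise here some results. Any $\Psi \in P_{n+1}$ can be expressed as:

$$\Psi = \begin{pmatrix} \Sigma + \beta\mu\mu^t & \beta\mu \\ \beta\mu^t & \beta \end{pmatrix}, \qquad \beta \in \mathbb{R}^+,\ \mu \in \mathbb{R}^n,\ \Sigma \in P_n,$$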
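and the differential metric in (2.1) can also be expressed as:

$$ds^2 = \frac{1}{2}\left(\frac{d\beta}{\beta}\right)^2 + \beta\, d\mu^t\, \Sigma^{-1}\, d\mu + \frac{1}{2}\,\mathrm{tr}\left\{(\Sigma^{-1}\, d\Sigma)^2\right\}. \tag{2.2}$$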
The basic idea in Calvo and Oller (1990) is to associate each multivariate normal density with a symmetric positive-definite matrix by means of the following map:

$$(\mu_i, \Sigma_i) \mapsto f(\mu_i, \Sigma_i) = \begin{pmatrix} \Sigma_i + \mu_i\mu_i^t & \mu_i \\ \mu_i^t & 1 \end{pmatrix}. \tag{2.3}$$
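As an illustration of the map (2.3), the following minimal sketch builds the $(n+1) \times (n+1)$ matrix associated with a pair $(\mu, \Sigma)$. The function name `embed_normal` and the numerical values are hypothetical choices for the example, not part of the paper.

```python
import numpy as np

def embed_normal(mu, sigma):
    """Map (mu, Sigma) to the (n+1) x (n+1) symmetric matrix of (2.3)."""
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)   # column vector, n x 1
    sigma = np.asarray(sigma, dtype=float)            # n x n covariance
    n = mu.shape[0]
    psi = np.empty((n + 1, n + 1))
    psi[:n, :n] = sigma + mu @ mu.T   # upper-left block: Sigma + mu mu^t
    psi[:n, n:] = mu                  # upper-right block: mu
    psi[n:, :n] = mu.T                # lower-left block: mu^t
    psi[n, n] = 1.0                   # lower-right entry
    return psi

# Hypothetical bivariate example.
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
psi = embed_normal(mu, sigma)
```

Since the Schur complement of the lower-right entry is $\Sigma$ itself, the image is positive definite whenever $\Sigma$ is, and its determinant equals that of $\Sigma$.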
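If $\Theta$ is the parametric space of the multivariate normal model, the image set $f(\Theta)$ has an induced metric in $P_{n+1}$ equivalent to the metric induced in $\Theta$ by the Fisher information matrix, i.e., setting $\beta = 1$ in (2.2), the $ds^2$ element has the form:

$$ds^2 = d\mu^t\, \Sigma^{-1}\, d\mu + \frac{1}{2}\,\mathrm{tr}\left\{(\Sigma^{-1}\, d\Sigma)^2\right\}.$$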
As proved in Calvo and Oller (1990), $f(\Theta)$ is a non-geodesic submanifold of $P_{n+1}$, isometric to $\Theta$ with the information metric. An important derived consequence is that the Siegel distance supplies a lower bound of the Rao distance. A further property of interest is related to affine transformations. If the random vector of variables $X$ is transformed by the rule

$$X \to QX + q, \qquad \text{with } Q \in GL_n \text{ and } q \in \mathbb{R}^n,$$

the density corresponding to $\Omega_i$ will now be represented by

$$\tilde\Psi_i = f(Q\mu_i + q,\ Q\Sigma_i Q^t) = A\,\Psi_i\,A^t, \qquad A = \begin{pmatrix} Q & q \\ 0 & 1 \end{pmatrix} \in GL_{n+1},$$

where $\Psi_i = f(\mu_i, \Sigma_i)$. Because of the invariance of the Siegel group under changes from $GL_{n+1}$, it follows immediately that

$$d_S(\tilde\Psi_i, \tilde\Psi_j) = d_S(\Psi_i, \Psi_j). \tag{2.5}$$
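The invariance (2.5) can be checked numerically with a short sketch. It uses the eigenvalue expression of the Siegel distance stated in (2.6), with the factor 1/2 assumed here so that the distance agrees with the metric (2.1). The helper names `embed_normal` and `siegel_distance`, as well as all numerical values, are hypothetical and serve only as an illustration.

```python
import numpy as np

def embed_normal(mu, sigma):
    """Map (mu, Sigma) to the (n+1) x (n+1) matrix of (2.3)."""
    mu = np.asarray(mu, dtype=float).reshape(-1, 1)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.shape[0]
    psi = np.empty((n + 1, n + 1))
    psi[:n, :n] = sigma + mu @ mu.T
    psi[:n, n:] = mu
    psi[n:, :n] = mu.T
    psi[n, n] = 1.0
    return psi

def siegel_distance(psi_i, psi_j):
    """Distance from the eigenvalues of psi_i^{-1} psi_j (factor 1/2 assumed)."""
    L = np.linalg.cholesky(psi_i)             # psi_i = L L^t
    M = np.linalg.solve(L, psi_j)             # L^{-1} psi_j
    M = np.linalg.solve(L, M.T).T             # L^{-1} psi_j L^{-t}, symmetric
    lam = np.linalg.eigvalsh((M + M.T) / 2)   # same spectrum as psi_i^{-1} psi_j
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))

# Two hypothetical bivariate normal populations with unequal covariances.
mu_1, sigma_1 = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 2.0]])
mu_2, sigma_2 = np.array([1.0, -1.0]), np.array([[3.0, -0.5], [-0.5, 1.0]])

# Hypothetical affine change of variables X -> Q X + q, with Q invertible.
Q = np.array([[2.0, 1.0], [0.5, 3.0]])
q = np.array([1.0, -1.0])

psi_1, psi_2 = embed_normal(mu_1, sigma_1), embed_normal(mu_2, sigma_2)
d_before = siegel_distance(psi_1, psi_2)
d_after = siegel_distance(
    embed_normal(Q @ mu_1 + q, Q @ sigma_1 @ Q.T),
    embed_normal(Q @ mu_2 + q, Q @ sigma_2 @ Q.T),
)
```

The two values `d_before` and `d_after` should agree up to rounding error. The Cholesky-based reduction is a design choice that keeps the eigenvalue problem symmetric; it is not taken from the paper.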
In other words, the Siegel distance remains unchanged under affine transformations of the variables. The Siegel distance between the populations $\Omega_i$ and $\Omega_j$ is defined as the Riemannian distance between the two matrices in the Siegel group where the populations are embedded. This distance is given by:
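$$d_S(\Omega_i, \Omega_j) = \frac{1}{\sqrt{2}}\left\| \log\left(\Psi_i^{-1/2}\, \Psi_j\, \Psi_i^{-1/2}\right) \right\| = \left( \frac{1}{2} \sum_{k=1}^{n+1} \log^2 \lambda_k \right)^{1/2}, \tag{2.6}$$

where $\|A\| = \{\mathrm{tr}(AA^t)\}^{1/2}$ stands for the matrix norm, and $\lambda_k$ are the eigenvalues of $\Psi_i^{-1/2}\, \Psi_j\, \Psi_i^{-1/2}$ (or also of $\Psi_i^{-1}\Psi_j$). Let us remark again that, from (2.5), the distance (2.6) is invariant under affine changes, in particular, under scale and/or translation changes of the random variables. In applied situations, the parameters $(\mu_i, \Sigma_i)$ are unknown, so they are replaced by their maximum likelihood estimators $(\bar{x}_i, S_i)$ to represent $\Omega_i$, giving:

$$\hat\Psi_i = \begin{pmatrix} S_i + \bar{x}_i \bar{x}_i^t & \bar{x}_i \\ \bar{x}_i^t & 1 \end{pmatrix}, \qquad i = 1, \ldots, p. \tag{2.7}$$

2.2 Some relationships between Siegel distance and Rao distance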
Although a closed form of the Rao distance between two arbitrary multivariate normal distributions has not been obtained yet, extending some previous results, see Calvo and Oller (1991), it is possible to obtain explicit expressions for this distance in certain cases. This fact allows us to compare the Rao distance to the Siegel distance in these cases. First of all, let us obtain the Rao distance for two points of the form $(\mu_1, \Sigma)$ and $(\mu_2, \alpha\Sigma)$, where $\alpha \in \mathbb{R}^+$. Observe that this case is not included in Krzanowski (1996), since now we are computing the Rao distance in the whole manifold of all multivariate normal distributions, instead of the submanifold obtained considering only covariance matrices with the same eigenvectors. Starting from formula (14) in Calvo and Oller (1991), applied to the present case and with the same notation, if we let $X$ be an $n \times n$ matrix and $\delta$ an $n \times 1$ vector defined by
$$X = \left(\cosh(G\rho/2) - B\,G^{-}\sinh(G\rho/2)\right)\cosh(G\rho/2), \qquad \delta = \Sigma^{-1/2}(\mu_2 - \mu_1),$$
and $C$ an $n \times n$ symmetric matrix given by
c
(,
,)
(~+)i+2~d~
.
It is possible to express
i T'(o + ~)T where T is an r~, • ~u orthogonal matrix and H is a r~, • r~, skew symmetric matrix. Taking into account Theorem 3.1 of tile above referred paper and that, in the present case
BG-sinh(G/2))
T (cosh(G/2) is an orthogonal matrix, it results that
eosh2(ap/2)
~ r ~ (C + H ) ( C + H)" T 4
and, since $\mathrm{tr}(G^2) = 2$, we can express the Rao distance, $\rho$, as
Moreover~ since for any square matrix It