An Investigation of Methods for Visualising Highly Multivariate Datasets

Lookup NU author(s): Dr Christopher Brunsdon, Professor Alexander Fotheringham, M Charlton

Downloads

Full text for this publication is not currently held within this repository. Alternative links are provided below where available.


Abstract

Although visualisation has become a 'hot topic' in the social sciences, the majority of visualisation studies and techniques apply only to one- or two-dimensional datasets. Relatively little headway has been made in visualising higher-dimensional data although, paradoxically, most social science datasets are highly multivariate. Investigating multivariate data in just one or two dimensions, whether visually or not, can be highly misleading. Two well-known examples are the use of a correlation coefficient instead of a regression parameter as an indicator of the relationship between two variables, and the use of scatterplots instead of leverage plots as indicators of relationships. This project has therefore investigated several methods for visualising aspects of higher-dimensional (i.e. multivariate) datasets. Although some techniques are quite well established for this purpose, such as Andrews plots and Chernoff faces, we have ignored these because of their well-known problems. In the case of Andrews plots, the functions used are subjective and the plots become very difficult to read when the number of observations rises beyond 30. In the case of Chernoff faces, variables attached to certain attributes of the face, for example the eyes, receive more weight in the subjective determination of 'unusual' cases. Instead we have examined four newer techniques for visualising aspects of higher-dimensional datasets: projection pursuit, Geographically Weighted Regression, RADVIZ and parallel co-ordinates. In projection pursuit the objective is to project an m-dimensional set of points onto a two-dimensional plane (or a three-dimensional volume) by constrained optimisation. The choice of function to be optimised depends on which aspect of the data is the focus of investigation. The technique therefore offers a great deal of flexibility, from identifying clusters of similar cases to identifying outliers in multivariate space.
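The projection pursuit idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It swaps the constrained optimisation for a plain random search over orthonormal planes. The index it maximises (summed absolute excess kurtosis of the two projected axes) is an assumption chosen only for illustration. The function names `random_plane`, `project`, `kurtosis_index` and `projection_pursuit` are hypothetical, and the input rows are assumed to be already centred and scaled.

```python
import math
import random


def _unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def random_plane(m, rng):
    """Draw two orthonormal m-vectors at random (one Gram-Schmidt step)."""
    a = _unit([rng.gauss(0, 1) for _ in range(m)])
    b = [rng.gauss(0, 1) for _ in range(m)]
    dot = sum(x * y for x, y in zip(a, b))
    b = _unit([y - dot * x for x, y in zip(a, b)])
    return a, b


def project(rows, plane):
    """Project each m-dimensional row onto the plane spanned by (a, b)."""
    a, b = plane
    return [(sum(x * w for x, w in zip(r, a)),
             sum(x * w for x, w in zip(r, b))) for r in rows]


def kurtosis_index(points):
    """Illustrative index (an assumption): summed |excess kurtosis| per axis."""
    total = 0.0
    for axis in (0, 1):
        vals = [p[axis] for p in points]
        n = len(vals)
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        if var == 0:
            continue  # a constant axis contributes nothing
        m4 = sum((v - mean) ** 4 for v in vals) / n
        total += abs(m4 / var ** 2 - 3.0)
    return total


def projection_pursuit(rows, index=kurtosis_index, trials=2000, seed=0):
    """Random search: keep the plane with the highest index value."""
    rng = random.Random(seed)
    m = len(rows[0])
    best_plane, best_score = None, -math.inf
    for _ in range(trials):
        plane = random_plane(m, rng)
        score = index(project(rows, plane))
        if score > best_score:
            best_plane, best_score = plane, score
    return best_plane, best_score
```

Passing a different `index` function changes which aspect of the data the search favours, which is the flexibility the abstract refers to. The two returned vectors are linear combinations of the original variables, so the axes of the resulting plot have no direct label.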
A problem with projection pursuit, though, is that it is difficult to interpret: the projection plots show indices formed from linear combinations of variables, which might not have any obvious meaning. The technique of Geographically Weighted Regression usefully allows the visualisation of spatial non-stationarity in regression parameter estimates. The output from the technique consists of maps of the spatial drift in parameter estimates, which can be used to investigate spatial variations in relationships or for model development, because the maps can indicate the effects of missing variables. Relatively little mention is made of Geographically Weighted Regression here because the authors have developed this technique and have written about it in a number of other sources. The RADVIZ approach essentially involves calculating the resultant vector, for each case, of a series of m forces which are the m variables measured for that case. A plot of the locations of these resultants depicts the similarity in the overall measurements across the cases. It is particularly useful for compositional data, such as percentage shares of votes in elections. One drawback of the technique is that quite different basic data properties can produce similar-looking projections, so the interpretation of RADVIZ needs some caution. Finally, the parallel co-ordinates approach is perhaps the most intuitive of the four techniques we examined, in that it is essentially a multidimensional variation on the scatterplot. Instead of two axes, though, parallel co-ordinates depict relationships between m axes drawn as parallel lines. However, the ordering of the axes influences the depiction of relationships within the dataset, so care must be taken in selecting a particular ordering. The depiction of the data in parallel co-ordinates can also get rather messy when large numbers of cases are involved.
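The RADVIZ calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the m variables are placed as anchor points evenly spaced around a unit circle, each value is assumed to be non-negative (for example a vote share), and the plotted location is the weighted mean of the anchors. The function name `radviz_point` is hypothetical.

```python
import math


def radviz_point(values):
    """Return the 2-D RADVIZ location of one case with m non-negative values.

    Variable j is anchored at angle 2*pi*j/m on the unit circle and pulls the
    point towards its anchor with a force proportional to values[j].
    """
    m = len(values)
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no force at all: the point stays at the centre
    x = sum(v * math.cos(2 * math.pi * j / m) for j, v in enumerate(values)) / total
    y = sum(v * math.sin(2 * math.pi * j / m) for j, v in enumerate(values)) / total
    return (x, y)


# Hypothetical vote shares for three parties in three constituencies.
cases = {
    "one party dominant": [1.0, 0.0, 0.0],
    "even three-way split": [1 / 3, 1 / 3, 1 / 3],
    "uneven split": [0.6, 0.3, 0.1],
}
for name, shares in cases.items():
    print(name, radviz_point(shares))
```

The sketch also shows why caution is needed when reading the plot: any case whose values are all equal lands at the centre whatever their size, so `[1, 1, 1]` and `[10, 10, 10]` are indistinguishable.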


Publication metadata

Author(s): Brunsdon CF, Fotheringham AS, Charlton ME

Editor(s): Unwin D; Fisher P

Series Editor(s): Joint Information Systems Committee / ESRC

Publication type: Book Chapter

Publication status: Published

Book Title: Case Studies of Visualization in the Social Sciences

Year: 1998

Volume: 43

Pages: 55-80

Series Title: Technical Report Series

URL: http://www.agocg.ac.uk/reports/visual/casestud/brunsdon/contents.htm

