
Rapid Multivariate Analysis of 3D ToF-SIMS Data: Graphical Processor Units (GPUs) and Low Discrepancy Subsampling for Large-Scale Principal Component Analysis

Lookup NU author(s): Professor Peter Cumpson, Professor Ian Fletcher, Dr Naoko Sano, Dr Anders Barlow



This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Principal component analysis (PCA) and other multivariate analysis methods have been used increasingly to analyse and understand depth profiles in XPS, AES and SIMS. For large images or 3D imaging depth profiles, PCA has been difficult to apply until now simply because of the size of the data matrices involved. In a recent paper we described two algorithms, RV1 and RV2, which respectively improve the speed of PCA and allow datasets of unlimited size to be processed. In this paper we apply the RV2 algorithm to perform PCA on full 3D ToF-SIMS data for the first time without subsampling. The dataset we process in this way is a 128 × 128 pixel depth profile of 120 layers, each voxel having a 70,439-value mass spectrum associated with it. This forms over a terabyte of data when uncompressed, and took 27 hours to process with the RV2 algorithm on a conventional Windows desktop PC. While full PCA (e.g. using RV2) is to be preferred for final reports or publications, a much more rapid method is needed during analysis sessions to inform decisions on the next analytical step. We have therefore implemented the RV1 algorithm on a PC having a Graphical Processor Unit (GPU) card containing 2,880 individual processor cores. This increases the speed of calculation by a factor of around 4.1 compared to what is possible using a fast commercially-available desktop PC having CPUs alone, and full PCA is performed in less than 7 seconds. The size of the dataset that can be processed in this way is limited by the size of the memory on the GPU card. This is typically sufficient for 2D images but not for 3D depth profiles without sampling. We have therefore examined efficient sampling schemes that allow a good approximate solution to the PCA problem for large 3D datasets. We find that low-discrepancy series (LDS) sampling, such as Sobol series sampling, gives more rapid convergence than random sampling, and we recommend such methods for routine use.
Using the GPU and LDS together we anticipate that any ToF-SIMS dataset, of whatever size, can be efficiently and accurately processed into PCA components in a maximum of around 10 seconds using a commercial PC with a widely-available GPU card, though the longer RV2 approach is still to be preferred for the presentation of final results, such as in published papers.
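The subsampling idea described above can be illustrated with a minimal sketch. It is not the RV1 or RV2 algorithm and does not use a GPU. It assumes a small synthetic data cube in place of real ToF-SIMS data, uses SciPy's `scipy.stats.qmc.Sobol` generator to choose voxels, and computes PCA by a plain eigendecomposition of the covariance matrix. The function names `sobol_voxel_indices` and `leading_components`, the cube dimensions and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import qmc


def sobol_voxel_indices(shape, m, seed=0):
    """Return unique voxel indices drawn from 2**m scrambled Sobol points."""
    sampler = qmc.Sobol(d=len(shape), scramble=True, seed=seed)
    points = sampler.random_base2(m)                 # (2**m, d), values in [0, 1)
    idx = np.floor(points * np.array(shape)).astype(int)
    idx = np.minimum(idx, np.array(shape) - 1)       # guard against rounding up
    return np.unique(idx, axis=0)                    # drop repeated voxels


def leading_components(spectra, k):
    """PCA via eigendecomposition of the channel covariance matrix."""
    centred = spectra - spectra.mean(axis=0)
    cov = centred.T @ centred / (len(spectra) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]            # keep the k largest
    return eigvals[order], eigvecs[:, order]


# Synthetic stand-in for a depth profile: two "materials" whose mixing
# fraction changes with depth, plus a little noise (all values assumed).
rng = np.random.default_rng(0)
shape, n_channels = (16, 16, 8), 40
s1, s2 = rng.random(n_channels), rng.random(n_channels)
depth = np.linspace(0.0, 1.0, shape[2])
frac = np.broadcast_to(depth, shape)[..., None]      # (16, 16, 8, 1)
noise = 0.01 * rng.standard_normal(shape + (n_channels,))
cube = frac * s1 + (1.0 - frac) * s2 + noise

# PCA on every voxel versus PCA on a Sobol subsample of voxels.
full = cube.reshape(-1, n_channels)
idx = sobol_voxel_indices(shape, m=8)                # at most 256 voxels
sub = cube[idx[:, 0], idx[:, 1], idx[:, 2]]

_, pcs_full = leading_components(full, 1)
_, pcs_sub = leading_components(sub, 1)

# Absolute cosine between the two first loadings (the sign is arbitrary).
agreement = abs(pcs_full[:, 0] @ pcs_sub[:, 0])
print(f"voxels used: {len(sub)} of {len(full)}, PC1 agreement: {agreement:.4f}")
```

In this sketch the subsample covers at most one eighth of the voxels. Replacing the Sobol points with pseudo-random indices and varying `m` would be one way to compare convergence of the two sampling schemes on a given dataset.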

Publication metadata

Author(s): Cumpson PJ, Fletcher IW, Sano N, Barlow AJ

Publication type: Article

Publication status: Published

Journal: Surface and Interface Analysis

Year: 2016

Volume: 48

Issue: 12

Pages: 1328-1336

Print publication date: 01/12/2016

Online publication date: 05/05/2016

Acceptance date: 19/04/2016

Date deposited: 19/04/2016

ISSN (print): 0142-2421

ISSN (electronic): 1096-9918

Publisher: John Wiley & Sons Ltd


DOI: 10.1002/sia.6042

