Atmospheric Measurement Techniques An interactive open-access journal of the European Geosciences Union
Discussion papers
https://doi.org/10.5194/amt-2019-404
© Author(s) 2019. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: research article 15 Nov 2019

Review status
This discussion paper is a preprint. It is a manuscript under review for the journal Atmospheric Measurement Techniques (AMT).

Comparison of dimension reduction techniques in the analysis of mass spectrometry data

Sini Isokääntä, Eetu Kari, Angela Buchholz, Liqing Hao, Siegfried Schobesberger, Annele Virtanen, and Santtu Mikkonen
  • Department of Applied Physics, University of Eastern Finland, Kuopio, 70210, Finland

Abstract. Online analysis with mass spectrometers produces complex data sets, consisting of mass spectra with a large number of chemical compounds (ions). Statistical dimension reduction techniques (SDRTs) can condense complex data sets into a more compact form while preserving the information included in the original observations. The general principle of these techniques is to investigate the underlying dependencies of the measured variables by combining variables with similar characteristics into distinct groups, called factors or components. Currently, positive matrix factorization (PMF) is the most commonly exploited SDRT across a range of atmospheric studies, in particular for source apportionment. In this study, we used five different SDRTs to analyse mass spectral data from complex gas- and particle-phase measurements made during a laboratory experiment investigating the interactions of gasoline car exhaust and α-pinene. Specifically, we used four factor analysis techniques: principal component analysis (PCA), positive matrix factorization (PMF), exploratory factor analysis (EFA), and non-negative matrix factorization (NMF), as well as one clustering technique, partitioning around medoids (PAM).
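As an illustration of what "combining variables into factors" means in practice, the following is a minimal sketch of non-negative matrix factorization with the standard multiplicative update rules, applied to a synthetic time-by-ion matrix. It is not the implementation used in the study: the function name `nmf_multiplicative`, the synthetic "precursor" and "product" time series, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def nmf_multiplicative(X, n_factors, n_iter=2000, seed=0, eps=1e-9):
    """Approximate non-negative X (time x ions) as W @ H.

    W holds factor time series (time x factors) and H holds factor
    mass spectra (factors x ions). Uses multiplicative updates that
    keep both matrices non-negative.
    """
    rng = np.random.default_rng(seed)
    n_times, n_ions = X.shape
    W = rng.random((n_times, n_factors)) + eps
    H = rng.random((n_factors, n_ions)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data (an assumption): a decaying "precursor" factor and a
# growing "product" factor, each with a random mass spectrum.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 60)
true_W = np.column_stack([np.exp(-3.0 * t), 1.0 - np.exp(-3.0 * t)])
true_H = rng.random((2, 30))
X = true_W @ true_H

W, H = nmf_multiplicative(X, n_factors=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.4f}")
```

Each row of `H` can be read as a factor mass spectrum and each column of `W` as that factor's time series; a single ion may carry weight in several rows of `H`.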

All SDRTs were able to resolve 4–5 factors from the gas-phase measurements, including an α-pinene precursor factor, 2–3 oxidation product factors, and a background/car exhaust precursor factor. NMF and PMF provided an additional oxidation product factor, which was not found by the other SDRTs. The results from EFA and PCA were similar after applying oblique rotations. For the particle-phase measurements, four factors were discovered with NMF and PMF: one primary factor, a mixed LVOOA factor, and two α-pinene SOA-derived factors. PAM was not able to resolve interpretable clusters because of a general limitation of clustering methods: the high degree of fragmentation taking place in the AMS causes different compounds, formed at different stages of the experiment, to be detected at the same variable. However, when a preliminary analysis is needed, or when isomers and mixed sources are not expected, cluster analysis may be a useful tool, as its results are simpler and thus easier to interpret. In the factor analysis techniques, any single ion generally contributes to multiple factors, although EFA and PCA try to minimize this spread.
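The clustering limitation described above can be illustrated with a small sketch: a simplified partitioning-around-medoids routine (greedy BUILD step followed by a SWAP search that accepts any cost-lowering swap) forces every ion into exactly one cluster, even an ion whose signal is a mixture of two processes. This is not the authors' code; the function names, the synthetic ion time series, and the choice of correlation distance are all assumptions made for illustration.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum of distances from every object to its nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(D, k):
    """Simplified PAM on a distance matrix D: greedy BUILD, then SWAP."""
    n = D.shape[0]
    # BUILD: start with the most central object, then add greedily.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    # SWAP: exchange a medoid with a non-medoid whenever the cost drops.
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# Synthetic ions (an assumption): five decaying, five growing, and one
# "mixed" ion that receives signal from both processes.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
decay, growth = np.exp(-4.0 * t), t ** 2
shapes = [decay] * 5 + [growth] * 5 + [0.5 * decay + 0.5 * growth]
ions = np.array([rng.uniform(0.5, 2.0) * s + rng.normal(0.0, 0.01, t.size)
                 for s in shapes])
D = 1.0 - np.corrcoef(ions)  # correlation distance between ion time series

medoids, labels = pam(D, k=2)
print("cluster label per ion:", labels)
```

The last ion is assigned to a single cluster although it is, by construction, an equal mixture of both time series; a factor model would instead split its signal between two factors.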

Our analysis shows that different SDRTs put emphasis on different parts of the data, and that with only one technique some interesting data properties may remain undiscovered. Thus, validation of the acquired results, either by comparing results across different SDRTs or by applying one technique multiple times (e.g. by resampling the data or giving different starting values for iterative algorithms), is important, as it may protect the user from dismissing unexpected results as unphysical.
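One way to picture the resampling idea is a bootstrap check of how stable a component is when the observations are resampled. The sketch below does this for the first principal component only, using an SVD-based PCA on synthetic data; the abstract does not specify the authors' validation procedure, so the function names, the data, and the similarity measure (absolute cosine similarity of loading vectors) are assumptions.

```python
import numpy as np

def first_pc_loading(X):
    """Unit-length loading vector of the first principal component.

    Rows of X are observations (time points), columns are ions.
    """
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def bootstrap_stability(X, n_boot=200, seed=0):
    """Similarity of the first PC between the full data and resamples."""
    rng = np.random.default_rng(seed)
    reference = first_pc_loading(X)
    n = X.shape[0]
    sims = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample time points with replacement
        # The sign of a principal component is arbitrary, hence abs().
        sims[b] = abs(reference @ first_pc_loading(X[rows]))
    return sims

# Synthetic data (an assumption): one growing factor plus small noise.
rng = np.random.default_rng(7)
t = np.linspace(0.0, 1.0, 40)
profile = rng.random(20)
X = np.outer(t, profile) + rng.normal(0.0, 0.02, (40, 20))

sims = bootstrap_stability(X)
print(f"min / median similarity: {sims.min():.3f} / {np.median(sims):.3f}")
```

Similarities close to 1 across resamples suggest the component is a stable feature of the data, whereas a wide spread would indicate that it depends on a few observations and deserves a closer look.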

Interactive discussion
Status: open (until 10 Jan 2020)
Latest update: 13 Dec 2019
Short summary
Online mass spectrometry produces large amounts of data. These data can be interpreted with statistical methods, which help scientists understand the underlying processes. We compared these techniques on car exhaust measurements. We show the differences and similarities between the methods and give recommendations on their applicability to certain types of data. We show that applying multiple methods leads to more robust results, thus increasing the reliability of the findings.