Acta Photonica Sinica  2019, Vol. 48 Issue (10): 1030002  DOI: 10.3788/gzxb20194810.1030002

Cite this article

GUO Kai-chen, WU Zhong-chen, ZHU Xiang-ping, et al. Mineral Element Abundance Identification Based on LIBS Emission Line Selection by Loading Space Distance of Principal Component Analysis[J]. Acta Photonica Sinica, 2019, 48(10): 1030002. DOI: 10.3788/gzxb20194810.1030002.

Foundation item

The National Natural Science Foundation of China (No. 41573056), National Key Research and Development Program of China (No.2016YFB0303804)

First author

GUO Kai-chen (1993-), male, M.S. degree candidate, mainly focuses on planetary spectroscopy and LIBS. Email: kyle_goo@mail.sdu.edu.cn

Corresponding author

WU Zhong-chen (1976-), male, professor, Ph.D. degree, mainly focuses on planetary spectroscopy and LIBS. Email: z.c.wu@sdu.edu.cn

Article History

Received: May 9, 2019
Accepted: Jun. 28, 2019
CLC number: TG146.21; P575; O433.5      Document code: A
Mineral Element Abundance Identification Based on LIBS Emission Line Selection by Loading Space Distance of Principal Component Analysis
GUO Kai-chen1,2, WU Zhong-chen1,2, ZHU Xiang-ping3, LING Zong-cheng1,2, ZHANG Jiang1,2, LI Yun1,2, QIAN Mao-cheng1,2    
(1 Institute of Space Sciences, Shandong University, Weihai, Shandong 264209, China)
(2 Shandong Provincial Key Laboratory of Optical Astronomy & Solar Terrestrial Environment, Shandong University, Weihai, Shandong 264209, China)
(3 State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, China)
Abstract: The concentration and laser-induced breakdown spectroscopy (LIBS) data of 64 pre-flight calibration samples published by the ChemCam team were used as the objects of this research. The principal component analysis loading space distance method was applied to each target element to select the LIBS emission line most sensitive to that element, and the mineral element species and abundance were then identified on this basis, with an identification accuracy of up to 92.8%. The results show that the principal component analysis loading space distance can be used as a criterion for determining the element information and element abundance of minerals prior to quantitative analysis. This method reduces the difficulty of rock/mineral classification, facilitates the rapid and efficient identification of unknown minerals, and offers an effective identification strategy for the analysis of rock types on the Martian surface.
Key words: Planetary spectroscopy    Laser induced breakdown spectroscopy    Principal component analysis    Feature extraction    Minerals    Plasma    
OCIS Codes: 300.6365;300.0300;240.6490;240.0240;200.4560
0 Introduction

Mars is regarded as the most likely destination for future human migration beyond Earth, and its exploration has attracted enormous interest. To better understand the evolutionary history of the Martian surface, about 43 probes, including orbiters, landers and rovers, have been sent to Mars since the 1960s. ChemCam, a payload on the Mars Science Laboratory (MSL) rover Curiosity, is the first Laser Induced Breakdown Spectroscopy (LIBS) system used for Mars exploration. It provides the LIBS emission lines of a target, together with its texture and morphology features obtained using a Remote Micro-Imager (RMI)[1]. Compared with X-Ray Fluorescence (XRF), the Alpha Particle X-ray Spectrometer (APXS) and other element analysis techniques, LIBS has notable advantages, such as quick, convenient and simultaneous access to multi-element information in a micro-area. So far, ChemCam has been working continuously for six years (~2 400 sols) and has obtained many scientific results along its traverse (~20 km): low-density, light-colored igneous rock[2] and hydrated soil[3] were found for the first time; magnesium sulfate, calcium sulfate, ferric sulfate aggregates and fluorite were found in sedimentary rocks near Pahrump Hills[4]; the discovery of igneous compositions in sedimentary rock revealed the diversity of its sources[3]; and signals of fluvial conglomerate[5] were detected, as well as olivine, pyroxene, neutral feldspar, potash feldspar[6] and calcium sulfate[7]. In the near term, around 2020, China, NASA and ESA will each carry out Mars exploration missions. China's Mars rover is scheduled to land on the red planet before July 2021[8]. The Chinese Mars mission (named HX01) will carry a stand-off LIBS instrument, one of whose main science goals is to quantitatively analyze the element species and abundance of the regolith and to identify the mineral types of targets on the Martian surface. 
A LIBS spectrum contains information on the element species present and their concentrations, which plays a decisive role in mineral type identification. The minerals in the Martian regolith are known to be very complex. A research strategy for better mineral identification is therefore needed, which would also lay a solid foundation for quantitative element analysis.

To identify rock/mineral types from LIBS data more accurately, several data pre-processing and multivariate statistical modeling methods have been used: normalizing the LIBS emission lines using the carbon LIBS signal of CO2 gas in the Martian atmosphere as an internal standard[9]; establishing an element analysis model using a full-spectrum multivariate statistical method[10]; and dividing samples into multiple groups according to element abundance and then building a Partial Least Squares (PLS) quantitative sub-model for each group[11]. At present, most methods aim to obtain the element concentrations first and to identify the rock/mineral afterwards. However, a wide variety of minerals and rocks is distributed on the Martian surface, and different kinds of minerals/rocks differ in element species, element abundance and matrix effects. It is therefore better to first identify the element abundance level (such as high Si, low Si or high Ca, low Ca) from the element emission lines, as a precondition for both model building and prediction in the subsequent quantitative analysis, so as to reduce the matrix effects to a certain degree. This paper works directly on LIBS data and adopts the loading space distance of Principal Component Analysis (PCA) as a new criterion for selecting feature spectral lines for element abundance recognition of minerals or rocks. PCA is often used in LIBS research. Previous studies[12-13] generally focus on the spatial information of the PCA scores and classify samples according to the distribution of points in the score space. WANG Qian-qian, et al. used the product of the loading of each line and the variance of the corresponding Principal Component (PC) to select feature lines and identify the element information of the selected feature lines[14]. 
In the process of using PCA to classify samples, we also noticed that the PCA loadings can effectively identify the variables that have the greatest influence on the PCs, and can therefore be used to select feature lines. We trained and identified characteristic spectral lines for each element, which can improve the efficiency of data utilization and sample identification. In this paper, ChemCam pre-flight calibration data are used to demonstrate our loading space distance method.

1 Data analysis

1.1 Data description

The data of 69 ChemCam pre-flight calibration standard samples[15] are used to test our method in this study. Most of these standards are igneous materials, but they also include a small number of sedimentary materials (such as sulfates, carbonates and phyllosilicates) and some pure minerals. These standard geological samples cover most species and abundance ranges of the typical element compositions on the Martian surface. Most of the standards were obtained from Brammer Standard Company, Inc., and their element contents had been confirmed by multiple analysis methods[16]. Because some standard samples, such as Macusanite, NaU2, SWY1, Ultramafic and WM, had incomplete concentration data, only the 64 samples with both confirmed concentration data and LIBS spectral data are selected for this analysis. To simplify data processing, we number the samples from R1 to R64. Detailed information about the samples and data sets is given in Table 1.

Table 1 The name and corresponding number of all standard samples

The element concentration distribution of all the geological standard samples was also investigated, and the result is shown as a box plot in Fig. 1. The major elements, i.e. Si, Al, Fe, Mg and Ca, have relatively high concentrations and wide distribution ranges. In contrast, the trace elements, such as Ti, Mn, Na, K and P, have relatively low concentrations and narrow distribution ranges. Among the major elements, Si is the most important diagenetic element and has the highest concentrations, in the range from 15% to 75%. Al, Fe, Mg and Ca are also main diagenetic cationic elements; they have multiple strong emission lines and are critical for mineral type identification. Some trace elements can be used as markers of mineral evolution or of the degree of weathering/alteration. For example, volcanic rocks are characterized by especially high contents of K and Fe[17]. Fe coupled with Mg is an indicator of mudstone, and Ca and Mg are used to describe limestone[18]. Fe/Mn ratios in pyroxenes can be used to recognize Martian meteorites[19].

Fig.1 Box plot of element concentration distribution for all 64 standards (IQR means interquartile range)

The LIBS data used in this study were published by the ChemCam team and recorded by the ChemCam copy system. Each LIBS spectrum contains 6 144 channels[16], recorded by three spectrometers (referred to as UV, VIS and NIR) covering 240~850 nm except for a gap from 340 nm to 385 nm. The paneled CCD detector of each spectrometer has 2048×512 pixels. The raw LIBS data from all the spectrometers were pre-processed by the ChemCam team; the pre-processing consists of removing the non-laser-induced background ('Dark Subtract'), de-noising the spectrum, removing the electron continuum, and calibrating for wavelength and for instrument response for a sample at a given distance. The pre-processed data, called 'level-1 data', are used in this study[16].

LIBS has a powerful element excitation ability. For example, the emission lines of Mg are shown in the LIBS spectra of Fig. 2. Ideally, without the matrix effect, the intensity of a LIBS emission line is positively related to the concentration of the excited element in the target, which is the basis of quantitative analysis of element abundance and of the identification of mineral/rock type. In Fig. 2, the Mg characteristic peak at 279.09 nm of R64 is much stronger than that of R51, which is consistent with the fact that sample R64 has a higher Mg concentration than R51. The Mg peak intensities (dimensionless) in R51 and R64 are $ 8.245 \times 10^{11} $ and $ 3.802\,5 \times 10^{12} $, respectively. The Mg concentration ratio of these two samples is 6.1, which is similar but not directly proportional to the peak intensity ratio of 4.61 at 279.09 nm, because of the matrix effect. The matrix effect should therefore be reduced to a certain degree for better qualitative and quantitative analysis.
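
As a quick check of the figures quoted above, the intensity ratio can be recomputed from the two peak intensities. This is only an arithmetic sketch: the variable names are illustrative, and the intensities are read as 8.245×10^11 and 3.8025×10^12.

```python
# Mg peak intensities at 279.09 nm as quoted in the text (dimensionless)
intensity_r51 = 8.245e11
intensity_r64 = 3.8025e12

# Ratio of the R64 peak to the R51 peak
intensity_ratio = intensity_r64 / intensity_r51
print(f"peak intensity ratio R64/R51 = {intensity_ratio:.2f}")  # prints 4.61
```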

Fig.2 LIBS spectral data of sample standards R51 and R64
1.2 Theory of PCA

PCA is a multivariate technique used to analyze a matrix composed of several inter-related observation variables[20]. It is a powerful method for extracting key information and compressing the data by calculating a new set of orthogonal variables, i.e. the Principal Components (PCs), and projecting the original observed variables into a new space.

In PCA, the PCs are obtained from the Singular Value Decomposition (SVD)[21-22] of the observation matrix X. The SVD is a generalization of the eigen-decomposition that can be used to analyze rectangular matrices (the eigen-decomposition is defined only for square matrices)[23]. The main idea of the SVD is to decompose a rectangular matrix into three simple matrices: two orthogonal matrices and one diagonal matrix. For a rectangular observation matrix X (m×n), the SVD is written as[24]

$ \boldsymbol{X}=\boldsymbol{U} \cdot \boldsymbol{S} \cdot \boldsymbol{V}^{\mathrm{T}}+\boldsymbol{E}=\boldsymbol{P} \cdot \boldsymbol{T}^{\mathrm{T}}+\boldsymbol{E} $ (1)

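where $ \boldsymbol{U} $ is a column-orthogonal matrix satisfying $ \boldsymbol{U}^{\mathrm{T}}\boldsymbol{U}=\boldsymbol{I}_{m} $, with $ \boldsymbol{I} $ the identity matrix; $ \boldsymbol{V}^{\mathrm{T}} $ is a row-orthogonal matrix satisfying $ \boldsymbol{V}^{\mathrm{T}}\boldsymbol{V}=\boldsymbol{I}_{n} $; $ \boldsymbol{S} $ is the diagonal matrix of the singular values, $ \boldsymbol{S}=\sqrt{\boldsymbol{\varLambda}} $, with $ \boldsymbol{\varLambda} $ the diagonal matrix of the eigenvalues of the matrix $ \boldsymbol{X}\boldsymbol{X}^{\mathrm{T}} $ and of the matrix $ \boldsymbol{X}^{\mathrm{T}}\boldsymbol{X} $; $ \boldsymbol{P} $ is the score matrix; $ \boldsymbol{T}^{\mathrm{T}} $ is the loading matrix; and the matrix $ \boldsymbol{E} $ contains the residuals. The detailed derivation is described in Ref. [23].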
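
The decomposition in Eq. (1) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy, assumes the data are mean-centered per variable before the SVD (the text does not state the pre-treatment), and uses random numbers in place of real LIBS intensities. The function name `pca_svd` is hypothetical.

```python
import numpy as np

def pca_svd(X, n_components):
    """Return (scores, loadings, explained_ratio) from an SVD of column-centered X."""
    Xc = X - X.mean(axis=0)                      # assumption: center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # P = U S
    loadings = Vt[:n_components].T                    # T = V, shape (variables, PCs)
    explained = s**2 / np.sum(s**2)                   # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Placeholder data: 64 samples x 13 selected emission lines (random, not real spectra)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 13))

scores, loadings, explained = pca_svd(X, 3)
# Residual matrix E of Eq. (1) when only the first three PCs are kept
residual = (X - X.mean(axis=0)) - scores @ loadings.T
```

Each row of `loadings` holds the loading values of one variable on the retained PCs, and each row of `scores` holds the coordinates of one sample in the score space.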

The score and loading matrices describe the relationship between the PCs and the observation variables. The score values for the first several PCs, i.e. PC1, PC2, PC3…, together span a low-dimensional mathematical space, usually referred to as a score plot, into which the samples can be projected and viewed. In this study, the LIBS spectral data matrix of the 64 ChemCam standard geological samples is projected into this low-dimensional space to form an n-dimensional model, in which classes of samples are more easily distinguished and outliers can be diagnosed[24]. The loading value reflects the correlation between a variable and the corresponding principal component. A negative loading value only means that the variable (spectral line) is negatively correlated with that principal component; it says nothing about the other principal components. Therefore, the sign of the loading value does not represent a positive or negative contribution to the extracted information. In the data processing, the loading value can be regarded as a weight, which is how the spatial distance is calculated.

1.3 Data processing

In this study, the Euclidean distance between each variable in the loading space and the origin (i.e. the square root of the sum of squares of a feature emission line's loading values on the different PCs) is calculated and serves as a criterion for the influence of that variable (feature emission line) on the PCs. Applying this to LIBS spectral data, we calculate the loading space distance of each LIBS emission line to determine the variable that has the greatest influence on the PCs, i.e. the critical LIBS emission line that contributes most to element identification. Next, sample grouping (splitting the samples, according to the value of a variable, into a number of equally spaced ranges defined by the 'number of groups' box; Unscrambler® X, CAMO Software Inc.) is performed on the PCA score results, using only the selected critical emission line as the independent variable, to distinguish the element abundance level of the samples (i.e. the sample types). This helps to find samples with similar matrices, which reduces the matrix effect and improves the accuracy of subsequent quantitative element analysis if necessary.
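
The distance criterion described above can be sketched in a few lines. The loading values and wavelength labels below are made up for illustration, and the function name `loading_space_distance` is hypothetical; only the formula (square root of the sum of squared loadings per emission line) comes from the text.

```python
import numpy as np

def loading_space_distance(loadings):
    """Euclidean distance of each variable's loading vector from the origin."""
    loadings = np.asarray(loadings, dtype=float)
    return np.sqrt(np.sum(loadings**2, axis=1))

# Hypothetical loadings: one row per emission line, one column per PC (PC1..PC3)
wavelengths_nm = np.array([251.7, 288.2, 634.9])   # illustrative labels only
loadings = np.array([[0.10, -0.20, 0.05],
                     [0.60, -0.30, 0.20],
                     [-0.30, 0.10, 0.40]])

distances = loading_space_distance(loadings)
# The line with the largest distance is taken as the critical emission line
best_line = wavelengths_nm[np.argmax(distances)]
```

Because the loadings are squared, the sign of a loading does not affect the distance, which matches the remark that the sign carries no information about the size of a line's contribution.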

Rocks and minerals contain several major elements, each of which should be processed following the flow chart in Fig. 3 to obtain more accurate identification results. Here Si is taken as an example to describe how the flow chart works, because of its high content in the crust and its importance as a diagenetic element in geology. The detailed data analysis procedure is as follows:

Fig.3 The flow chart of data processing of a target element

Step1: 1) Si is selected as the target element and its concentration data for the different samples are sorted in descending order; 2) the samples above a concentration threshold given by the classification boundary (the mean chemical composition of Martian soil) are selected to form the verification sample set.

Step2: 1) The LIBS spectra of 1-3 standard samples with a high concentration of Si (above the Si concentration threshold of Step1) are selected to form the LIBS training set; 2) the Si emission lines in the selected LIBS training set are assigned using the Ocean Optics software MaxLIBS (version 1.6.7), and 13 stronger emission lines are selected from the 6 144 LIBS channels. In effect, the LIBS data are compressed from 6 144 channels to fewer than 30 channels; 3) PCA is performed on the newly selected LIBS Mselected-Matrix (64×n, n < 30), and the Euclidean distance between each loading point and the origin (i.e. the loading distance defined above) is calculated and sorted in ascending order, which makes it easy to find the single characteristic spectral line with the greatest influence on the PCs; 4) this single spectral line is used as the independent variable for sample grouping, performed with the Unscrambler® X software (version 10.4) on the PCA score results of the LIBS Mselected-Matrix. The samples identified in this operation form the identification sample set.

Step3: By comparing the samples in the identification sample set with those in the verification sample set, the Matching Degree (MD, the ratio of the number of matched samples to the number of identified samples) is calculated to evaluate the recognition ability for the analyzed element. If MD>0.5, the results are output; otherwise, the parameters (i.e. the group number in sample grouping) are adjusted and the sample grouping and comparison are repeated to obtain the final MD value.
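
The matching degree of Step3 can be written as a short set operation. The sketch below is illustrative: the function name and the sample labels are hypothetical, and the sets are arranged only to reproduce a 39-of-42 match like the one reported for Al in Section 2.2.

```python
def matching_degree(identified, verification):
    """MD = matched samples / identified samples, as defined in Step3."""
    identified = set(identified)
    if not identified:
        return 0.0  # assumed convention when nothing is identified (not from the paper)
    matched = identified & set(verification)
    return len(matched) / len(identified)

# Hypothetical labels: 45 verification samples, 42 identified, 39 in common
verification = {f"R{i}" for i in range(1, 46)}
identified = {f"R{i}" for i in range(1, 40)} | {"R50", "R51", "R52"}

md = matching_degree(identified, verification)
accept = md > 0.5   # Step3 rule: output the results if MD > 0.5
```

With these sets `md` equals 39/42; when `accept` is false, the group number would be adjusted and the grouping repeated.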

1.4 Element concentration threshold for verification sample set

To form a verification sample set, a selection threshold (i.e. classification boundary) for the standard samples must first be set. Simply taking half of the maximum concentration as the threshold, without considering the mineral distribution on the Martian surface, is not a good choice. Because the pre-flight calibration of ChemCam was designed for analyzing element concentrations on the Martian surface, it is reasonable to use the average element content of Martian surface soil as the classification threshold. Martian soil is ubiquitous on the Martian surface, and its element abundance has a direct influence on the results of Mars lander or rover detection[25]. To date, several Mars landers and rovers have analyzed the basic properties of Martian soil, including the element composition of rocks and minerals, at different places on Mars such as the northern hemisphere (Viking, Pathfinder), the southern hemisphere (Spirit, Opportunity and Curiosity) and the northern polar region (Phoenix). Fig. 4 [26] shows the average element composition of the Martian soil at the Curiosity, Spirit and Opportunity landing sites. The abundance of a given element differs only slightly between landing sites. Although the landing sites have different properties, their soil element contents are very similar.

Fig.4 Comparison of the average composition of Martian soil at various landing sites

The relatively homogeneous abundance distribution of elements on the Martian surface might be due to thorough mixing by global/regional dust storms or to the relatively homogeneous chemical composition of the red planet's crust [25]. The average of the basaltic soil compositions from APXS analyses at Rocknest Portage, Gusev Crater and Meridiani Planum[27] and the soil element contents from X-ray Fluorescence Spectrometer (XRFS) analyses at Utopia Planitia and Chryse Planitia [28] can be used as the mean element composition of the Martian surface. In this study, the calculated average values for the elements Si, Ti, Al, Fe, Mn, Mg, Ca, Na, K and P are shown in Table 2 and are taken as the boundary thresholds of element concentration. After sorting the concentration data of the 64 ChemCam pre-flight calibration standard samples in descending order for each element, the samples with an element concentration greater than the boundary threshold are selected as the verification sample set. For example, samples with Si content greater than 45.41% are selected as the verification set for Si.
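
Forming the verification set amounts to a sort followed by a threshold filter. In the sketch below only the 45.41% Si threshold comes from the text; the sample concentrations and the function name `verification_set` are hypothetical.

```python
def verification_set(concentrations, threshold):
    """Sort samples by concentration (descending) and keep those above the threshold."""
    ranked = sorted(concentrations.items(), key=lambda item: item[1], reverse=True)
    return [name for name, wt in ranked if wt > threshold]

# Hypothetical Si contents (wt%) for four samples
si_wt = {"R1": 72.4, "R2": 45.41, "R3": 51.0, "R4": 12.3}

# Threshold quoted in the text for Si
high_si = verification_set(si_wt, 45.41)   # ['R1', 'R3']
```

A sample exactly at the threshold (R2 here) is excluded, following the "greater than" wording of the text.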

Table 2 Average chemical composition of Martian soil (wt%)
1.5 Selection of element feature emission lines

The relatively strong feature emission lines (±3 pixels) of the target element are assigned and selected using the MaxLIBS software (Ocean Optics, version 1.67), which is based on the National Institute of Standards and Technology (NIST) Atomic Spectra Database[29]. Specifically, all the LIBS spectral channels of selected samples with a high concentration of the target element in the verification set are used for the peak assignment of the target element's emission lines. Only a few relatively strong and correctly assigned LIBS feature emission lines (Table 3) are finally selected for the subsequent PCA loading and loading space distance calculation. To guard against wrong peak assignments, the selection of LIBS feature emission lines is repeated on 1 or 2 other samples with a high concentration of the same target element, and the results are compared. As seen in Table 3, fewer than 30 LIBS channels are selected; they form the optimized spectral matrix (i.e. Mselected-Matrix) used for PCA loading distance calculation and sample grouping (using the Unscrambler® X software).
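
One possible reading of the ±3 pixel tolerance is that the channel kept for a line is the strongest channel within three pixels of its reference wavelength. The sketch below illustrates that reading only; it is an assumption, not the MaxLIBS procedure, and the wavelength axis, the synthetic spectrum and the function name `pick_line_channel` are hypothetical.

```python
import numpy as np

def pick_line_channel(wavelengths, spectrum, line_nm, half_window=3):
    """Index of the strongest channel within +/- half_window pixels of line_nm."""
    center = int(np.argmin(np.abs(wavelengths - line_nm)))   # nearest channel
    lo = max(center - half_window, 0)
    hi = min(center + half_window + 1, len(wavelengths))
    return lo + int(np.argmax(spectrum[lo:hi]))

# Synthetic example: a single peak placed 2 pixels away from the reference wavelength
wavelengths = np.linspace(240.0, 340.0, 2048)    # hypothetical wavelength axis (nm)
spectrum = np.zeros_like(wavelengths)
true_peak = int(np.argmin(np.abs(wavelengths - 288.2))) + 2
spectrum[true_peak] = 1.0

channel = pick_line_channel(wavelengths, spectrum, 288.2)
```

Repeating this for every assigned line of the target element yields the column indices of the reduced spectral matrix.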

Table 3 The feature emission lines of the target element selected for sample identification
2 Calculation and results

2.1 PCA loading distance calculation

After the emission lines are selected, a spectral matrix with fewer than 30 channels is extracted from the 6 144 channels of the sample LIBS data matrix. PCA is then performed on this newly selected spectral matrix, Mselected-Matrix (m×64, m < 30), to calculate the loading space distances, and the single emission line with the maximum distance (i.e. the emission line with the greatest influence on sample identification) is chosen as the independent variable for the subsequent sample grouping (in the Unscrambler® X software). The first three PCs, which are found to explain more than 95% of the information of the whole testing set, are selected to span three-dimensional score and loading spaces, so that the relationships between samples can be displayed intuitively. Table 4 shows the loading distance results for the elements Al, K and Si as an example of selecting the emission line with the greatest influence in our sample identification strategy. The results for the other elements are listed in the supporting information. In Table 4, the first column is the wavelength of the selected feature emission lines in the LIBS training set. The loading space distances for each element are sorted in ascending order for easier comparison. The emission line with the maximum distance has the greatest influence on the sample grouping results and is therefore selected, for the target element, as the only independent variable for sample identification with the sample grouping function of the Unscrambler® X software. Sample grouping based on this selected emission line then identifies the sample types in a simple way.
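
The statement that the first three PCs explain more than 95% of the information can be checked from the singular values. The sketch below is a generic illustration under stated assumptions: rows are samples and columns are selected lines, the data are mean-centered, and the test matrix is synthetic (three latent factors plus small noise), so it says nothing about the real ChemCam data. The function name `n_components_for` is hypothetical.

```python
import numpy as np

def n_components_for(X, target=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches target."""
    Xc = X - X.mean(axis=0)                         # assumption: mean-centered data
    s = np.linalg.svd(Xc, compute_uv=False)         # singular values only
    cumulative = np.cumsum(s**2) / np.sum(s**2)     # cumulative variance fraction
    n = int(np.searchsorted(cumulative, target)) + 1
    return n, cumulative

# Synthetic 64 x 13 matrix driven by three latent factors plus small noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(64, 3))
mixing = rng.normal(size=(3, 13))
X = latent @ mixing + 0.01 * rng.normal(size=(64, 13))

n_pcs, cumulative = n_components_for(X, 0.95)
```

For data of this kind at most three PCs are needed, and `cumulative[:3]` gives the fraction of variance captured by PC1 to PC3.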

Table 4 PCA square loading distance of elements Al, K, Si
2.2 Sample grouping results

As mentioned above, the elements Al, K and Si are analyzed, and the PCA score results of their respective Mselected-Matrix are shown in Fig. 5 as a demonstration of our method. Based on the single emission line selected by our loading space distance method, the sample grouping function is applied and the PCA score result of Mselected-Matrix is analyzed. The number of samples in the verification set and the number of samples identified by our method are shown in Table 5; the specific names of the whole verification set can be found in the supporting information.

Fig.5 Sample grouping result of element
Table 5 Comparison and matching rate of all elements between verification set and sample grouping identification results

In Fig. 5(a), the open triangles represent the samples with Al content higher than the boundary threshold value. As shown, those samples can be identified directly using only the one LIBS emission line selected by our loading space distance method. This greatly benefits accurate quantitative element analysis, because selecting samples with the same or similar element abundance reduces the matrix effect. In this case, 42 samples are identified as high-Al samples meeting the Al boundary value, and 39 of these 42 samples are consistent with the verification set. The matching degree of high-Al samples therefore reaches 92.8%. Similarly, in Fig. 5(b) and (c), the open triangles represent the samples with K and Si abundance above the boundary threshold values. 40 high-K samples and 38 high-Si samples are found to meet the boundary value conditions, and the matching degrees are 90.0% and 78.95%, respectively. In Table 5, all the elements are sorted in descending order of matching degree. As shown, the matching degree of two major elements (i.e. Al and K) is 90% or above. And the matching degrees of other elements such as Ca, Si, Ti, Na, Mg and Fe are 0.8, 0.7, 0.5 and 0.7 respectively.

Based on our identification method, half of the minerals with the major elements on the Martian surface can be identified correctly with a matching degree higher than 75%, and 80% of the mineral elements have a matching degree higher than 50%. That is to say, elements with higher concentration in the Martian soil are more easily identified, with a higher matching degree, by our method. Our results are consistent with those obtained by the ChemCam quantitative analysis method using ICA and PLS[30], whose prediction accuracy also varies as a function of element abundance[30]. The abundance of Mg and Fe in Martian soil is relatively high, but their matching degree is only about 0.5. The low matching degree of Mg may be caused by spectral line interference from other elements. The low matching degree of Fe may be caused by the matrix effect as well as by spectral interference, because Fe has more than 1000 spectral lines, which easily overlap with the lines of other elements. The matching degree of Mn and P is near zero, probably because these trace elements have very low abundance both in the standards and in Martian soil and are hard to excite by laser.

3 Discussion

The loading space distance method can also assist the identification and classification of substances and improve the efficiency of data utilization beyond mineral LIBS data, and even for other kinds of signals. For example, Polyethylene Terephthalate (PET), high-density Polyethylene (PE), Polypropylene (PP) and polystyrene in plastics can be analyzed by LIBS[31], and the critical LIBS emission lines for effective identification of plastic types might be selected using our method. The ChemCam team has built a quantitative analysis model using the sub-model PLS method, with which Anderson et al. achieved a higher quantitative analysis accuracy[11]. Our method can help to select standard samples with smaller matrix effects for model building and so improve efficiency and classification accuracy.

4 Conclusion

In this paper, the loading space distance method is shown to help find samples with very similar element species and abundance, which is critical for mineral type identification and for building quantitative element analysis models, because it reduces the matrix effect to a certain degree. The method is simple because only the one LIBS emission line with the maximal loading space distance needs to be selected for sample identification. Our study reduces the difficulty of rock/mineral classification and benefits the analysis of unknown minerals. By analyzing the 64 standard geological samples published by the ChemCam team with our method, a mineral identification accuracy of up to 92.8% is obtained. Our study offers an effective identification strategy for Martian surface rock type analysis.

Acknowledgement The authors sincerely thank the ChemCam team for providing the data of the 64 standard samples.

References
[1]
WIENS R C, MAURICE S, BARRACLOUGH B, et al. The ChemCam instrument suite on the Mars Science Laboratory (MSL) Rover: body unit and combined system tests[J]. Space Science Reviews, 2012, 170: 167-227. DOI:10.1007/s11214-012-9902-4
[2]
SAUTTER V, TOPLIS M J, WIENS R C, et al. In situ evidence for continental crust on early Mars[J]. Nature Geoscience, 2015, 8: 605-609. DOI:10.1038/ngeo2474
[3]
MAURICE S, CLEGG S M, WIENS R C, et al. ChemCam activities and discoveries during the nominal mission of the Mars Science Laboratory in Gale crater, Mars[J]. Journal of Analytical Atomic Spectrometry, 2016, 31: 863-889. DOI:10.1039/C5JA00417A
[4]
NACHON M, MANGOLD N, FORNI O, et al. Chemistry of diagenetic features analyzed by ChemCam at Pahrump Hills, Gale crater, Mars[J]. Icarus, 2017, 281: 121-136. DOI:10.1016/j.icarus.2016.08.026
[5]
WILLIAMS R M, GROTZINGER J P, DIETRICH W E, et al. Martian fluvial conglomerates at Gale crater[J]. Science, 2013, 340: 1068-1072. DOI:10.1126/science.1237317
[6]
BLANEY G B D, BRIDGES J, COUSIN A, et al. Possible alteration of rocks observed by Chemcam along the traverse to Glenelg in Gale crater on Mars[C]. EGU2013, 2013: 1502.
[7]
CLEGG S M, MANGOLD N, LE MOUÉLIC S, et al. High calcium phase observations at rocknest with ChemCam[C]. The Woodlands, Texas: 44th Lunar and Planetary Science Conference, 2013: 2087.
[8]
YE Pei-jian, SUN Ze-zhou, RAO Wei, et al. Mission overview and key technologies of the first Mars probe of China[J]. Science China Technological Sciences, 2017, 60: 649-657. DOI:10.1007/s11431-016-9035-5
[9]
LEFEBVRE C, CATALA-ESPI A, SOBRON P, et al. Depth-resolved chemical mapping of rock coatings using Laser-Induced Breakdown Spectroscopy: Implications for geochemical investigations on Mars[J]. Planetary and Space Science, 2016, 126: 24-33. DOI:10.1016/j.pss.2016.04.003
[10]
MAURICE S, WIENS R, MSL SCIENCE TEAM. Overview of 100 sols of Chemcam operations at gale crater[C]. EGU2013, 2013: 14161.
[11]
ANDERSON R B, CLEGG S M, FRYDENVANG J, et al. Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models[J]. Spectrochimica Acta Part B: Atomic Spectroscopy, 2017, 129: 49-57. DOI:10.1016/j.sab.2016.12.002
[12]
RODARMEL C, SHAN J. Principal component analysis for hyperspectral image classification[J]. Surveying and Land Information Science, 2002, 62(2): 115-122.
[13]
WANG Qian-qian, HUANG Zhi-wen. Classification and identification of plastics laser-induced breakdown spectrum based on principal component analysis and artificial neural network[J]. Spectroscopy and Spectral Analysis, 2012, 32: 3179-3182.
[14]
WANG Qian-qian, TENG G, QIAO Xiao-lei, et al. Importance evaluation of spectral lines in laser-induced breakdown spectroscopy for classification of pathogenic bacteria[J]. Biomedical Optics Express, 2018, 9: 5837-5850. DOI:10.1364/BOE.9.005837
[15]
CLEGG S M, SKLUTE E, DYAR M D, et al. Multivariate analysis of remote laser-induced breakdown spectroscopy spectra using partial least squares, principal component analysis, and related techniques[J]. Spectrochimica Acta Part B: Atomic Spectroscopy, 2009, 64(1): 79-88. DOI:10.1016/j.sab.2008.10.045
[16]
WIENS R C, MAURICE S, LASUE J, et al. Pre-flight calibration and initial data processing for the ChemCam laser-induced breakdown spectroscopy instrument on the Mars Science Laboratory rover[J]. Spectrochimica Acta Part B: Atomic Spectroscopy, 2013, 82: 1-27. DOI:10.1016/j.sab.2013.02.003
[17]
MCSWEEN H Y. Petrology on Mars[J]. American Mineralogist, 2015, 100(11-12): 2380-2395. DOI:10.2138/am-2015-5257
[18]
SHI Qi, NIU Guang-hui, LIN Qing-yu, et al. Quantitative analysis of sedimentary rocks using laser-induced breakdown spectroscopy: comparison of support vector regression and partial least squares regression chemometric methods[J]. Journal of Analytical Atomic Spectrometry, 2015, 30: 2384-2393. DOI:10.1039/C5JA00255A
[19]
MCSWEEN H Y. SNC meteorites: are they Martian rocks?[J]. Geology, 1984, 12(1): 3-6.
[20]
ABDI H, WILLIAMS L J. Principal component analysis[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(4): 433-459. DOI:10.1002/wics.101
[21]
GOLUB G H, VAN LOAN C F. Matrix computations[M]. Johns Hopkins University Press, 1996.
[22]
VETTERLING W T, PRESS W H, TEUKOLSKY S A, et al. Numerical recipes example book (C++): the art of scientific computing[M]. Cambridge University Press, 2002.
[23]
ABDI H. Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD)[J]. Encyclopedia of Measurement and Statistics, 2007: 907-912.
[24]
LI Bo-yan, HU Yun, LIANG Yi-zeng, et al. Quality evaluation of fingerprints of herbal medicine with chromatographic data[J]. Analytica Chimica Acta, 2004, 514: 69-77. DOI:10.1016/j.aca.2004.03.041
[25]
OUYANG Zi-yuan, ZOU Yong-liao. Introduction to Mars science[M]. Shanghai: Shanghai Science and Technology Education Press, 2017.
[26]
YEN A S, GELLERT R, CLARK B C, et al. Evidence for a global martian soil composition extends to Gale Crater[C]. 44th Lunar and Planetary Science Conference, 2013: 2495.
[27]
BLAKE D F, MORRIS R V, KOCUREK G, et al. Curiosity at Gale crater, Mars: characterization and analysis of the Rocknest sand shadow[J]. Science, 2013, 341(6153): 1239505. DOI:10.1126/science.1239505
[28]
TAYLOR S R, MCLENNAN S. Planetary crusts: their composition, origin and evolution[M]. Cambridge University Press, 2009.
[29]
OLSEN A K, RALCHENKO Y. NIST LIBS Database: NIST 2017[DB/OL]. [2019-05-09]. https://physics.nist.gov/PhysRefData/ASD/LIBS/libs-form.html.
[30]
COUSIN A, SAUTTER V, PAYRÉ V, et al. Classification of igneous rocks analyzed by ChemCam at Gale crater, Mars[J]. Icarus, 2017, 288: 265-283. DOI:10.1016/j.icarus.2017.01.014
[31]
UNNIKRISHNAN V K, CHOUDHARI K S, KULKARNI S D, et al. Analytical predictive capabilities of laser induced breakdown spectroscopy (LIBS) with principal component analysis (PCA) for plastic classification[J]. RSC Advances, 2013, 3(48): 25872-25880. DOI:10.1039/c3ra44946g