Machine Learning Techniques for Soil Characterization Using Cone Penetration Test Data

by: Ayele Tesema Chala, Richard P. Ray

Format: Article
Published: MDPI AG 2023-07-01

Description

Seismic response assessment requires reliable information about subsurface conditions, including the soil shear wave velocity ($V_s$), an essential parameter for evaluating the propagation of seismic waves. However, measuring $V_s$ is generally challenging because field and laboratory tests are complex and time-consuming. This study aims to predict $V_s$ from cone penetration test (CPT) data using machine learning (ML) algorithms. Four ML algorithms were used: Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost). The models were trained on 70% of the dataset, and their efficiency and generalization ability were assessed on the remaining 30%. The hyperparameters of each model were tuned through Bayesian optimization with k-fold cross-validation.
The performance of each ML model was evaluated using eight metrics: root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), coefficient of determination ($R^2$), performance index ($PI$), scatter index ($SI$), $A10\text{-}I$, and $U_{95}$. The RF model performed consistently well across all metrics, achieving high accuracy and the lowest errors in predicting $V_s$. The SVM and XGBoost models also performed strongly, with slightly higher error metrics than the RF model. The DT model, however, performed poorly, with higher error and uncertainty in predicting $V_s$.
Based on these results, we conclude that the RF model is highly effective at predicting $V_s$ from CPT data with minimal input features.
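
As an illustration of how some of the metrics named in the abstract might be computed, the following is a minimal sketch in plain Python. The function name `regression_metrics`, the dictionary keys, and the sample values are hypothetical and do not come from the article. The A10-style index is assumed here to be the fraction of predictions whose ratio to the measured value lies within 0.9 to 1.1; the article's exact definitions of $A10\text{-}I$, $PI$, $SI$, and $U_{95}$ are not given in this record, so the latter three are omitted.

```python
import math


def regression_metrics(measured, predicted):
    """Return RMSE, MAE, MAPE (%), R^2 and an A10-style index for paired values.

    Assumes all measured values are nonzero (true for shear wave velocities).
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]

    # Error metrics in the units of the target (e.g. m/s for V_s).
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Relative error, expressed as a percentage.
    mape = 100.0 * sum(abs(e) / abs(m) for m, e in zip(measured, errors)) / n

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_measured = sum(measured) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot

    # Assumed definition: share of samples with predicted/measured in [0.9, 1.1].
    within_10 = sum(1 for m, p in zip(measured, predicted) if 0.9 <= p / m <= 1.1)
    a10 = within_10 / n

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2, "a10": a10}


# Illustrative values only (not data from the study).
measured_vs = [100.0, 200.0, 300.0, 400.0]
predicted_vs = [105.0, 190.0, 300.0, 460.0]
print(regression_metrics(measured_vs, predicted_vs))
```

In a workflow like the one described, such a function would be applied to the held-out 30% of the data, so that the reported values reflect generalization rather than fit to the training set.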