Abstract:
The rational selection of non-geological hazard samples is of great significance for improving the accuracy of geological hazard susceptibility prediction. Taking Liulin County as a case study, this study selected appropriate impact factors and used a random forest (RF) model with GIS technology to assess susceptibility. A total of twenty models were built by varying the ratio of geological hazard points to non-geological hazard points (1∶1, 1∶1.5, 1∶3, 1∶5 and 1∶10) and the distance from non-geological hazard points to known hazard points (100, 500, 800 and 1000 m). The results show that: (1) According to the error indices, confusion matrices, and ROC curves, both the sample ratio and the distance from known hazard points significantly influenced the geological hazard susceptibility evaluation. As the sample ratio decreased and the distance from known hazard points increased, the overall MAE and RMSE of the models decreased, while the overall ACC increased. All models achieved an AUC value greater than 0.8, indicating excellent predictive performance. When the sample ratio was less than 1∶3, the effect of increasing the distance from known hazard points on model error and accuracy became less pronounced, and the results stabilized. The most suitable model for the study area had a sample ratio of 1∶10 and a distance of 1000 m from known hazard points. (2) High and very high susceptibility areas were located primarily in the central and northern regions, adjacent to roads and rivers, making them key areas for hazard prevention and reduction in Liulin County. (3) Differences in sample selection produced different susceptibility results, mainly because they changed the data features the RF model collected and the judgments it made during modeling, as well as the representativeness of the samples. These findings have significant implications for implementing hazard prevention and reduction measures.
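The twenty-model design described above (five sample ratios crossed with four distances) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not state how non-geological hazard points were drawn, so the rejection sampling inside a rectangular extent, the function names `sample_non_hazard` and `build_sample_sets`, and the assumption that coordinates are in a projected system with metre units are all hypothetical choices made for illustration.

```python
import itertools
import math
import random

# Non-hazard multipliers for the ratios 1:1, 1:1.5, 1:3, 1:5, 1:10 and the
# distances in metres, both taken from the abstract.
RATIOS = [1, 1.5, 3, 5, 10]
DISTANCES_M = [100, 500, 800, 1000]


def sample_non_hazard(hazard_pts, ratio, min_dist_m, bounds, rng, max_tries=1_000_000):
    """Draw non-hazard points that lie at least min_dist_m from every hazard point.

    hazard_pts: list of (x, y) tuples in metres (assumed projected coordinates).
    bounds: (xmin, ymin, xmax, ymax), a rectangle standing in for the study area
            (assumption; a real study would use the county boundary).
    """
    xmin, ymin, xmax, ymax = bounds
    target = round(len(hazard_pts) * ratio)  # number of non-hazard points to draw
    samples = []
    tries = 0
    while len(samples) < target:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough non-hazard points")
        p = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Reject candidates that fall inside the buffer of any known hazard point.
        if all(math.dist(p, h) >= min_dist_m for h in hazard_pts):
            samples.append(p)
    return samples


def build_sample_sets(hazard_pts, bounds, seed=0):
    """Return one non-hazard sample set per (ratio, distance) combination."""
    rng = random.Random(seed)
    return {
        (ratio, dist): sample_non_hazard(hazard_pts, ratio, dist, bounds, rng)
        for ratio, dist in itertools.product(RATIOS, DISTANCES_M)
    }


if __name__ == "__main__":
    # Synthetic hazard points in a 20 km x 20 km square; not data from the study.
    demo_rng = random.Random(42)
    hazards = [(demo_rng.uniform(0, 20_000), demo_rng.uniform(0, 20_000)) for _ in range(20)]
    sample_sets = build_sample_sets(hazards, (0, 0, 20_000, 20_000))
    for (ratio, dist), pts in sorted(sample_sets.items()):
        print(f"ratio 1:{ratio}, distance {dist} m -> {len(pts)} non-hazard points")
```

Each of the twenty sample sets would then be combined with the hazard points and the impact factors to train and evaluate one RF model, so that MAE, RMSE, ACC and AUC can be compared across ratios and distances.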