https://doi.org/10.1140/epjc/s10052-025-15189-4
Regular Article - Computing, Software and Data Science
On the integration of quantum machine learning into hybrid frameworks for high energy particle physics
1 Department of Physics, Firat University, Center, 23119, Elazig, Türkiye
2 Department of Physics, Yildiz Technical University, Davutpasa Street, 34220, Istanbul, Türkiye
Received: 27 June 2025
Accepted: 9 December 2025
Published online: 22 December 2025
Abstract
In this study, the potential of quantum machine learning (QML) techniques based on trainable quantum circuits was explored for vector boson identification at the Large Hadron Collider (LHC). Specifically, the Compact Muon Solenoid (CMS) experiment dataset was employed to reconstruct the Z boson through the muon–antimuon (μ⁺μ⁻) decay channel using variational quantum circuits (VQCs). To examine the effect of data structure on QML performance, various preprocessing strategies were applied, including different train/test splits, feature selection, dimensionality reduction, and class balancing techniques. The dataset was evaluated under two train/test configurations, namely a balanced split (70:30) and an imbalanced split (80:20), to examine the effect of class distribution on QML outcomes. Feature selection based on Random Forest (RF) was used to extract the most informative variables, while principal component analysis (PCA) was used to reduce input dimensionality and optimize qubit usage. To mitigate class imbalance, three resampling techniques were implemented: the Synthetic Minority Over-sampling Technique (SMOTE), SMOTE combined with edited nearest neighbors (SMOTEENN), and SMOTE with Tomek links (SMOTETomek). A comparative evaluation using stratified cross-validation was conducted to assess model performance and generalization ability across the different configurations. The findings indicated that integrating PCA with resampling methods substantially improved the generalization capacity of the VQC model, especially in imbalanced settings. Among all configurations, SMOTE and SMOTEENN delivered the highest classification performance, boosting sensitivity to the minority class and enhancing model stability. These results highlight the significance of data structure, feature reduction, and resampling in classical-quantum (CQ) data processing for high energy physics applications.
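The classical preprocessing chain summarized in the abstract (RF-based feature selection, PCA, minority oversampling, stratified cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything specific in it is an assumption made for illustration: the synthetic dataset from `make_classification`, the hypothetical helpers `smote_like` and `run_sketch`, the choice of 8 retained features and 4 principal components (as if one component were encoded per qubit), and a logistic regression used as a classical stand-in for the VQC. The oversampler is a simplified SMOTE-style interpolation written out by hand, not the reference SMOTE, SMOTEENN, or SMOTETomek algorithms.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote_like(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not the reference algorithm).

    New minority points are drawn on the segment between a minority sample
    and one of its k nearest minority neighbours until classes are balanced.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0 or len(X_min) < 2:
        return X, y
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neigh[base, rng.integers(0, neigh.shape[1], size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[pick] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def run_sketch(n_features_kept=8, n_components=4, seed=0):
    # Synthetic, imbalanced stand-in data (assumption: NOT the CMS dataset).
    X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                               weights=[0.85, 0.15], random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    recalls = []
    for train_idx, test_idx in cv.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]

        # 1) Rank features by Random Forest importance; keep the top ones.
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_tr, y_tr)
        top = np.argsort(rf.feature_importances_)[::-1][:n_features_kept]

        # 2) Standardize, then reduce with PCA (fitted on training data only).
        scaler = StandardScaler().fit(X_tr[:, top])
        pca = PCA(n_components=n_components).fit(scaler.transform(X_tr[:, top]))
        Z_tr = pca.transform(scaler.transform(X_tr[:, top]))
        Z_te = pca.transform(scaler.transform(X_te[:, top]))

        # 3) Oversample the minority class in the training fold only.
        Z_bal, y_bal = smote_like(Z_tr, y_tr, seed=seed)

        # 4) Classical stand-in classifier where a VQC would be trained.
        clf = LogisticRegression(max_iter=1000).fit(Z_bal, y_bal)
        recalls.append(recall_score(y_te, clf.predict(Z_te)))

    # Reduced dimension, class counts of the last balanced fold, mean recall.
    return Z_bal.shape[1], np.bincount(y_bal), float(np.mean(recalls))


if __name__ == "__main__":
    n_dims, counts, mean_recall = run_sketch()
    print("reduced dimension:", n_dims)
    print("balanced class counts (last fold):", counts)
    print("mean minority-class recall:", round(mean_recall, 3))
```

In this sketch, resampling is applied inside each training fold rather than before the split, so that synthetic points never leak into the held-out fold. Whether the study ordered the steps this way is not stated in the abstract.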
© The Author(s) 2025
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Funded by SCOAP3.

