Advancing Breast Cancer Subtype Prediction and Mutation Analysis: Integrating Deep Learning and Machine Learning Techniques in Genomic Research
DOI: https://doi.org/10.58190/icisna.2024.83
Keywords: Machine Learning, Deep Learning, Breast Cancer, Silhouette metric, Clustering
Abstract
Breast cancer is a heterogeneous disease that can be classified into several subtypes, each associated with distinct genetic mutations and clinical outcomes. According to a research article from the National Institutes of Health, 25% of hereditary cases are caused by mutations in highly penetrant genes, which confer an 80% lifetime risk of breast cancer [15]. This study applies deep learning and machine learning algorithms to a comprehensive gene expression dataset in order to predict breast cancer subtypes and to identify key genetic mutations contributing to the disease. We analyzed 1904 samples, covering 331 genes and 175 gene mutations, obtained from a public platform and annotated with PAM50 and claudin-low subtype labels. To compensate for the limited number of observations, SMOTE (Synthetic Minority Over-sampling Technique) was used to generate synthetic samples, and Principal Component Analysis (PCA) was used to assess data variance. Several machine learning models, including Random Forest, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), XGBoost, and a stacked ensemble, were applied alongside two deep learning models, a Convolutional Neural Network (CNN) and a Multi-Layer Perceptron (MLP). The stacked model performed best, with an accuracy of 0.955, while the deep learning models reached 0.911 (CNN) and 0.936 (MLP). KNN analysis revealed potential clusters based on gene and mutation data, and the silhouette metric identified "siah1_mut," "nras_mut," and "hras_mut" as significant mutations. The optimal clustering achieved a silhouette score of 0.997 with two clusters. These mutations may play pivotal roles in breast cancer pathogenesis and could serve as targets for therapeutic intervention. Our findings demonstrate the effectiveness of combining stacked algorithms and deep learning models to predict breast cancer subtypes.
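The abstract relies on two techniques it does not define. SMOTE creates synthetic samples by interpolating between a minority-class sample and one of its nearest minority-class neighbours, and the silhouette score compares each sample's mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). The NumPy sketch below illustrates both under stated assumptions. It is not the study's code: the function names `smote` and `silhouette_score`, the default `k=5`, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative sketch only; not the implementation used in the study.


def smote(X_min, n_new, k=5, seed=None):
    """Create n_new synthetic rows from minority-class samples X_min.

    Each synthetic row lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)          # seed samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def silhouette_score(X, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all samples.

    a: mean distance from sample i to the other members of its own cluster.
    b: smallest mean distance from sample i to the members of another cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # dist[i, i] == 0, so dividing by n_own - 1 excludes sample i itself
        a = dist[i, own].sum() / (n_own - 1)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(s.mean())


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]])
    print(silhouette_score(X, [0, 0, 1, 1]))  # two well-separated clusters
    print(silhouette_score(X, [0, 1, 0, 1]))  # poor split, negative score
    print(smote([[0, 0], [1, 0], [0, 1], [1, 1]], n_new=3, k=2, seed=0))
```

A silhouette score close to 1, such as the 0.997 reported for two clusters, indicates that samples lie much closer to their own cluster than to the nearest other cluster; scores near 0 or below indicate overlapping or poorly separated clusters.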
The identification of key mutations through clustering techniques provides valuable insights into the genetic underpinnings of breast cancer, which could guide future research and the development of targeted therapies. This study highlights the potential of advanced computational approaches in elucidating the complex landscape of breast cancer genomics and paves the way for personalized medicine in oncology.