Healthcare machine learning

Breast Cancer Malignancy Diagnostics using Neural Networks

A structured-data classification project using cellular nuclei measurements to compare traditional classifiers and neural-network approaches for breast cancer malignancy prediction.

Neural networks · scikit-learn · Feature engineering · ROC analysis · Medical data

Project Goals

  • Compare classical classification methods with neural-network architectures.
  • Use feature selection and transformation to improve model inputs.
  • Evaluate which cellular measurements were most useful for diagnosis-oriented classification.
  • Frame the work as decision-support modeling, not clinical deployment.

Data and Modeling

The project uses the Breast Cancer Wisconsin Diagnostic dataset from Kaggle, which contains 569 samples (212 malignant, 357 benign) described by measurements of cell nuclei such as radius, texture, perimeter, area, and concavity.

Low-correlation features were removed, transformed feature variants were explored, and KNN, SVM, Random Forest, Decision Tree, and neural-network classifiers were compared.
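A minimal, dependency-free sketch of a correlation filter, assuming "low correlation" means a weak Pearson correlation between a feature and the 0/1 diagnosis label. The helper names, the 0.3 cut-off, and the toy values are hypothetical; the project's actual criterion and threshold are not stated here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def drop_low_correlation(features, labels, threshold=0.3):
    """Keep features whose |r| with the 0/1 label is at least `threshold`.

    Hypothetical helper: the 0.3 cut-off is illustrative, not the project's.
    Returns {feature_name: r} for the retained features.
    """
    kept = {}
    for name, values in features.items():
        r = pearson(values, labels)
        if abs(r) >= threshold:
            kept[name] = r
    return kept


# Toy example (values invented for illustration): 1 = malignant, 0 = benign.
labels = [1, 1, 0, 0]
features = {
    "area_mean": [1000.0, 900.0, 400.0, 350.0],
    "symmetry_mean": [0.18, 0.19, 0.19, 0.18],
}
print(drop_low_correlation(features, labels))
```

Here the area-like column tracks the label closely and survives, while the symmetry-like column has essentially zero correlation and is dropped. A filter like this only looks at linear, one-feature-at-a-time association, so it can discard a feature that is useful in combination with others.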

Key Result

The notebook reports a best held-out accuracy of 0.982, with ROC AUC values of roughly 0.967 to 0.977 across saved runs. Classical methods such as Random Forest and KNN remained competitive, so comparing models was more informative than reporting a single model's score.
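Accuracy and ROC AUC measure different things, so the two figures above need not agree. Accuracy depends on one decision threshold, while ROC AUC equals the probability that a randomly chosen malignant case is scored higher than a randomly chosen benign one, which summarizes ranking quality across all thresholds. The sketch below computes both from plain lists; the function names, the 0.5 threshold, and the toy scores are illustrative assumptions, and `sklearn.metrics.roc_auc_score` returns the same AUC value more efficiently.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. The O(P*N) pairwise loop favors clarity over speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of cases where (score >= threshold) matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy scores (invented): 1 = malignant, 0 = benign.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(accuracy(labels, scores), roc_auc(labels, scores))  # 0.5 0.75
```

On these toy scores the 0.5 threshold misclassifies half the cases, yet three of the four malignant/benign pairs are still ranked correctly, which is why AUC comes out higher than accuracy here.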

Concave points, area-related variables, and texture emerged as important signals for distinguishing malignant and benign samples.
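One simple way to quantify how strongly a single feature separates the two diagnosis groups is a standardized mean difference (Cohen's d). This is an illustrative univariate technique, not necessarily how the project ranked its features, and the feature names and values below are hypothetical toy data rather than dataset rows.

```python
import math
from statistics import mean, variance


def cohens_d(values, labels):
    """Standardized mean difference (malignant minus benign) using a pooled SD.

    Needs at least two samples in each group.
    """
    mal = [v for v, y in zip(values, labels) if y == 1]
    ben = [v for v, y in zip(values, labels) if y == 0]
    n1, n0 = len(mal), len(ben)
    pooled_var = ((n1 - 1) * variance(mal) + (n0 - 1) * variance(ben)) / (n1 + n0 - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(mal) - mean(ben)) / math.sqrt(pooled_var)


def rank_features(features, labels):
    """Return (name, d) pairs ordered by |d|, strongest separation first."""
    scored = [(name, cohens_d(vals, labels)) for name, vals in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy values (invented, not dataset rows): 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "texture": [21.0, 23.0, 19.0, 18.0, 17.0, 20.0],
    "fractal_dimension": [0.062, 0.060, 0.064, 0.063, 0.061, 0.062],
    "concave_points": [0.15, 0.12, 0.14, 0.03, 0.02, 0.04],
}
for name, d in rank_features(features, labels):
    print(f"{name}: d = {d:.2f}")
```

A large |d| means the benign and malignant distributions barely overlap on that feature, which is what the per-group distribution plots show visually. Like any univariate score, it ignores redundancy between features such as radius, perimeter, and area.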

Evaluation Framing

  • Accuracy and ROC/AUC were used to compare model behavior on the project split.
  • Correlation and feature-distribution visuals helped explain why certain predictors mattered.
  • The result is best read as a supervised-learning case study on a public diagnostic dataset.
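The write-up does not say how the held-out split was drawn. Assuming a stratified split, a sketch that keeps the malignant/benign ratio the same in the training and test portions could look like the following; the helper name, the 20% test fraction, and the seed are hypothetical choices. In scikit-learn, `train_test_split(..., stratify=y)` serves the same purpose.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps (about) the same test share.

    Hypothetical helper: the 20% test fraction and seed are illustrative.
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)  # shuffle within the class only
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Class counts of the Wisconsin Diagnostic data: 212 malignant (1), 357 benign (0).
labels = [1] * 212 + [0] * 357
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))  # 456 113 42
```

With a test set of about 113 samples, each misclassified case moves accuracy by roughly 0.009, so re-running with several seeds gives a useful sense of how much a single split's result can shift.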

Limitations and Next Steps

  • The dataset is compact, clean, and public, so it does not represent deployment conditions in a clinical workflow.
  • A stronger next pass would add repeated cross-validation, calibration review, and clearer sensitivity/specificity tradeoff analysis.
  • Any real diagnostic use would require clinical validation, expert review, and governance beyond this portfolio project.
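The sensitivity/specificity and calibration checks listed above can be prototyped with a few helpers. This sketch assumes the model outputs a malignancy probability per sample; the helper names, thresholds, and toy probabilities are hypothetical. The Brier score is one common calibration summary (`sklearn.metrics.brier_score_loss` computes it), and a reliability curve such as `sklearn.calibration.calibration_curve` gives a fuller picture.

```python
def confusion_at(labels, probs, threshold):
    """Count TP, FN, TN, FP when probs >= threshold are called malignant (1)."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def tradeoff_table(labels, probs, thresholds):
    """(threshold, sensitivity, specificity) rows for each candidate cut-off."""
    rows = []
    for t in thresholds:
        tp, fn, tn, fp = confusion_at(labels, probs, t)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        rows.append((t, sensitivity, specificity))
    return rows


def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Toy probabilities (invented): 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.7, 0.35, 0.4, 0.2, 0.1, 0.05]
for t, sens, spec in tradeoff_table(labels, probs, [0.3, 0.5, 0.8]):
    print(f"threshold={t:.1f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"Brier score: {brier_score(labels, probs):.3f}")
```

On the toy data, lowering the threshold from 0.5 to 0.3 raises sensitivity from 0.67 to 1.00 while specificity falls from 1.00 to 0.75. Making that tradeoff explicit matters in a diagnostic setting, where a missed malignancy and a false alarm carry very different costs.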

Visual Evidence

Classifier performance comparison across model types.
ROC curve for the neural-network model.
Final correlation heatmap after feature dropping.
Clustered heatmap of retained features.
Generated feature distributions comparing benign and malignant diagnosis groups.