Predicting telecom customer churn on the 51k-row Cell2Cell dataset using a classic logistic baseline and tuned neural networks (dense and wide-&-deep) with focal loss and decision-threshold optimization. Notebook-first workflow; all artifacts saved under data/.
Project highlights: a transparent data-processing flow, reproducible model training and optimization, and the resulting outputs included. For a fuller ML portfolio, jump to the "More ML projects" section.
- `00-eda.ipynb` — quick EDA of raw data.
- `01-preprocess-dataset.ipynb` — feature engineering, encoding, scaling, splits; saves processed data & pipelines.
- `02-baseline-model.ipynb` — class-balanced logistic regression baseline.
- `03-hyperparameter-tuning.ipynb` — KerasTuner Bayesian search for dense + wide-&-deep architectures (with focal loss).
- `04-train-deep-learning-models.ipynb` — retrains the best models to their best epochs.
- `05-analysis.ipynb` — test-set evaluation, threshold optimization, plots & metrics export.
- `utils.py` — helpers for loading/saving artifacts.
- `data/` — raw dataset (`cell2celltrain.csv`), processed splits, pipelines, tuned hyperparameters, trained models (`*.keras`), and plots/CSVs with metrics.
- `Telecom_Prediction_Report1.pdf`, `Telecom_Prediction_Report2.pdf` — slide-style reports of the findings.
- Source: Cell2Cell churn dataset (`data/cell2celltrain.csv`, 51,047 rows, 58 columns), the public version seen on Kaggle/IBM. Problem type: binary classification on `Churn` (Yes/No → 1/0).
- Feature engineering: derives `InactiveSubs = UniqueSubs - ActiveSubs` and `HandsetDiff = Handsets - HandsetModels`, cleans negative numeric values to `NaN`, parses handset price to numeric plus a `HandsetPrice_Unknown` flag, and removes constant/highly redundant columns (`CustomerID`, `CallForwardingCalls`, `HandsetModels`, etc.). The data dictionary and column notes are documented in the reports/notebooks (per the ACTL5111/3143 spec).
- Encoding: ordinal for `CreditRating`/`IncomeGroup`, one-hot for nominal features, target-mean encoding for the high-cardinality `ServiceArea`, and median imputation + scaling for numerics.
- Splits: stratified train/val/test (60/20/20), saved as `X_*_base.csv` (for linear models) and `X_*_deep.csv` (for NNs) with labels `y_*.csv`.
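The derived-feature and cleaning steps above can be sketched in pandas. This is a minimal illustration, not the project's actual pipeline (which lives in `01-preprocess-dataset.ipynb`); column names come from the dataset, and the exact cleaning order is an assumption:

```python
import numpy as np
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the derived features and cleaning described above."""
    out = df.copy()
    # Negative values in the raw numeric columns are treated as data errors -> NaN
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].where(out[num_cols] >= 0)
    # Derived subscription/handset features
    out["InactiveSubs"] = out["UniqueSubs"] - out["ActiveSubs"]
    out["HandsetDiff"] = out["Handsets"] - out["HandsetModels"]
    # HandsetPrice arrives as a string column with "Unknown" entries:
    # keep a missingness flag, then coerce to numeric
    out["HandsetPrice_Unknown"] = (out["HandsetPrice"] == "Unknown").astype(int)
    out["HandsetPrice"] = pd.to_numeric(out["HandsetPrice"], errors="coerce")
    return out
```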
- The draft table lives in `data_dictionary_draft.md` (column, description, datatype; target marked).
- The reports still contain column notes; this README links to the table so markers/interviewers can find it quickly.
- `00-eda.ipynb`: sanity checks on class balance and feature distributions.
- `01-preprocess-dataset.ipynb`: builds and persists the preprocessing pipelines (`baseline.pkl`, `deep.pkl`) and processed splits.
- `02-baseline-model.ipynb`: logistic regression with `class_weight='balanced'`; saves `logistic_baseline_model.pkl`.
- `03-hyperparameter-tuning.ipynb`: Bayesian search (KerasTuner) over depth/width/regularization/optimizer; uses a custom `FocalLoss(alpha=0.75)` to handle imbalance. Best hyperparameters are saved to `data/best_hyperparameters_dense.json` and `data/best_hyperparameters_widedeep.json`.
- `04-train-deep-learning-models.ipynb`: retrains the dense NN and wide-&-deep NN to their best epochs (`data/best_epoch_dense.json` ~39, `best_epoch_widedeep.json` ~42); exports models as `final_model_dense.keras`, `final_model_widedeep.keras`, etc., plus training curves.
- `05-analysis.ipynb`: compares the default 0.5 threshold vs. optimized thresholds (maximizing F1); exports metrics/plots (`data/final_model_metrics*.csv`, `roc_curve_comparison.png`, `cm_*.png`, `threshold_optimization_comparison.png`).
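Focal loss down-weights easy examples so training focuses on hard, minority-class (churn) cases: FL(p_t) = -α_t (1 - p_t)^γ · log(p_t). A NumPy sketch of the binary form follows; the notebook's actual `FocalLoss` is a Keras loss with `alpha=0.75`, while the γ = 2.0 default here is the common choice from the focal-loss literature and is an assumption:

```python
import numpy as np

def binary_focal_loss(y_true, p, alpha=0.75, gamma=2.0, eps=1e-7):
    """Mean binary focal loss.

    The (1 - p_t)^gamma factor shrinks the loss for well-classified examples;
    alpha weights the positive (churn) class, 1 - alpha the negative class.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y_true = np.asarray(y_true)
    p_t = np.where(y_true == 1, p, 1.0 - p)           # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.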
Prereqs: Python 3.10+, pip, virtualenv recommended.
```
python3 -m venv .venv && source .venv/bin/activate
pip install -U pandas numpy scikit-learn seaborn matplotlib joblib tensorflow keras-tuner nbformat
```

Run the notebooks in order (00 → 05) to fully reproduce preprocessing, tuning, training, and evaluation (mirrors the ACTL5111/3143 "Run All" checklist). If you only need inference with the best models:

1. Activate the env above.
2. Load `deep.pkl` to preprocess new data.
3. Load `final_model_widedeep.keras` or `final_model_dense.keras` and apply the tuned decision threshold from the Results section.
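The inference steps above can be wrapped in a small helper. The function name and default threshold are illustrative only (the tuned thresholds are listed under Results); `preprocess` stands in for the fitted pipeline loaded from `deep.pkl` and `predict_proba` for a loaded `.keras` model's predict call:

```python
import numpy as np

def predict_churn(preprocess, predict_proba, raw_rows, threshold=0.19):
    """Preprocess raw rows, score churn probabilities, apply a tuned cut-off.

    preprocess:    e.g. the transform method of the pipeline from deep.pkl
    predict_proba: e.g. a loaded final_model_widedeep.keras model's predict
    threshold:     tuned decision threshold (0.19 for wide-&-deep in Results)
    """
    X = preprocess(raw_rows)
    probs = np.asarray(predict_proba(X)).ravel()
    return (probs >= threshold).astype(int)
```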
Note: if sharing code without the dataset, place `cell2celltrain.csv` in `data/` before running the notebooks. Reports should include the required "Generative AI usage" appendix per the course spec.
- Default threshold (0.5): logistic baseline F1 0.44 / ROC-AUC 0.61; dense NN F1 0.14 / ROC-AUC 0.64; wide-&-deep F1 0.15 / ROC-AUC 0.64 (`data/final_model_metrics.csv`).
- Optimized thresholds (maximizing F1):
  - Baseline @ 0.43 → F1 0.44, Recall 0.54, ROC-AUC 0.61.
  - Dense NN @ 0.20 → F1 0.48, Recall 0.84, ROC-AUC 0.64.
  - Wide-&-deep @ 0.19 → F1 0.48, Recall 0.85, ROC-AUC 0.64 (`data/final_model_metrics_with_opt_threshold.csv`).
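The threshold optimization behind these numbers amounts to sweeping candidate cut-offs over the predicted probabilities and keeping the F1-maximizing one. A NumPy-only sketch (the notebook in `05-analysis.ipynb` may instead use scikit-learn's precision-recall utilities; that, and the 0.01 grid spacing, are assumptions):

```python
import numpy as np

def best_f1_threshold(y_true, probs, grid=None):
    """Return (threshold, f1) for the F1-maximizing cut-off over a grid."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)   # candidate thresholds, step 0.01
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = (probs >= t).astype(int)
        tp = np.sum((pred == 1) & (y_true == 1))
        fp = np.sum((pred == 1) & (y_true == 0))
        fn = np.sum((pred == 0) & (y_true == 1))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1
```

Lowering the threshold trades precision for recall, which is why the NN models jump from F1 ≈ 0.15 at 0.5 to F1 ≈ 0.48 near 0.2.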
- Visuals: ROC/PR comparisons (`roc_curve_comparison.png`, `pr_curve_comparison.png`), confusion matrices (`cm_*.png`, plus optimized versions), training curves (`training_curve_dense*.png`, `training_curve_widedeep*.png`), and the threshold-effect chart (`threshold_optimization_comparison.png`).
If you’re reviewing my broader ML work with fuller READMEs and production-minded pipelines, please check my GitHub profile: https://github.com/audrey9212 (see pinned repositories for the most relevant case studies).