MLOps Lending Club Loan Prediction Project

This is the final project for the MLOps ZoomCamp course by DataTalks.Club.

Problem Statement

We need to build an end-to-end machine learning system that predicts whether a particular user will repay a loan. We assess the probability of full repayment based on various features: employment length, annual income, home ownership status, etc.

It should be a fault-tolerant, monitored, and containerized web service that receives a loan applicant's features and returns whether the loan will be repaid.
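
As a rough illustration of that contract, the sketch below maps a feature dictionary to a repayment decision. The field names, the 0.5 threshold, and the `StubModel` class are assumptions made for this example only; the project's actual service is a Flask app backed by a model from the MLFlow registry, and its request and response formats may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical field names for the three features the project uses.
REQUIRED_FEATURES = ("emp_length", "annual_inc", "home_ownership")


@dataclass
class StubModel:
    """Stand-in for the real model; always returns a fixed probability."""

    repay_probability: float = 0.8

    def predict_proba(self, features: Dict[str, Any]) -> float:
        return self.repay_probability


def predict(
    features: Dict[str, Any], model: StubModel, threshold: float = 0.5
) -> Dict[str, Any]:
    """Validate the input and turn a repayment probability into a decision."""
    missing = [name for name in REQUIRED_FEATURES if name not in features]
    if missing:
        raise ValueError(f"missing features: {missing}")
    probability = model.predict_proba(features)
    return {
        "repaid_probability": probability,
        "will_repay": probability >= threshold,  # assumed decision rule
    }


if __name__ == "__main__":
    record = {"emp_length": 5, "annual_inc": 60000.0, "home_ownership": "RENT"}
    print(predict(record, StubModel()))
```

Rejecting incomplete input early keeps malformed requests away from the model, which is one small piece of the fault tolerance asked for above.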

System Description

The system contains the following parts:

Experiment tracking and model registry (we use MLFlow)
An orchestrated model training pipeline (Prefect)
Web service deployment (Flask)
Model monitoring
Test coverage with unit and integration tests
Code linters and formatters
Pre-commit hooks
Makefile
CI pipeline (GitHub workflow)

The project runs locally and stores model artifacts in AWS S3 via MLFlow. It is containerized and can be easily deployed to the cloud.

Dataset

We took the Lending Club dataset and reduced it to 10k records to speed up training and prototyping. It contains 23 features and the target column is_bad, which indicates whether the loan was repaid. For this project, we use 3 features: employment length, annual income, and home ownership status.
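
The feature selection described above could look roughly like this. Only `is_bad` is a column name taken from this README; the three feature column names and the two sample rows are made up for illustration, so check them against the actual CSV before reusing the snippet.

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumed column names for the three features; only TARGET_COLUMN is from the README.
FEATURE_COLUMNS = ["emp_length", "annual_inc", "home_ownership"]
TARGET_COLUMN = "is_bad"

# Tiny made-up sample standing in for the reduced 10k-record dataset.
SAMPLE_CSV = """emp_length,annual_inc,home_ownership,purpose,is_bad
10,50000,RENT,car,0
1,28000,MORTGAGE,debt_consolidation,1
"""


def split_features_target(
    rows: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[int]]:
    """Keep only the three model features and extract the integer target."""
    features = [{name: row[name] for name in FEATURE_COLUMNS} for row in rows]
    target = [int(row[TARGET_COLUMN]) for row in rows]
    return features, target


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    X, y = split_features_target(rows)
    print(X[0], y)
```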

How to Run

Serving Part

Clone the repo

git clone https://github.com/KarimLulu/mlops-loan-prediction.git

Navigate to the project folder

cd mlops-loan-prediction

Build all required services

docker-compose build

Create and start containers

docker-compose up

Send some data records to the system in a separate terminal window:

make setup
pipenv run python -m monitoring.send_data

Open Grafana in the browser and find the Evidently Data Drift Dashboard.
Enjoy the live data drift detection!
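
For readers who want to send a single record by hand instead of running `monitoring.send_data`, here is a minimal sketch using only the standard library. The URL, port, endpoint path, and payload field names are assumptions, not taken from the repository; check the Flask app and `docker-compose.yml` for the real values.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed address of the prediction service; adjust to the real host, port, and path.
SERVICE_URL = "http://localhost:9696/predict"


def build_request(
    record: Dict[str, Any], url: str = SERVICE_URL
) -> urllib.request.Request:
    """Wrap one record in a JSON POST request without sending it."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_record(
    record: Dict[str, Any], url: str = SERVICE_URL, timeout: float = 5.0
) -> Dict[str, Any]:
    """Send one record and decode the JSON response (requires a running service)."""
    with urllib.request.urlopen(build_request(record, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the containers from `docker-compose up` running, calling `send_record({"emp_length": 5, "annual_inc": 60000.0, "home_ownership": "RENT"})` should return the service's JSON response, assuming the endpoint and field names match.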

Experimentation and orchestration part

Set up the environment and prepare the project

make setup

Start Prefect server

pipenv run prefect orion start --host 0.0.0.0

Install aws-cli and configure AWS profile

If you've already created an AWS account, head to the IAM section, generate your secret access key, and download it locally.

Configure aws-cli with your downloaded AWS secret keys:

   $ aws configure
   AWS Access Key ID [None]: xxx
   AWS Secret Access Key [None]: xxx
   Default region name [None]: eu-west-1
   Default output format [None]:

Verify aws config:
```
  $ aws sts get-caller-identity
```

Set S3 bucket

export BUCKET_NAME=s3-bucket-name

Run MLFlow server

pipenv run mlflow server --default-artifact-root s3://$BUCKET_NAME --backend-store-uri sqlite:///mlflow_db.sqlite

Create the deployment for the training pipeline

pipenv run python -m prediction_service.train_workflow

Run the deployment

pipenv run prefect deployment run 'main-flow/model_training_workflow'

Run the agent

pipenv run prefect agent start -q 'mlops'

Wait until it finishes and registers the new production model.
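
A common stumbling block in the steps above is starting the MLFlow server with `BUCKET_NAME` unset, which yields an invalid artifact root. The sketch below is a hypothetical pre-flight helper (not part of the repository) that assembles the same command as shown earlier and fails early if the variable is missing.

```python
import os
import shlex
from typing import List, Mapping


def mlflow_server_command(env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the MLFlow server command from the README, checking BUCKET_NAME first."""
    bucket = env.get("BUCKET_NAME", "").strip()
    if not bucket:
        raise RuntimeError("BUCKET_NAME is not set; export it before starting MLFlow")
    return [
        "pipenv", "run", "mlflow", "server",
        "--default-artifact-root", f"s3://{bucket}",
        "--backend-store-uri", "sqlite:///mlflow_db.sqlite",
    ]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(mlflow_server_command({"BUCKET_NAME": "s3-bucket-name"})))
```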

Run tests and code quality checks

Unit tests

make test

Integration tests

make integration_test

Code formatting and code quality checks (isort, black, pylint)

make quality_checks

Pre-commit hooks

Code formatting pre-commit hooks are triggered on each commit

CI/CD

Each pull request triggers the CI workflow.

The workflow runs environment setup, unit tests, and the integration test.
