Can I Grade Loans Better Than LendingClub?

Pitting My Neural Network Against a Corporate Benchmark

  1. Introduction
  2. Ground rules
  3. Test metric
  4. LendingClub’s turn
  5. My turn
  6. Victory!
  7. Further reading

Introduction

In case you missed it, I built a neural network to predict loan risk using a public dataset from LendingClub. Then I built a public API to serve the model’s predictions. That’s nice and all, but how good is my model?

Today I’m going to put it to the test, pitting it against the risk models of the very institution that issued those loans. That’s right, LendingClub included their own calculated loan grades (and sub-grades) in the dataset, so all the pieces are in place for the most thrilling risk modeling smackdown of the century (okay, of the week). May the best algorithm win!

import joblib

prev_notebook_folder = "../input/building-a-neural-network-to-predict-loan-risk/"
loans = joblib.load(prev_notebook_folder + "loans_for_eval.joblib")
loans.shape
(1110171, 70)
loans.head()
   loan_amnt       term  emp_length  home_ownership  annual_inc             purpose    dti  delinq_2yrs  cr_hist_age_mths  fico_range_low  ...  tax_liens  tot_hi_cred_lim  total_bal_ex_mort  total_bc_limit  total_il_high_credit_limit  fraction_recovered   issue_d  grade  sub_grade  expected_return
0     3600.0  36 months   10+ years        MORTGAGE     55000.0  debt_consolidation   5.91          0.0             148.0           675.0  ...        0.0         178050.0             7746.0          2400.0                     13734.0                 1.0  Dec-2015      C         C4          4429.08
1    24700.0  36 months   10+ years        MORTGAGE     65000.0      small_business  16.06          1.0             192.0           715.0  ...        0.0         314017.0            39475.0         79300.0                     24667.0                 1.0  Dec-2015      C         C1         29530.08
2    20000.0  60 months   10+ years        MORTGAGE     63000.0    home_improvement  10.78          0.0             184.0           695.0  ...        0.0         218418.0            18696.0          6200.0                     14877.0                 1.0  Dec-2015      B         B4         25959.60
4    10400.0  60 months    3 years        MORTGAGE    104433.0      major_purchase  25.37          1.0             210.0           695.0  ...        0.0         439570.0            95768.0         20300.0                     88097.0                 1.0  Dec-2015      F         F1         17394.60
5    11950.0  36 months    4 years            RENT     34000.0  debt_consolidation  10.20          0.0             338.0           690.0  ...        0.0          16900.0            12798.0          9400.0                      4000.0                 1.0  Dec-2015      C         C3         14586.48

5 rows × 70 columns

This post was adapted from a Jupyter Notebook, by the way, so if you’d like to follow along in your own notebook, go ahead and fork mine on Kaggle or GitHub!

Ground rules

This is going to be a clean fight—my model won’t use any data LendingClub wouldn’t have access to at the point they calculate a loan’s grade (including the grade itself).

I’m going to sort the dataset chronologically (using the issue_d column, the month and year the loan was issued) and split it into two parts. The first 80% I’ll use for training my competition model, and I’ll compare performance on the last 20%.

from sklearn.model_selection import train_test_split

loans["date"] = loans["issue_d"].astype("datetime64[ns]")
loans.sort_values("date", axis="index", inplace=True, kind="mergesort")

train, test = train_test_split(loans, test_size=0.2, shuffle=False)
train, test = train.copy(), test.copy()
print(f"The test set contains {len(test):,} loans.")
The test set contains 222,035 loans.

At the earlier end of the test set my model may have a slight informational advantage, since it was trained on the outcomes of a few loans that may not yet have closed when LendingClub graded those test loans. On the other hand, LendingClub may have a slight informational advantage at the later end of the test set, since by then they would have known the outcomes of some loans from the earlier end of the test set.
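
If you want to see exactly where that chronological boundary falls, a quick sanity check like the following would show it (it uses only the train and test frames created above):

# Check the date ranges on either side of the 80/20 split
print(f"Training set: {train['date'].min():%b %Y} to {train['date'].max():%b %Y}")
print(f"Test set: {test['date'].min():%b %Y} to {test['date'].max():%b %Y}")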

I have to give credit to Michael Wurm, by the way, for the idea of comparing my model’s performance to LendingClub’s loan grades, but my approach is pretty different. I’m not trying to simulate the performance of an investment portfolio; I’m just evaluating how well my predictions of simple risk compare.

Test metric

The test: who can pick the best set of grade A loans, judged on the basis of the dependent variable from my last notebook, the fraction of an expected loan return that a prospective borrower will pay back (which I engineered as fraction_recovered).

LendingClub will take the plate first. I’ll gather all their grade A loans from the test set, count them, and calculate their average fraction_recovered. That average will be the metric my model has to beat.

Then I’ll train my model on the training set using the same pipeline and parameters I settled on in my last notebook. Once it’s trained, I’ll use it to make predictions on the test set, then gather the number of top predictions equal to the number of LendingClub’s grade A loans. Finally, I’ll calculate the same average of fraction_recovered on that subset, and we’ll have ourselves a winner!
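
In other words, the whole contest reduces to one comparison: average fraction_recovered over a fixed-size set of top-rated loans. As a rough sketch (the top_n_average helper here is just for illustration, not something from my notebooks), the metric looks like this:

def top_n_average(scores, outcomes, n):
    """Average `outcomes` over the n rows with the highest `scores`."""
    top_index = scores.sort_values(ascending=False).index[:n]
    return outcomes.loc[top_index].mean()

# e.g. top_n_average(test["model_predictions"], test["fraction_recovered"], len(lc_grade_a))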

LendingClub's turn

from statistics import mean

lc_grade_a = test[test["grade"] == "A"]
print(f"LendingClub gave {len(lc_grade_a):,} loans in the test set an A grade.")

print("\nAverage `fraction_recovered` on LendingClub's grade A loans:")
print(round(mean(lc_grade_a["fraction_recovered"]), 5))
LendingClub gave 38,779 loans in the test set an A grade.

Average `fraction_recovered` on LendingClub's grade A loans:
0.96021

That’s a pretty high percentage. I’m a bit nervous.

My turn

First, I’ll copy over my run_pipeline function from my previous notebook:

from sklearn.model_selection import train_test_split
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense, Dropout


def run_pipeline(
    data,
    onehot_cols,
    ordinal_cols,
    batch_size,
    validate=True,
):
    # Separate the features from the target
    X = data.drop(columns=["fraction_recovered"])
    y = data["fraction_recovered"]
    X_train, X_valid, y_train, y_valid = (
        train_test_split(X, y, test_size=0.2, random_state=0)
        if validate
        else (X, None, y, None)
    )

    # One-hot encode categorical columns, ordinal-encode emp_length,
    # and standard-scale all remaining (numeric) columns
    transformer = DataFrameMapper(
        [
            (onehot_cols, OneHotEncoder(drop="if_binary")),
            (
                list(ordinal_cols.keys()),
                OrdinalEncoder(categories=list(ordinal_cols.values())),
            ),
        ],
        default=StandardScaler(),
    )

    X_train = transformer.fit_transform(X_train)
    X_valid = transformer.transform(X_valid) if validate else None

    input_nodes = X_train.shape[1]
    output_nodes = 1

    # Feed-forward network: three hidden layers with dropout regularization
    model = Sequential()
    model.add(Input((input_nodes,)))
    model.add(Dense(64, activation="relu"))
    model.add(Dropout(0.3, seed=0))
    model.add(Dense(32, activation="relu"))
    model.add(Dropout(0.3, seed=1))
    model.add(Dense(16, activation="relu"))
    model.add(Dropout(0.3, seed=2))
    model.add(Dense(output_nodes))
    model.compile(optimizer="adam", loss="mean_squared_logarithmic_error")

    history = model.fit(
        X_train,
        y_train,
        batch_size=batch_size,
        epochs=100,
        validation_data=(X_valid, y_valid) if validate else None,
        verbose=2,
    )

    return history.history, model, transformer


onehot_cols = ["term", "application_type", "home_ownership", "purpose"]
ordinal_cols = {
    "emp_length": [
        "< 1 year",
        "1 year",
        "2 years",
        "3 years",
        "4 years",
        "5 years",
        "6 years",
        "7 years",
        "8 years",
        "9 years",
        "10+ years",
    ]
}

Now for the moment of truth:

# Train the model
_, model, transformer = run_pipeline(
    train.drop(columns=["issue_d", "date", "grade", "sub_grade", "expected_return"]),
    onehot_cols,
    ordinal_cols,
    batch_size=128,
    validate=False,
)

# Make predictions
X_test = transformer.transform(
    test.drop(
        columns=[
            "fraction_recovered",
            "issue_d",
            "date",
            "grade",
            "sub_grade",
            "expected_return",
        ]
    )
)
test["model_predictions"] = model.predict(X_test)

# Gather top predictions
test_sorted = test.sort_values("model_predictions", axis="index", ascending=False)
ty_grade_a = test_sorted.iloc[0:len(lc_grade_a)]

# Display results
print("\nAverage `fraction_recovered` on Ty's grade A loans:")
print(format(mean(ty_grade_a["fraction_recovered"]), ".5f"))
Epoch 1/100
6939/6939 - 13s - loss: 0.0249
Epoch 2/100
6939/6939 - 13s - loss: 0.0204
Epoch 3/100
6939/6939 - 13s - loss: 0.0202

[... epochs 4–99 omitted: the loss settles at 0.0200 by epoch 17 and holds there for the rest of training ...]

Epoch 100/100
6939/6939 - 13s - loss: 0.0200

Average `fraction_recovered` on Ty's grade A loans:
0.96166

Victory!

Phew, that was a close one! My win might be too small to be statistically significant, but hey, it’s cool seeing that I can keep up with LendingClub’s best and brightest.
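
If you wanted to put a rough number on that, one option would be Welch’s t-test on the two sets of fraction_recovered values. This is just a sketch of the idea (not something I ran in the notebook), using scipy:

from scipy import stats

# Compare outcomes between my top picks and LendingClub's grade A loans
t_stat, p_value = stats.ttest_ind(
    ty_grade_a["fraction_recovered"],
    lc_grade_a["fraction_recovered"],
    equal_var=False,  # Welch's t-test: don't assume equal variances
)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

Take any such test with a grain of salt, though: the two groups likely share many of the same loans, so they’re far from independent samples.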

Further reading


What I’d really like to know now is what quantitative range of estimated risk each LendingClub grade and sub-grade corresponds to, but it looks like that’s proprietary. Does anyone know if loan grades generally correspond to certain percentage ranges, like letter grades in academic classes? If not, have any ideas for better benchmarks I could use to evaluate my model’s performance? Go ahead and chime in on Mastodon or LinkedIn.
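
In the meantime, the closest thing I can get from this dataset is the empirical average by grade, along the lines of:

# Average outcome per LendingClub grade in the test set
print(test.groupby("grade")["fraction_recovered"].agg(["count", "mean"]))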


Found an error or typo in this post you’d like to fix? Send me a pull request on GitHub!

Want to publish this article on your blog, in your magazine, or anywhere else? This post, like most of the content on my website, is licensed under a Creative Commons Attribution license, so you’re welcome to share it wherever and however you please, as long as you cite me as the author. I’d also enjoy hearing from you if you do publish this somewhere, but that’s totally up to you.