LENDING CLUB LOAN ANALYSIS
Published by Shavonne Austin Modified over 6 years ago
End to End Case Study (Classification): Lending Club data
Pawan Reddy Ulindala
Towards Data Science
Lending Club is a lending platform that lends money to people in need at an interest rate based on their credit history and other factors. In this blog, we will analyze this data and pre-process it based on our need and build a machine learning model that can identify a potential defaulter based on his/her history of transactions with Lending Club. You can find the data here.
This dataset contains 42,538 rows and 144 columns. Many of these 144 columns are mostly null.
In fact, 63.15% of all values in the data are null. It is therefore important to handle these null values carefully, as they can significantly affect our results.
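The overall null percentage quoted above can be computed directly with pandas. The following is a minimal sketch on a toy frame; the column names and values are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Lending Club data (hypothetical values).
df = pd.DataFrame({
    "loan_amnt": [5000, 2500, np.nan, 10000],
    "desc": [np.nan, np.nan, np.nan, "car repair"],
    "grade": ["B", "C", "C", np.nan],
})

# Share of all cells that are null, as a percentage.
overall_null_pct = df.isnull().sum().sum() / df.size * 100

# Null percentage per column, highest first.
null_pct_by_col = df.isnull().mean().mul(100).sort_values(ascending=False)

print(f"Overall null percentage: {overall_null_pct:.2f}%")
print(null_pct_by_col)
```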
Null values visual plot:
Handling null values:
Handling null values is an important task here. Only 53 of the 144 columns have less than 40 percent null values.
In the summary table, each row gives the number of columns (out of 144) whose percentage of null values is below a specific threshold. For example, the first row shows that 52 columns have less than 10% null values.
By keeping only the columns with less than 40% null values, we reduced the total number of columns from 144 to 53.
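One way to build such a threshold summary and apply the 40% cut-off is sketched below. The helper `columns_below_null_threshold` and the toy data are hypothetical, and the thresholds other than 40 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

def columns_below_null_threshold(df, max_null_pct):
    """Return names of columns whose null percentage is strictly below max_null_pct."""
    null_pct = df.isnull().mean() * 100
    return null_pct[null_pct < max_null_pct].index.tolist()

# Toy data: column "b" is 20% null, column "c" is 60% null.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, np.nan, 3, 4, 5],
    "c": [np.nan, np.nan, np.nan, 4, 5],
})

# Number of columns that pass each threshold.
summary = pd.Series(
    {t: len(columns_below_null_threshold(df, t)) for t in (10, 40, 80)}
)

# Keep only columns with less than 40% null values.
df_reduced = df[columns_below_null_threshold(df, 40)]
```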
Understanding Features
It is important to understand the features/columns, as some categorical columns in the data are stored as numerical values and vice versa. I first tried to examine every column, but soon realized that doing this for all 53 columns would be cumbersome. So I decided to first eliminate the columns that don't add value to the data and then analyze each remaining field.
Checking objects:
Dropping unnecessary objects:
Checking numerical columns:
Dropping unnecessary numerical columns:
After examining the data, we dropped 18 of these 53 columns because they didn't add value. This reduced the number of columns from 53 to 35, and we will try to reduce it further.
Converting categorical columns to numerical columns:
We converted categorical columns to numerical ones using either one-hot encoding or label encoding, depending on the kind of data they represent. For example, one-hot encoding was performed on the ['home_ownership', 'verification_status', 'purpose'] columns, whereas label encoding was performed on the 'grade' and 'sub_grade' columns as they are ordinal in nature.
One hot encoding:
Label encoding:
Updating the grade column with label encoded values:
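The two encodings can be sketched as follows. This is an illustration on toy values, not the author's code: it uses `pd.get_dummies` for one-hot encoding and an explicit sorted mapping for the ordinal columns, on the assumption that alphabetical order matches the grade order.

```python
import pandas as pd

# Toy rows; the category values are illustrative.
df = pd.DataFrame({
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"],
    "grade": ["B", "A", "C", "B"],
    "sub_grade": ["B2", "A5", "C1", "B4"],
})

# One-hot encode a nominal column: one indicator column per category.
df = pd.get_dummies(df, columns=["home_ownership"], prefix="home_ownership")

# Label encode ordinal columns: sorted labels map to 0, 1, 2, ...
# (assumes alphabetical order equals the intended rank order).
for col in ["grade", "sub_grade"]:
    ordered = sorted(df[col].unique())
    mapping = {label: code for code, label in enumerate(ordered)}
    df[col] = df[col].map(mapping)
```

An explicit mapping keeps the rank order visible in the code; sklearn's `LabelEncoder` would give the same codes here because it also sorts the labels.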
Converting DateTime columns to numerical columns:
The datetime columns ['issue_d', 'last_pymnt_d', 'last_credit_pull_d'] are each split into a month and a year column using pandas datetime functionality. The new columns are named 'issue_d_year', 'issue_d_month', 'last_pymnt_d_year', 'last_pymnt_d_month', 'last_credit_pull_d_year' and 'last_credit_pull_d_month' respectively.
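A minimal sketch of this split is shown below. It assumes the dates are stored as strings such as "Dec-2011" (format `%b-%Y`); that format is an assumption and should be checked against the actual file.

```python
import pandas as pd

# Toy values; the "Mon-YYYY" string format is an assumption.
df = pd.DataFrame({
    "issue_d": ["Dec-2011", "Jan-2010", None],
    "last_pymnt_d": ["Jan-2015", None, "Jun-2014"],
})

for col in ["issue_d", "last_pymnt_d"]:
    # Unparseable or missing values become NaT instead of raising.
    parsed = pd.to_datetime(df[col], format="%b-%Y", errors="coerce")
    df[f"{col}_year"] = parsed.dt.year
    df[f"{col}_month"] = parsed.dt.month

# The original string columns are no longer needed.
df = df.drop(columns=["issue_d", "last_pymnt_d"])
```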
Converting objects to numerical columns:
The columns int_rate and term are stored as objects. We have performed necessary string operations to convert them into numerical columns.
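The string operations could look like the sketch below. It assumes `int_rate` holds values such as "10.65%" and `term` holds values such as " 36 months"; both formats are assumptions about the raw data.

```python
import pandas as pd

# Toy values in the assumed raw formats.
df = pd.DataFrame({
    "int_rate": ["10.65%", " 7.90%"],
    "term": [" 36 months", " 60 months"],
})

# Strip whitespace and the trailing percent sign, then cast to float.
df["int_rate"] = df["int_rate"].str.strip().str.rstrip("%").astype(float)

# Pull out the number of months and cast to int.
df["term"] = df["term"].str.extract(r"(\d+)", expand=False).astype(int)
```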
Checking correlation: Now that we have converted all the columns to numerical columns, we will check for correlation.
A few columns are highly correlated, but these columns are not used when answering our questions. For example, when classifying whether a loan will be paid back by the customer, we will not consider any future transactions such as total_pymnt and total_pymnt_inv. Hence, these columns aren't dropped here.
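To list highly correlated pairs rather than read them off a heatmap, something like the following can be used. The helper name, the 0.9 threshold and the toy columns are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.9):
    """Return column pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Toy data: the first two columns move almost in lockstep.
df = pd.DataFrame({
    "loan_amnt": [1000, 2000, 3000, 4000, 5000],
    "funded_amnt": [1000, 2000, 3000, 4000, 4900],
    "dti": [12.0, 3.5, 20.1, 8.4, 15.0],
})

pairs = highly_correlated_pairs(df)
print(pairs)
```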
Dealing with null values:
Let’s check if there are any null values after significantly cleaning columns.
As we can see, we can still find some null values in the data. We will examine these null values and take the necessary actions.
Let's check which columns have the highest percentage of null values.
Some columns have a very small percentage of null values (less than 1%). For these, we can replace the null values with the median of the respective column.
For columns with a high percentage of null values, we train a model on the rows where the value is present and use it to predict the missing values in that column.
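Both imputation strategies are sketched below on toy data. The choice of `LinearRegression` as the imputation model is an assumption (the text does not say which model was used), and the column names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "loan_amnt": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0],
    "installment": [30.0, 60.0, np.nan, 120.0, np.nan, 180.0],
    "annual_inc": [40000.0, np.nan, 55000.0, 60000.0, 65000.0, 80000.0],
})

# Few nulls: fill with the column median.
df["annual_inc"] = df["annual_inc"].fillna(df["annual_inc"].median())

# Many nulls: fit on rows where the target is known, predict the rest.
target = "installment"
features = ["loan_amnt", "annual_inc"]
known = df[target].notna()

model = LinearRegression()
model.fit(df.loc[known, features], df.loc[known, target])
df.loc[~known, target] = model.predict(df.loc[~known, features])
```

The feature columns must themselves be free of nulls before the model is fitted, which is why the median fill runs first in this sketch.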
Now that no null values remain, we move on to the next step: building a machine learning model.
Classification
The goal of our classification task is to identify whether a customer who is requesting a loan will be able to repay the loan along with the interest. Some columns contain information about transactions that happen after the date on which the loan is taken (such as monthly installment payments), so we drop them from the pre-processed dataset before carrying out the classification task.
A related goal is to identify whether a customer will default based on their historic transactions with the lender after taking the loan.
Let's drop a few columns that contain information on charged-off loans. Columns dropped in classification: ['total_pymnt', 'total_pymnt_inv', 'total_rec_prncp', 'total_rec_int', 'total_rec_late_fee', 'recoveries']
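Dropping these columns is a single pandas call. The sketch below uses a toy frame with made-up values; `errors="ignore"` is used only because the toy frame does not contain every listed column.

```python
import pandas as pd

# Columns listed above that describe what happened after the loan was issued.
leakage_cols = [
    "total_pymnt", "total_pymnt_inv", "total_rec_prncp",
    "total_rec_int", "total_rec_late_fee", "recoveries",
]

# Toy frame with hypothetical values.
df = pd.DataFrame({
    "loan_amnt": [5000, 2500],
    "total_pymnt": [5861.07, 1008.71],
    "recoveries": [0.0, 117.08],
})

# errors="ignore" skips listed names that are absent from this frame.
df_clf = df.drop(columns=leakage_cols, errors="ignore")
```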
The 'loan_status' column is used as the target variable to classify a customer based on their records. The 'loan_status' column has 4 unique values, all of which are label encoded for ease of representation. This column is labeled as below:
Next, we check for multicollinearity between features using the variance inflation factor (VIF) and drop columns whose value exceeds the threshold.
We dropped the columns with a high VIF (10 or above).
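For reference, the VIF of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features. The sketch below computes it from the diagonal of the inverse correlation matrix and removes columns one at a time; the iterative removal strategy and the helper names are assumptions, not necessarily what was done here. statsmodels also offers `variance_inflation_factor` for the same purpose.

```python
import numpy as np
import pandas as pd

def compute_vif(X):
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=X.columns)

def drop_high_vif(X, threshold=10.0):
    """Repeatedly drop the column with the largest VIF while it is >= threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vif = compute_vif(X)
        if vif.max() < threshold:
            break
        X = X.drop(columns=[vif.idxmax()])
    return X

# Toy data: "c" is almost exactly a + b, so the three columns are collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = pd.DataFrame({"a": a, "b": b, "c": a + b + 0.01 * rng.normal(size=200)})

X_reduced = drop_high_vif(X, threshold=10.0)
```

Dropping one column at a time and recomputing matters because removing a single collinear column usually lowers the VIF of the others.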
Model building
After label encoding the target variable, we split the data into training and test sets in a 70:30 ratio.
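A 70:30 split with sklearn might look like this. The synthetic arrays are placeholders, and `stratify=y` is an added assumption (useful for imbalanced targets) that the text does not mention.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and an imbalanced binary target (80% / 20%).
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# test_size=0.30 gives the 70:30 ratio; stratify keeps class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
```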
We used sklearn's cross_val_score and GridSearchCV, with F1 as the scoring metric, to examine the performance of each model in each fold. The figure below shows the F1 score of each model across 3 folds. The orange line represents the mean F1 score of each model, whereas the box (IQR) shows the spread of these scores.
From the figure, we can say that the bagging classifier is the most stable model, with the highest mean weighted F1 score and the least variance.
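The comparison can be sketched as below. The candidate models other than the bagging classifier, their hyperparameters and the synthetic data are assumptions made for illustration; only the 3 folds and the weighted F1 scoring follow the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the pre-processed loan data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hypothetical candidate list.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(n_estimators=25, random_state=0),
}

# One weighted-F1 score per fold, for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3, scoring="f1_weighted")
    for name, model in models.items()
}

for name, s in scores.items():
    print(f"{name}: mean={s.mean():.3f}, std={s.std():.3f}")
```

Comparing the mean and the standard deviation per model gives the same information as the box plot: a high mean with a small spread indicates a stable model.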
Building the model using bagging classifier.
The classification report for bagging classifier
This model has an accuracy of 0.89 and an average F1 score of 0.75.
Final Confusion matrix:
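Fitting the bagging classifier and producing the report and confusion matrix could look like the following sketch. It runs on synthetic data with assumed hyperparameters, so its numbers will not match the accuracy and F1 score reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a 70:30 split.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# n_estimators=25 is an arbitrary choice for this sketch.
clf = BaggingClassifier(n_estimators=25, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall and F1, plus overall accuracy.
report = classification_report(y_test, y_pred)
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(report)
print(cm)
```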
As the bagging classifier does not expose feature importances directly, we used a decision tree to find the feature importances.
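A minimal sketch of reading importances from a decision tree is given below. The data is synthetic, and the feature names and `max_depth=4` are placeholders.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with placeholder feature names.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(5)]

tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X, y)

# feature_importances_ sums to 1; larger values mean the feature
# contributed more to impurity reduction in the tree's splits.
importances = pd.Series(
    tree.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```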
Conclusion:
In this blog, we have extensively covered pre-processing steps required for this data and then found the best fit model using Grid search and KFolds. I hope that this blog has given you an overall picture of solving a classification problem. For more detailed code, please refer to https://github.com/pawanreddy-u/lendingclub9
Assignment: Lending Club Case Study
Team: Vijay Garg and Santhosh Ankam
Date: 14 Feb 2021
Business Understanding
We are working for Lending Club, a finance company which specialises in lending various types of loans to urban customers. When the company receives a loan application, the company has to make a decision for loan approval based on the applicant's profile. Two types of risks are associated with the bank's decision:
• If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
• If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.
The company wants to understand the driving factors (or driver variables) behind loan default (loan_status = 'Charged Off'), i.e. the variables which are strong indicators of default. The company can utilise this knowledge for its portfolio and risk assessment.
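One simple way to look for driver variables is to compare the default rate across the levels of a candidate variable. The sketch below does this for a hypothetical `grade` column on toy data; the values, including the "Fully Paid" label, are made up for illustration.

```python
import pandas as pd

# Toy loans; grades and statuses are illustrative only.
df = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Charged Off", "Fully Paid",
        "Charged Off", "Fully Paid",
        "Charged Off", "Charged Off", "Fully Paid",
    ],
})

# Flag defaults as defined above: loan_status == 'Charged Off'.
df["is_default"] = df["loan_status"] == "Charged Off"

# Share of defaulted loans within each grade, highest first.
default_rate = df.groupby("grade")["is_default"].mean().sort_values(ascending=False)
print(default_rate)
```

A variable whose levels show clearly different default rates is a candidate indicator of default and worth a closer look.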
Import the necessary libraries