Research on the Models for the Credit Risk of Small Loan Company

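Author: Chen Ping (陈萍)
  • Student ID
    2015******
  • Degree
    Master's
  • Email
    che******com
  • Defense date
    2017.06.03
  • Advisor
    Yang Ying (杨瑛)
  • Discipline
    Applied Statistics
  • Pages
    59
  • Confidentiality level
    Public
  • Department
    042 Department of Mathematics
  • Chinese keywords
    小额贷款 (small loan), 信用风险模型 (credit risk model), 逻辑回归 (logistic regression), 梯度迭代决策树 (gradient boosting decision tree)
  • English keywords
    small loan, credit risk model, logistic model, GBDT model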

Abstract

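In recent years, China's internet finance industry has grown rapidly. While pursuing high profits, however, lenders must also attend to the large latent default risk, particularly the personal credit risk of borrowers. This thesis presents a theoretical and empirical study of credit risk models for small-loan companies. It is the first in the literature to build a credit risk model with the gradient boosting decision tree (GBDT) algorithm from machine learning.

First, the thesis analyzes the risk composition of small-loan companies, then introduces and assesses the various credit risk evaluation methods one by one; this is the theoretical part of the study. The analysis shows that personal credit risk is the most important risk a small-loan company faces. The thesis then introduces the indicators used in model building: variable stability indicators, predictive power indicators, and goodness-of-fit indicators.

Next, credit risk models are developed using data from Bai Qian Finance (佰仟金融) and UnionPay; this is the empirical part. The models estimate the default rate associated with a borrower's personal credit risk, which in business practice is expressed as a credit score: the higher the default rate, the lower the credit score, and vice versa. The models comprise a traditional logistic regression model and a GBDT model. The thesis describes the development steps in detail: problem preparation, data acquisition and variable construction, exploratory data analysis (EDA) and data description, data preparation, variable selection, model development, and model validation and evaluation. The result is a pair of credit risk models that are objective and scientifically grounded.

Finally, the two credit risk models are compared and applied to lending strategy. The default-rate distributions produced by the two models are similar: default rates are concentrated between 0.3% and 8.9%, and under both models customers with a default rate above 8.9% make up 16% of the population. To control risk, either model can be used to screen out the customers whose default rate exceeds 8.9%. Each model nevertheless has its own strengths and weaknesses. The KS plots show that the logistic model separates good and bad customers better than the GBDT model. The logistic model is also less prone to overfitting. In addition, the iterative process inside the GBDT model is a black box whose workings cannot be inspected, whereas the logistic model is simple and easy to understand. The logistic model also has many shortcomings relative to the GBDT model: because of its simplicity it cannot fit extreme values well, and its fit is sometimes worse than that of the GBDT model. Variable selection for the logistic model also requires many manual steps.

For these reasons, current business practice still mostly adopts simple, transparent models such as the logistic model to measure risk. As machine learning becomes more widespread, however, companies will still bring it into credit risk assessment so that the two approaches complement each other. Advances in modeling techniques open a new chapter for the field of risk management.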
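The screening rule described above, flagging customers whose predicted default rate exceeds 8.9%, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `screen_by_default_rate`, the plain-list input, and the sample probabilities are all hypothetical; only the 8.9% cutoff comes from the abstract.

```python
from typing import List, Tuple

CUTOFF = 0.089  # 8.9% default-rate cutoff quoted in the abstract


def screen_by_default_rate(
    predicted_rates: List[float], cutoff: float = CUTOFF
) -> Tuple[List[int], float]:
    """Return the indices of customers above the cutoff and their share.

    `predicted_rates` stands in for the default probabilities produced by
    either model (hypothetical input, not the thesis data).
    """
    if not predicted_rates:
        raise ValueError("predicted_rates must not be empty")
    # Keep the positions of customers whose predicted rate exceeds the cutoff.
    flagged = [i for i, p in enumerate(predicted_rates) if p > cutoff]
    share = len(flagged) / len(predicted_rates)
    return flagged, share


if __name__ == "__main__":
    # Made-up probabilities, purely for illustration.
    rates = [0.003, 0.021, 0.095, 0.047, 0.150]
    flagged, share = screen_by_default_rate(rates)
    print(flagged, share)
```

In practice the share of flagged customers would be compared with the 16% figure reported in the abstract to check that a deployed model behaves like the one that was developed.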

In recent years, the internet finance industry has developed rapidly. Small-loan companies must nevertheless pay attention to the credit risk of their borrowers. We study credit risk models for small-loan companies both theoretically and empirically. First, we analyze the risk composition of a small-loan company and introduce several methods for evaluating credit risk. The analysis concludes that personal credit risk is the most important risk a small-loan company faces. We also introduce the variable stability index, the predictive power index, and the goodness-of-fit index. Second, we develop credit risk models using data from Bai Qian Finance and UnionPay; this is the empirical part of the research. We use a logistic model and a GBDT model to construct the credit risk models. Model development involves many complicated steps: problem preparation, data acquisition and variable construction, exploratory data analysis, variable selection, model development, and model evaluation. Step by step, we construct objective and scientific models. Finally, we compare the logistic and GBDT risk models and formulate a lending strategy. The default-rate distributions produced by the two models are similar: in both, default rates are concentrated between 0.3% and 8.9%, and customers with default rates above 8.9% account for 16% of the total population. These customers can be screened out to control risk. Both models have advantages and disadvantages. The KS plots show that the logistic model distinguishes good customers from bad ones better than the GBDT model. The GBDT model is more prone to overfitting than the logistic model. Moreover, the iterative process of the GBDT model is a black box that we cannot inspect clearly. The GBDT model also has advantages: the logistic model fits the data less well than the GBDT model, and it requires considerable manual intervention during variable selection. As machine learning becomes more popular, modern enterprises tend to apply machine learning techniques to the development of credit risk models, which can make up for the shortcomings of traditional risk models.
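The abstract compares the two models through their KS plots. As a rough sketch, the KS statistic behind such a plot can be computed as the largest gap between the cumulative score distributions of bad and good customers. The abstract does not specify how the thesis computes it, so the following assumes the standard two-sample definition; the function name `ks_statistic` and the label convention (1 = bad, 0 = good) are assumptions made for illustration.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the cumulative score distributions of bad (1)
    and good (0) customers, evaluated at every distinct score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if any(label not in (0, 1) for label in labels):
        raise ValueError("labels must be 0 (good) or 1 (bad)")
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both good and bad customers are required")

    pairs = sorted(zip(scores, labels))
    cum_bad = 0
    cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all customers tied at the same score before comparing,
        # so the gap is measured only between distinct score values.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        gap = abs(cum_bad / n_bad - cum_good / n_good)
        best = max(best, gap)
        i = j
    return best


if __name__ == "__main__":
    # Made-up predicted default rates and outcomes, purely for illustration.
    print(ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]))
```

A value near 1 means the scores separate good and bad customers almost perfectly, while a value near 0 means the two groups receive similar scores. Under this reading, the abstract's statement that the logistic model discriminates better corresponds to a larger KS value for that model.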