Research on the Risk Identification of Falsely Issuing VAT Invoices Based on Machine Learning Algorithm

Original title: 基于机器学习算法的虚开增值税发票风险识别问题研究

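Author: Zeng Demin (曾德敏)
  • Student ID
    2018******
  • Degree
    Master's
  • Email
    dem******com
  • Defense date
    2022.12.09
  • Advisor
    Wu Su (吴甦)
  • Discipline
    Engineering Management
  • Pages
    65
  • Confidentiality level
    Public
  • Department
    016 Department of Industrial Engineering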
  • Keywords
    Falsely issuing VAT invoices, machine learning, tax risk identification

Abstract

Controlling the risk of falsely issued value-added tax (VAT) invoices consumes a large share of the collection and administration resources of China's tax authorities. Traditional tax risk analysis suffers from experience-driven risk sources, fragmented risk models, and fixed risk indicators, and frequent false-invoicing cases cause tax revenue losses, a persistent problem for China's tax authorities. This thesis studies the identification of false VAT invoicing risk using machine learning algorithms, aiming to improve the accuracy and efficiency of risk identification. It reviews the research and application progress of machine learning in tax risk identification, summarizes taxpayers' motives for falsely issuing VAT invoices based on the experience of tax experts, and summarizes the risk characteristics of such taxpayers along three dimensions: group attributes, enterprise type, and invoicing behavior.

A total of 74 feature indicators covering four areas (taxpayer registration, VAT returns, VAT invoices, and associated natural persons) were collected and processed from the business systems of the Shenzhen Taxation Bureau. Two algorithms were used to train risk identification models: a multi-layer perceptron (MLP), which is a neural network, and XGBoost, which is based on decision trees. On the test set, the accuracy, precision, and recall of both models are close to 100%, their AUC values are close to 1, and their cross-entropy is relatively small. In terms of efficiency, the XGBoost model outperforms the MLP in training time complexity and parameter tuning. Compared with the traditional approach, in which tax experts judge cases manually from experience, the machine learning models are far ahead in predictive performance, working efficiency, and stability.

Drawing on the interpretability of the XGBoost model, the thesis identifies several of the most important feature indicators for identifying taxpayers at risk of falsely issuing VAT invoices: the taxpayer's tax paid into the treasury across all tax types in the past year; the ratio of the number of key monitored enterprises in which historically bound natural persons hold positions to the total number of bound enterprises; the time difference between the latest invoicing date and the date of the last tax payment into the treasury; the number of key monitored enterprises in which historically bound natural persons hold positions; the total number of historically bound natural persons who are on the real-name tax handling blacklist; and whether the mobile phone number of a person associated with the taxpayer belongs to a virtual number segment. Finally, the thesis offers four recommendations for tax risk management: deepening the use of real-name tax handling, building a foundational tax-related data system, improving governance mechanisms for tax-related data, and deepening the analysis and application of tax-related data.
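
The abstract reports accuracy, precision, recall, AUC, and cross-entropy on a test set. As a minimal sketch of how these five quantities are commonly computed for a binary classifier, the snippet below uses only the Python standard library. The function name `evaluate`, the 0.5 decision threshold, and the labels and scores are all illustrative assumptions; none of this is the thesis's code or data.

```python
import math


def evaluate(y_true, y_score, threshold=0.5):
    """Return accuracy, precision, recall, AUC and mean cross-entropy.

    y_true  : list of 0/1 labels (1 = taxpayer flagged as false invoicing)
    y_score : list of predicted probabilities for class 1
    threshold is an assumed cut-off, not a value taken from the thesis.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # AUC as the probability that a random positive outranks a random
    # negative, counting ties as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    # Mean binary cross-entropy; scores are clipped to avoid log(0).
    eps = 1e-12
    cross_entropy = -sum(
        t * math.log(max(s, eps)) + (1 - t) * math.log(max(1 - s, eps))
        for t, s in zip(y_true, y_score)
    ) / len(y_true)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "auc": auc,
        "cross_entropy": cross_entropy,
    }


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10, 0.05, 0.30]
metrics = evaluate(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC and cross-entropy are computed from the raw scores, which is one reason results are often reported with both kinds of metric.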