Real estate is one of the pillar industries of the national economy, so the accurate valuation of houses has been studied extensively by scholars and has many practical applications. Existing valuation methods fall into two main categories. The first is linear regression, which effectively identifies the features that significantly influence house prices but has low accuracy. The second is machine learning, which can greatly improve accuracy, although many of the more complex algorithms lack interpretability. To address this gap, this paper proposes a novel approach that takes historical transaction prices as a benchmark and adjusts them for house-specific characteristics, yielding a valuation model that is highly interpretable while retaining a reasonable degree of accuracy.
This paper uses transaction data from Lianjia.com as its research sample: more than 120,000 second-hand house transaction records in Beijing in 2020, combined with spatial distribution data on population, economy, transportation, education and commerce. The study first cleans and integrates the data, then applies several feature selection methods to select the features most strongly correlated with house prices. Next, it establishes a two-stage valuation model for second-hand houses. In the first stage, the historical transaction records of neighboring houses determine the benchmark price of the house to be valued. In the second stage, the benchmark price is adjusted according to house-specific factors, such as the house's own attributes and the distribution of nearby facilities, to arrive at the final price.
Finally, this study evaluates the interpretability and accuracy of the model and refines the model. In terms of interpretability, the proposed model is more concise and clear than typical interpretable models (linear regression, decision trees) and is more easily understood and accepted by the public. In terms of accuracy, the proposed model performs much better than the regression models used as baselines (linear regression and Lasso regression) and slightly better than simple machine learning models (e.g., decision trees), but still falls short of more complex ensemble models (e.g., XGBoost).
This study has both theoretical and practical significance. On the one hand, it proposes a house valuation model that differs from the methods commonly used in academic research and achieves good results, offering a new perspective for scholars. On the other hand, the model is highly interpretable, so it can reduce disputes and facilitate transactions in real-world scenarios such as second-hand house transactions, land acquisition and demolition, tax and fee collection, and housing mortgage loans, and therefore has good application prospects.
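As an illustration only, the two-stage idea described above (a benchmark price from neighboring transactions, followed by a house-specific adjustment) can be sketched in Python. The abstract does not state how neighbors are selected or how adjustments are combined, so the k-nearest-neighbor average, the percentage-based adjustment, and every name and number below are assumptions made for this sketch rather than the paper's actual implementation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Sale:
    """A past transaction (hypothetical schema for this sketch)."""
    x: float           # planar coordinate, e.g. km east of a reference point
    y: float           # planar coordinate, e.g. km north of a reference point
    unit_price: float  # price per square metre


def benchmark_price(x, y, history, k=3):
    """Stage 1 (assumed form): mean unit price of the k nearest past sales."""
    if not history:
        raise ValueError("history must contain at least one sale")
    nearest = sorted(history, key=lambda s: hypot(s.x - x, s.y - y))[:k]
    return sum(s.unit_price for s in nearest) / len(nearest)


def adjusted_price(benchmark, features, adjustments):
    """Stage 2 (assumed form): scale the benchmark by summed percentage adjustments.

    `features` maps a feature name to its value (1/0 for present/absent);
    `adjustments` maps the same names to a fractional price effect.
    """
    factor = 1.0 + sum(adjustments[name] * value
                       for name, value in features.items())
    return benchmark * factor


# Made-up data, purely for demonstration.
history = [
    Sale(0.0, 0.0, 60000.0),
    Sale(1.0, 0.0, 62000.0),
    Sale(0.0, 1.0, 58000.0),
    Sale(10.0, 10.0, 30000.0),  # far away, so it is ignored when k=3
]
adjustments = {"near_subway": 0.03, "south_facing": 0.02, "top_floor": -0.04}
features = {"near_subway": 1, "south_facing": 1, "top_floor": 0}

base = benchmark_price(0.2, 0.2, history, k=3)
final = adjusted_price(base, features, adjustments)
print(f"benchmark: {base:.0f}, adjusted: {final:.0f}")
```

In this toy example the three nearest sales average to a benchmark of 60,000 per square metre, and the two applicable features raise it by 5 percent. Each step can be read off directly, which is the kind of interpretability the abstract emphasizes. A real model would estimate the adjustment values from data instead of fixing them by hand.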