This paper applies nine machine learning models to asset pricing in China's A-share market. Using Chinese capital market data from 2000 to 2018, it constructs 91 firm-level pricing factors based on the existing literature. Through large-scale text collection and sentiment analysis, it also introduces nine sentiment indicators, derived from analyst reports, news reports, and social media posts, each measured at the firm, industry, and market levels. Using a total of 989 pricing factors, the paper compares the nine models in terms of goodness of fit, factor importance, accuracy in predicting the direction of price movements, and the return and risk of the resulting investment portfolios. The study finds that linear support vector regression, partial least squares regression, and ridge regression are the three best-performing models; that the market value ratio, price-earnings ratio, industry momentum, trading volume, and accruals are the five most important pricing factors; and that text sentiment factors, because they have relatively low correlation with existing financial pricing factors, can improve model performance. With a simple long-short investment strategy, the best-performing machine learning model delivers a monthly return of nearly 3%. Overall, this paper is an important attempt to apply machine learning models to asset pricing in China's financial markets.
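The long-short strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes that, in each month, stocks are ranked by a model's predicted return, that the top and bottom fractions are held long and short respectively, and that both legs are equal-weighted. The function name `long_short_return`, the `fraction` cutoff, and the equal weighting are hypothetical choices made for illustration; the abstract does not specify how the portfolios are actually formed.

```python
from statistics import mean


def long_short_return(predicted, realized, fraction=0.1):
    """One-period return of an equal-weighted long-short portfolio.

    predicted: dict mapping stock id -> model-predicted return
    realized:  dict mapping stock id -> realized return over the same period
    fraction:  share of stocks in each leg (0.1 means deciles; an assumption)
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")

    # Rank only stocks that have both a prediction and a realized return,
    # from lowest to highest predicted return.
    stocks = sorted(
        (s for s in predicted if s in realized),
        key=lambda s: predicted[s],
    )

    # Number of stocks in each leg (at least one).
    k = max(1, int(len(stocks) * fraction))
    if len(stocks) < 2 * k:
        raise ValueError("not enough stocks to form two separate legs")

    short_leg = stocks[:k]   # lowest predicted returns
    long_leg = stocks[-k:]   # highest predicted returns

    # Long-short return: average realized return of the long leg
    # minus average realized return of the short leg.
    return mean(realized[s] for s in long_leg) - mean(realized[s] for s in short_leg)


# Toy example with made-up numbers (not data from the paper).
toy_predicted = {"A": 0.05, "B": 0.01, "C": -0.02, "D": 0.03, "E": -0.04}
toy_realized = {"A": 0.04, "B": 0.00, "C": -0.01, "D": 0.02, "E": -0.03}
toy_result = long_short_return(toy_predicted, toy_realized, fraction=0.2)
```

Repeating this calculation month by month and averaging the results would give a monthly return figure of the kind quoted in the abstract, under the assumptions stated above.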