随机化试验中基于设计的因果推断方法及其理论

Design-based Causal Inference in Randomized Experiments: Methods and Theory

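Author: Ke Zhu (朱珂)
  • Student ID
    2017******
  • Degree
    Doctorate
  • Email
    zhu******.cn
  • Defense date
    2023.05.18
  • Advisor
    Hanzhong Liu (刘汉中)
  • Discipline
    Statistics
  • Pages
    114
  • Confidentiality level
    Public
  • Department
    016 Department of Industrial Engineering
  • Chinese keywords
    重随机化,分层随机化,Fisher随机化检验,Lasso,因果推断
  • English keywords
    causal inference, Fisher randomization tests, Lasso, rerandomization, stratified randomization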

Abstract

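Uncovering causal relationships is the ultimate goal of much research in the natural and social sciences. As the "gold standard" for evaluating causal effects, randomized experiments are widely used in economics, political science, biomedicine, epidemiology, and other fields. How to design and analyze randomized experiments efficiently, so as to estimate and draw inference on causal effects, is a question of particular interest to statisticians. Although a completely randomized experiment balances all covariates on average, there is still a high probability of observing covariate imbalance in any single assignment. Stratified randomization and rerandomization are therefore widely used to balance covariates at the design stage. Performing a Fisher randomization test in a rerandomized experiment requires drawing a large number of covariate-balanced assignments, yet the acceptance-rejection sampling used by rerandomization draws balanced assignments with very low computational efficiency. To address this problem, this thesis builds on the idea of Metropolis-Hastings sampling and proposes pair-switching rerandomization and sequential pair-switching rerandomization, which draw balanced assignments efficiently and apply to non-sequential and sequential randomized experiments, respectively. The thesis proves that the difference-in-means estimator is unbiased under pair-switching rerandomization and discusses a lower bound on the variance reduction. It also derives an explicit solution for inverting Fisher randomization tests, so that randomization-based confidence intervals can be constructed faster. Simulation studies show that pair-switching rerandomization has statistical performance comparable to that of rerandomization while being 3-23 times faster. In addition, the thesis applies pair-switching rerandomization to two clinical trial data sets to demonstrate its advantages on real data.

Stratified randomization and rerandomization can only balance low-dimensional covariates at the design stage, whereas experimenters in modern randomized experiments can often observe high-dimensional covariates. The thesis therefore considers regression adjustment for high-dimensional covariates at the analysis stage. Under stratified randomized experiments, it proposes a Lasso-adjusted average treatment effect estimator from a projection perspective, studies its asymptotic properties, and gives conditions under which the Lasso-adjusted estimator is more precise than the unadjusted estimator. It also provides a conservative variance estimator, which enables valid statistical inference. The framework allows some strata to contain only one treated unit or one control unit and allows the proportion of units assigned to treatment to differ across strata; it therefore covers both coarsely and finely stratified randomized experiments and, in particular, paired randomized experiments. The thesis further studies the asymptotic properties of the Lasso-adjusted estimator under stratified rerandomized experiments. The asymptotic theory allows the number and the sizes of strata to tend to infinity simultaneously, allows heterogeneous causal effects across strata, and does not assume an outcome-generating model, so it accommodates model misspecification. Simulations and two real-data analyses demonstrate the superiority of the Lasso-adjusted estimator.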

Uncovering causal relationships is the ultimate goal of much research in the natural and social sciences. Randomized experiments are the gold standard for evaluating causal effects and are widely used in economics, political science, biomedical science, epidemiology, and other fields. Statisticians are particularly interested in designing and analyzing randomized experiments efficiently in order to estimate and draw inference on causal effects. Although complete randomization balances baseline covariates on average, there is still a high probability of observing covariate imbalance in any single treatment assignment. Stratified randomization and rerandomization are therefore widely used to further balance covariates at the design stage. Fisher randomization tests are widely used to test the significance of causal effects. To perform Fisher randomization tests in rerandomized experiments, we need to sample many covariate-balanced assignments. However, the acceptance-rejection sampling adopted in rerandomization is very inefficient. To address this problem, we propose pair-switching rerandomization and sequential pair-switching rerandomization, two methods based on Metropolis-Hastings sampling that draw balanced assignments much more efficiently. The proposed methods are applicable to non-sequential and sequential randomized experiments, respectively. We prove the unbiasedness of the difference-in-means estimator and give a lower bound on the variance reduction under pair-switching rerandomization and sequential pair-switching rerandomization. Moreover, we derive an explicit solution for the inversion of Fisher randomization tests, so that randomization-based confidence intervals can be constructed more quickly. Simulation results indicate that, in Fisher randomization tests, pair-switching rerandomization achieves power comparable to that of classical rerandomization while being 3-23 times faster. In addition, we analyze two clinical trial data sets using the proposed methods to demonstrate their advantages.

Stratified randomization and rerandomization can only balance low-dimensional covariates. However, high-dimensional covariates are often observed in modern randomized experiments. Thus, we consider high-dimensional regression adjustment at the analysis stage. We propose a Lasso-adjusted average treatment effect estimator from the perspective of projection, study its asymptotic properties under stratified randomized experiments and stratified rerandomized experiments, and provide conditions under which the Lasso-adjusted estimator is more precise than the unadjusted estimator. A conservative variance estimator is also provided for valid statistical inference. Our framework allows some strata to contain only one treated unit or one control unit and allows treatment proportions to differ across strata; it thus covers both coarsely and finely stratified randomized experiments, with paired randomized experiments as a special case. Moreover, our asymptotic theory allows the number and the sizes of strata to go to infinity simultaneously and the causal effects to be heterogeneous across strata, without assuming a true outcome data-generating model. Simulation studies and two real-world examples demonstrate the superiority of the Lasso-adjusted estimator.
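
For readers unfamiliar with the baseline that the abstract calls inefficient, the following is a minimal sketch of classical rerandomization by acceptance-rejection sampling, followed by a Fisher randomization test that reuses the same sampler for its reference distribution. It is not the pair-switching algorithm of the thesis, whose details the abstract does not give. The function names, the Mahalanobis-distance balance criterion, the threshold value, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_imbalance(X, z):
    """Mahalanobis distance between treated and control covariate means."""
    n1, n0 = z.sum(), (1 - z).sum()
    diff = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    # Covariance of the mean difference under complete randomization.
    cov = np.atleast_2d(np.cov(X, rowvar=False)) * (1 / n1 + 1 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def draw_balanced_assignment(X, n_treated, threshold, rng, max_tries=100_000):
    """Acceptance-rejection: redraw complete randomizations until balanced."""
    n = X.shape[0]
    for tries in range(1, max_tries + 1):
        z = np.zeros(n, dtype=int)
        z[rng.choice(n, size=n_treated, replace=False)] = 1
        if mahalanobis_imbalance(X, z) <= threshold:
            return z, tries
    raise RuntimeError("no balanced assignment found; threshold too strict")

def fisher_randomization_test(X, y, z_obs, threshold, rng, n_draws=500):
    """Monte Carlo p-value for the sharp null of no effect on any unit.

    Under the sharp null the outcomes y are fixed, so only the assignment
    is redrawn, and each redraw must satisfy the same balance criterion.
    """
    def diff_in_means(z):
        return y[z == 1].mean() - y[z == 0].mean()

    t_obs = abs(diff_in_means(z_obs))
    n_treated = int(z_obs.sum())
    t_null = np.empty(n_draws)
    for b in range(n_draws):
        z_b, _ = draw_balanced_assignment(X, n_treated, threshold, rng)
        t_null[b] = abs(diff_in_means(z_b))
    return float(np.mean(t_null >= t_obs))

# Hypothetical simulated data: 40 units, 2 covariates, additive effect of 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
z_obs, tries = draw_balanced_assignment(X, 20, threshold=1.0, rng=rng)
y = X @ np.array([1.0, -0.5]) + 2.0 * z_obs + rng.normal(size=40)
p_value = fisher_randomization_test(X, y, z_obs, threshold=1.0, rng=rng, n_draws=200)
print(f"accepted after {tries} tries, p-value = {p_value:.3f}")
```

Every null draw in the test repeats the rejection loop, so the total cost grows with both the number of draws and the strictness of the threshold. This is the computational burden that motivates a faster sampler for balanced assignments.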