Urban spatiotemporal dynamics are a composite phenomenon driven jointly by socioeconomic, natural-environmental, cultural, and policy factors. Multimodal remote sensing provides multimodal, multi-scale information that characterizes cities more comprehensively and efficiently, and has therefore played an increasingly important role in urban spatiotemporal dynamics research. Although deep learning and multimodal remote sensing have accelerated progress in this field, extracting urban spatiotemporal dynamics features from multimodal remote sensing data still faces many challenges. First, the remote sensing data used in existing studies are limited in the number of modalities and in spatiotemporal coverage, and exhibit large disparities in image quality and resolution. Second, existing methods perform poorly in capturing urban spatial detail, generalizing spatiotemporal features, and fusing heterogeneous multimodal features. These data- and method-level difficulties confine most existing urban spatiotemporal dynamics research to a single task and scenario, preventing large-scale, multi-scale studies. To address these challenges, this thesis carries out research in three dimensions to support efficient and generalizable multi-scale urban spatiotemporal dynamics research. First, at the data level, this thesis constructs a multimodal remote sensing dataset comprising five modalities, improving the comprehensiveness of urban spatiotemporal representation along the temporal, spatial, and modal dimensions. It further proposes an edge-enhancement-based unsupervised super-resolution reconstruction method that mitigates the impact of differences in spatiotemporal coverage and resolution on dataset construction, achieving up to 2.35 dB of information gain and up to 8× super-resolution without any external supervision. Second, at the algorithm level, this thesis builds a deep-learning-based framework for extracting and fusing urban spatiotemporal dynamics features from multimodal remote sensing data, improving effectiveness from three perspectives: spatial topological correctness, spatiotemporal consistency, and multimodal fusion generalization. The framework optimizes spatial topological relationships among urban elements, finely models long-term, large-area urban spatiotemporal dynamics, and markedly improves the fusion of spatially heterogeneous, misaligned multimodal remote sensing features. Third, building on these data and algorithms, this thesis extends urban spatiotemporal dynamics research across multiple scales. Temporally, it extends to a six-year annual segmentation task, achieving R² = 0.95 against official expert statistics. Spatially, it extends to a global multi-city segmentation task, achieving a 4.41% IoU improvement. Modally, it extends to a five-modality reconstruction task, achieving up to 13.25 dB of performance gain. By extending along all of these scales simultaneously, this thesis also obtains 37 years of global long-term nighttime light data, fully demonstrating the generalization and effectiveness of the proposed methods in fine spatial representation, preservation of spatiotemporal topological relationships, multimodal feature fusion, and spatiotemporal feature modeling. In sum, from the data, algorithm, and application levels, this thesis contributes a new and practically valuable approach to urban spatiotemporal dynamics research based on multimodal remote sensing data and multi-scale feature extraction.
The spatiotemporal dynamics of urban environments constitute a complex phenomenon driven jointly by socioeconomic, natural, cultural, and policy-related factors. Multimodal remote sensing has played an increasingly important role in urban spatiotemporal dynamics research, as it offers multi-scale, multimodal information for depicting these dynamics comprehensively and efficiently. Despite rapid progress in deep learning and multimodal remote sensing, research on urban spatiotemporal dynamics based on multimodal remote sensing data still faces numerous challenges. First, existing research is limited in both the number of modalities and the spatiotemporal coverage of the remote sensing data it uses, which also exhibit substantial disparities in image quality and resolution. Second, existing methods fall short in capturing urban spatial detail, generalizing spatiotemporal features, and effectively fusing heterogeneous multimodal features. These difficulties at the data and methodological levels confine most existing research on urban spatiotemporal dynamics to a specific task and scenario, making large-scale, multi-scale research difficult to conduct. In response to these challenges, this thesis pursues three key dimensions of research to support efficient and robust multi-scale feature extraction of urban spatiotemporal dynamics. First, this thesis establishes a multimodal dataset comprising five remote sensing modalities, improving the comprehensiveness of feature representations for urban spatiotemporal dynamics along the temporal, spatial, and modal dimensions.
Furthermore, this thesis proposes an unsupervised super-resolution reconstruction method based on edge enhancement, mitigating the impact of image quality issues and resolution mismatch on the construction of a unified multimodal remote sensing data matrix. Without relying on external supervision, this method achieves fine-grained texture reconstruction at scaling factors of up to 8×, with an information gain of up to 2.35 dB. Second, this thesis introduces a deep-learning-based framework for extracting and fusing urban spatiotemporal dynamics features from multimodal remote sensing data. The framework improves the effectiveness of feature extraction and fusion from three perspectives. 1) It optimizes spatial relationships among urban elements for higher topological accuracy. 2) It efficiently models heterogeneous spatiotemporal dynamics features for greater spatiotemporal consistency. 3) It improves both the fusion of heterogeneous, misaligned multimodal remote sensing features and the generalization of multimodal fusion. Third, based on the above data and algorithms, this thesis conducts a multi-scale extension of urban spatiotemporal dynamics research in four respects. 1) The temporal scale is extended to a six-year annual image segmentation task, achieving R² = 0.95 against official government statistics. 2) The spatial scale is extended to a global multi-city image segmentation task, achieving a 4.41% improvement in IoU. 3) The modal scale is extended to five modalities, achieving up to 13.25 dB of improvement in reconstruction results.
4) By extending across all of these scales simultaneously, this thesis also obtains 37 years of global long-term nighttime light data, fully demonstrating the generalization and effectiveness of the proposed methods in fine spatial representation, preservation of spatiotemporal topological relationships, multimodal feature fusion, and spatiotemporal feature modeling. In conclusion, this thesis contributes a novel and promising approach to urban spatiotemporal dynamics research based on multimodal remote sensing data, spanning data cube establishment, algorithm development, and downstream application, and thus holds potential value for both the remote sensing research domain and real-world applications.
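The edge-enhancement idea underlying the unsupervised super-resolution method can be illustrated with a minimal sketch: an edge map derived from the image itself re-weights the reconstruction loss so that high-frequency texture is emphasized without any external supervision. The Laplacian edge prior and the weighted L1 loss below are illustrative assumptions for exposition, not the thesis's actual implementation.

```python
import numpy as np

def laplacian_edge_map(img):
    """Approximate per-pixel edge strength with a 3x3 Laplacian
    (an assumed, illustrative edge prior)."""
    padded = np.pad(img, 1, mode="edge")
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
           padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * img)
    return np.abs(lap)

def edge_weighted_l1(pred, target, alpha=1.0):
    """Reconstruction loss that up-weights edge pixels; `alpha` controls
    how strongly edges dominate (both choices are assumptions)."""
    weights = 1.0 + alpha * laplacian_edge_map(target)
    return float(np.mean(weights * np.abs(pred - target)))
```

Because the edge map is computed from the target image itself, the weighting requires no labels or high-resolution reference, in keeping with the unsupervised setting described above.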
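The evaluation metrics quoted in the multi-scale experiments (dB gain, IoU, R²) follow standard definitions; a minimal sketch, where "dB gain" is read as the PSNR difference between two reconstructions of the same target:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def iou(pred_mask, gt_mask):
    """Intersection over union for binary segmentation masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union

def r_squared(pred, obs):
    """Coefficient of determination of predictions against
    reference statistics (e.g. official government figures)."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

These definitions are the conventional ones; the thesis's exact evaluation protocol (mask thresholds, PSNR peak value, aggregation across cities and years) is not specified in this abstract.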