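Source water quality has an important influence on the safety of drinking water. Early warning for drinking water sources is therefore of great significance for taking appropriate, economical measures to protect human health. This study uses Artificial Neural Networks (ANNs) and a Contamination Event Detection Process (CED Process) to detect contamination events in rivers. The ANN model processes water quality data obtained in the laboratory in two stages: off-line learning and on-line continuous forecasting. In the second stage, the decision process compares forecast values with actual measured values to determine whether contamination has occurred at a given time, while the model's learning and forecasting are continuously updated and improved. The main objective of this study is to evaluate the ability of the ANN model and the decision process to detect contamination events. The data used include data from pilot-scale contaminant dosing experiments previously conducted by our group, as well as on-line monitoring data from a real contamination event in the Yangtze River basin. Tests on the pilot-scale experimental data revealed the features and shortcomings of the ANN model as a source water contamination detection model. The model was also tested on the data from the real contamination event. The results show that, with parameter tuning, a simple ANN model can model complex time series, even for the real contamination incident, although it performed poorly on some of the experimental data. The CED process plays a decisive role in the accuracy of the whole algorithm. Therefore, for CED, finding a good decision method is more important than finding a good forecasting method, although the forecasting and decision parts are closely linked. One goal of the simulation experiments was to explore information that is not available to the ANN and to use it in the decision process. Abnormal input signals detected by the decision process do not necessarily indicate a contamination event; for example, cooling water discharged from a nearby power plant can also cause abnormal input signals. Such information can help the ANN model transition between different baselines (non-contaminated states), but it may also prevent the algorithm from detecting contamination events that occur during the transition. The simulation experiments were evaluated with statistical methods such as confusion matrices and receiver operating characteristic (ROC) curves. However, the confusion matrix and the indicators derived from it are time-independent, so using them to evaluate the CED process is problematic. This study therefore proposes a time-dependent, weighted performance metric to address this problem, since binary classification of time series requires a more flexible evaluation method. Analyzing its advantages and disadvantages is another important part of this study. In summary, this study applies ANN models to source water contamination event detection, analyzes the advantages and limitations of ANN models, and proposes a more practical evaluation method for binary classification of time series. Finally, it is worth noting that the choice of on-line water quality sensors is very important, because if a contaminant does not change any measured water quality parameter, no detection method can detect the contamination.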
Source water quality plays an important role in the safety of drinking water. Early detection of contamination in drinking water sources is therefore vital for taking appropriate, cost-efficient measures to protect the health of the potentially affected population. In this project, Artificial Neural Networks (ANNs) and a Contamination Event Detection Process (CED Process) were used to identify contamination events in river water. In an off-line learning stage, the neural network learns to model the response of basic water quality sensors recorded in laboratory experiments; in an on-line forecasting stage, it continuously forecasts future values of the time series. During this second stage, the decision process compares each forecast with the measured value and classifies the measurement either as a regular background value or as part of a contamination event. This classification in turn modifies the network's continuous learning and thus influences its subsequent forecasts.

This study's primary goal is to evaluate how well ANNs and the decision process developed for this project can solve problems such as CED. To test their abilities, data from laboratory experiments previously conducted by this group were used to perform a number of simulations that highlight the features and drawbacks of this method. Additional tests were conducted on a data set recorded during an actual contamination event in the Yangtze River basin, China. The results suggest that a simple, properly tuned ANN is capable of modeling rather complex time series, even those from the real event, although it performed poorly on some of the experimental data. The CED process, however, appears to govern most of the overall algorithm's success rate. Therefore, finding a good decision-making method seems to be more important for CED than having a good modeling and forecasting method, although the two are clearly closely coupled.

One of the simulation experiments explored feeding the decision process additional information that is not available to the network.
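The forecast, compare and classify loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a single linear neuron trained online with the normalised least-mean-squares (NLMS) rule stands in for the ANN, a measurement is flagged when its residual exceeds `k` running standard deviations of past background residuals (after a warm-up period), and flagged values are neither learned nor fed back, the forecast being fed back instead. The names `OnlineLinearForecaster` and `detect_events`, all parameter values, and the optional `known_anomaly` list (which mimics additional information not available to the network) are hypothetical.

```python
import math
from collections import deque


class OnlineLinearForecaster:
    """Hypothetical stand-in for the ANN: a single linear neuron that forecasts
    x[t] from the previous `order` values and is trained online with the
    normalised least-mean-squares (NLMS) rule."""

    def __init__(self, order=3, step=0.5):
        self.step = step                      # NLMS step size; stable for 0 < step < 2
        self.weights = [0.0] * (order + 1)    # last weight multiplies a constant bias input

    def predict(self, window):
        inputs = list(window) + [1.0]
        return sum(w * x for w, x in zip(self.weights, inputs))

    def learn(self, window, target):
        inputs = list(window) + [1.0]
        error = target - self.predict(window)
        norm = 1e-9 + sum(x * x for x in inputs)
        self.weights = [w + self.step * error * x / norm
                        for w, x in zip(self.weights, inputs)]


def detect_events(series, order=3, k=4.0, warmup=30, alpha=0.1,
                  min_sigma=0.05, known_anomaly=None):
    """Return one boolean flag per time step (True = suspected contamination).

    known_anomaly: optional list of booleans standing in for extra information
    the network does not see; where True, a deviation is accepted as background.
    """
    model = OnlineLinearForecaster(order)
    history = deque(series[:order], maxlen=order)
    resid_var = None          # running (EWMA) variance of background residuals
    flags = [False] * len(series)
    for t in range(order, len(series)):
        window = list(history)
        forecast = model.predict(window)
        residual = series[t] - forecast
        sigma = max(math.sqrt(resid_var), min_sigma) if resid_var is not None else math.inf
        accepted = known_anomaly is not None and known_anomaly[t]
        if t >= warmup and abs(residual) > k * sigma and not accepted:
            flags[t] = True
            # Do not learn from the suspect value; continue the baseline with the forecast.
            history.append(forecast)
        else:
            # Background value: train the model and update the residual statistics.
            model.learn(window, series[t])
            resid_var = residual ** 2 if resid_var is None else (
                (1 - alpha) * resid_var + alpha * residual ** 2)
            history.append(series[t])
    return flags


# Usage: a constant background of 7.0 with a 20-step step change to 8.0.
series = [7.0] * 100 + [8.0] * 20 + [7.0] * 40
flags = detect_events(series)
print([t for t, f in enumerate(flags) if f] == list(range(100, 120)))  # True
```

Under these assumptions the step change is flagged over exactly its 20 steps. Because flagged values are replaced by forecasts, a persistent shift that is not caused by contamination would stay flagged for as long as it lasts unless `known_anomaly` marks it; conversely, a marked period is learned as background, which inflates the residual variance and can hide contamination that coincides with it.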
A so-called operator input is provided to the decision process to inform it about unusual water quality levels that are not related to contamination, for example those caused by cooling water discharged from a nearby power plant. This additional input was shown to improve the network's ability to handle changes in background values, but it may also mask contamination that occurs during such a change.

Statistical tools, namely confusion matrices and receiver operating characteristic (ROC) curves, were used to evaluate the simulation experiments. However, the confusion matrix and the indicators derived from it are inherently problematic for assessing the CED process used in this study because they are time-independent: every time step is counted equally, so an alarm raised at the onset of an event scores the same as one raised long after it. A time-dependent, weighted performance metric is proposed to tackle this issue and to allow a more flexible assessment of binary classifiers in time series classification. Analyzing its advantages and disadvantages is another major part of this study.

This study contributes to ongoing research into algorithms for CED. It analyzes the advantages and limitations of ANNs in this context and shows how the results of binary time series classification could be evaluated more realistically. Finally, more general thoughts on the selection of basic water quality sensors for CED are given, showing that a detection algorithm can succeed only if the contaminant triggers a response in at least one of the sensors, which is not guaranteed, especially for the real data set used in this study.
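To illustrate why time-independent counts fall short, the sketch below compares two hypothetical detectors that find the same number of event steps, one at the start of an event and one at its end. Their confusion matrices are identical, but a time-weighted score separates them. The exponential weight `exp(-k / tau)` for the k-th step of an event, the parameter `tau`, and the function names are assumptions chosen for illustration; they are not the metric proposed in this study, and the sketch ignores false alarms, which a complete metric would also need to weight.

```python
import math


def confusion_counts(truth, pred):
    """Time-independent counts (TP, FP, FN, TN): every time step counts equally."""
    tp = sum(1 for y, p in zip(truth, pred) if y and p)
    fp = sum(1 for y, p in zip(truth, pred) if not y and p)
    fn = sum(1 for y, p in zip(truth, pred) if y and not p)
    tn = sum(1 for y, p in zip(truth, pred) if not y and not p)
    return tp, fp, fn, tn


def onset_weights(truth, tau=5.0):
    """Hypothetical weighting: the k-th step of an event (k = 0 at onset)
    gets weight exp(-k / tau); non-event steps get weight 0."""
    weights, k = [], 0
    for y in truth:
        if y:
            weights.append(math.exp(-k / tau))
            k += 1
        else:
            weights.append(0.0)
            k = 0
    return weights


def weighted_detection_score(truth, pred, tau=5.0):
    """Share of the total event weight captured by positive predictions (0..1)."""
    weights = onset_weights(truth, tau)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for w, p in zip(weights, pred) if p) / total


truth = [False] * 10 + [True] * 10 + [False] * 10
early = [False] * 10 + [True] * 5 + [False] * 15   # detects the first 5 event steps
late = [False] * 15 + [True] * 5 + [False] * 10    # detects the last 5 event steps

print(confusion_counts(truth, early), confusion_counts(truth, late))  # (5, 0, 5, 20) twice
print(round(weighted_detection_score(truth, early), 3))  # 0.731
print(round(weighted_detection_score(truth, late), 3))   # 0.269
```

The choice of `tau` sets how quickly late detections lose value; a very large `tau` makes all event steps nearly equal and the score approaches the time-independent true positive rate.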