M. J. Druzdzel and F. J. Díez. Combining knowledge from different sources in probabilistic models. Journal of Machine Learning Research, 4 (2003) 295-316.

22 pages.


Building probabilistic and decision-theoretic models requires a considerable knowledge engineering effort in which the most daunting task is obtaining the numerical parameters. Authors of Bayesian networks usually combine various sources of information, such as textbooks, statistical reports, databases, and expert judgement. In this paper, we demonstrate the risks of such a combination, even when this knowledge encompasses such seemingly population-independent characteristics as sensitivity and specificity of medical symptoms. We show that the criteria "do not combine knowledge from different sources" or "use only data from the setting in which the model will be used" are neither necessary nor sufficient to guarantee the correctness of the model. Instead, we offer graphical criteria for determining when knowledge from different sources can be safely combined into the general population model. We also offer a method for building subpopulation models. The analysis performed in this paper and the criteria we propose may be useful in such fields as knowledge engineering, epidemiology, machine learning, and statistical meta-analysis.
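To make the claim about sensitivity concrete, here is a toy illustration under assumed numbers; it is not the paper's own example and does not implement its graphical criteria. It assumes a hypothetical hidden variable, disease severity, that drives the symptom identically everywhere, while the share of severe cases differs between a referral hospital and a general practice. Sensitivity, P(symptom | disease), then differs between the two settings, and plugging the hospital figure into a clinic model distorts the posterior. All names and probabilities below are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical numbers: the symptom depends only on disease severity,
# so P(symptom | severity) is the same in every population.
P_SYMPTOM_GIVEN_SEVERE = 0.9
P_SYMPTOM_GIVEN_MILD = 0.3
SPECIFICITY = 0.95  # P(no symptom | no disease), assumed equal everywhere


@dataclass(frozen=True)
class Population:
    name: str
    p_severe: float    # P(severe | disease) in this population (hypothetical)
    prevalence: float  # P(disease) in this population (hypothetical)


def sensitivity(pop: Population) -> float:
    """P(symptom | disease), marginalizing over the unmodelled severity variable."""
    return pop.p_severe * P_SYMPTOM_GIVEN_SEVERE + (1 - pop.p_severe) * P_SYMPTOM_GIVEN_MILD


def posterior(prevalence: float, sens: float, spec: float) -> float:
    """P(disease | symptom) by Bayes' rule."""
    true_pos = prevalence * sens
    false_pos = (1 - prevalence) * (1 - spec)
    return true_pos / (true_pos + false_pos)


HOSPITAL = Population("referral hospital", p_severe=0.8, prevalence=0.30)
CLINIC = Population("general practice", p_severe=0.2, prevalence=0.01)

if __name__ == "__main__":
    for pop in (HOSPITAL, CLINIC):
        print(f"{pop.name}: sensitivity = {sensitivity(pop):.2f}")
    # Naive combination: clinic prevalence (local data) + hospital sensitivity (literature).
    mixed = posterior(CLINIC.prevalence, sensitivity(HOSPITAL), SPECIFICITY)
    local = posterior(CLINIC.prevalence, sensitivity(CLINIC), SPECIFICITY)
    print(f"P(disease | symptom) in the clinic: combined {mixed:.3f}, local {local:.3f}")
```

With these assumed numbers the sensitivities come out at 0.78 (hospital) and 0.42 (clinic), and the combined model gives a clinic posterior of about 0.136 against about 0.078 from local parameters, even though the symptom mechanism is identical in both settings.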