The criticism of Dempster-Shafer Theory (DST) by L. A. Zadeh 1 has generated a lot of discussions and articles on the subject of conflicting evidence. In 2005, R. Haenni 2 showed that the surprising result obtained by applying Dempster’s rule to Zadeh’s example was more due to the modelling of the situation than Dempster’s rule not working. I add my grain of salt to the debate on Zadeh’s example by showing a formulation of the problem as a small belief network and using Dempster’s rule of combination to obtain a realistic result. This belief network gives the same results as the combination of the two evidences with the disjunctive rule of combination 3. At the same time, I show how to use my R package dst 4 to do calculations on a belief network.
We suppose that a patient is examined by two doctors, A and B. A’s diagnosis is that P has either meningitis (M), with probability 0.99, or brain tumor (T), with probability 0.01. B agrees with A that the probability of a brain tumor is 0.01, but believes that it is the probability of concussion (C) rather than meningitis that is 0.99.
Zadeh considers the same space of diseases {M, T, C} for the two experts. Hence, after the combination of the two pieces of evidence by Dempster’s rule, we find as a result that the belief of a brain tumor is certain.
#> Diagnosis of Expert 1
#> ZExpert1 specnb mass
#> 1 M 1 0.99
#> 2 T 2 0.01
#> 3 frame 3 0
#> Diagnosis of Expert 2
#> ZExpert2 specnb mass
#> 1 T 1 0.01
#> 2 C 2 0.99
#> 3 frame 3 0
#> Combination of the two experts by Dempster's rule
#> $mbp
#> M T C mass Belief Plausibility Plty Ratio
#> T 0 1 0 1 1 1 -9.079838e+12
#> C 0 0 1 0 0 0 0.000000e+00
#> M 1 0 0 0 0 0 0.000000e+00
#> frame 1 1 1 0 1 1 -9.079838e+12
#>
#> $Conflict
#> [1] 0.9999
Indeed, the result does not reflect the opinions of the two experts. Before rejecting Dempster’s rule as inappropriate to this situation, let’s look more closely at the problem at hand.
Let’s take Expert one. Expert one distributes the whole mass between the two singletons {M} and {T}. Expert one does not consider {C} as a possibility. Hence we conclude that the Fod of Expert one cannot be {M, T, C}. We say that Expert one has restricted the space of possibilities of his/her diagnostic to the Fod {M, T}:
\(Fod(D1) = \{M, T\}\). For simplicity, we write \(D1 =\{M, T\}\).
The same line of reasoning is applied to Expert two. The whole mass of one is allotted to the set {T, C}, and the third possibility (M) is not considered at all. Hence \(D2 = \{T, C\}\).
I show the coding of these two pieces of evidence with the function bca of the package dst.
# attach package dst
library(dst)
#
# Diagnosis from first expert (function e1 attached to variable D1)
e1 <- bca(f= matrix(c(1,0,0,1,1,1), ncol=2, byrow=TRUE), m= c(0.99, 0.01, 0), cnames =c("M", "T"), infovarnames = "D1", varnb = 1)
#
# show the definition of e1
# "Diagnosis of Expert 1 (function e1 attached to variable D1)"
bcaPrint(e1)
#> e1 specnb mass
#> 1 M 1 0.99
#> 2 T 2 0.01
#> 3 frame 3 0
#
# Diagnosis from second expert (function e2 attached to variable D2)
e2 <- bca(f= matrix(c(1,0,0,1,1,1), ncol=2, byrow=TRUE), m= c(0.99, 0.01, 0), cnames =c("C", "T"), infovarnames = "D2", varnb = 2)
#
# show the definition of e2
# "Diagnosis of Expert 2 (function e2 attached to variable D2)"
bcaPrint(e2)
#> e2 specnb mass
#> 1 C 1 0.99
#> 2 T 2 0.01
#> 3 frame 3 0
The two experts are reasoning in two different spaces of possibilities. To be able to combine their diagnosis, we need a common ground. This can be done if we introduce a third person, the patient, with a variable of interest, his/her disease (D). Then it is natural to take the union of the Fod of Expert one and Expert two as the Fod of the patient:
\(D = \{M, T\} \cup \{C, T\} = \{M, T, C\}\).
Hence, the patients’ evaluation of his/her disease involves the disjunction of the evidences of the two experts.This situation is described by a relation of implication between experts and patient:
r1: \(D1 \cup D2 \rightarrow D\).
The relation r1 is represented in the product space \(\prod(D1, D2, D)\) by one focal set of mass one:
\(m(M C M + M C C + M T M + M T T + T C T + T C C + T T T) = 1\) (for simplicity, the “+” sign is used as the \(\vee\) disjunctive operator in the functions of the package dst). Now I use the function bcaRel to code this relation.
# 1. Defining the relation with a (0,1) matrix
tt_r1 <- matrix(c(1,0,1,0,1,0,0,1,0,1,0,0,0,1,1,0,0,1,1,0,0,1,0,0,1,0,1,0,0,1,1,0,0,1,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1,1), ncol = 7,byrow = TRUE)
colnames(tt_r1) = c("M", "T", "C", "T", "M", "T", "C")
#
# 2. Setting the mass function
spec_r1 = matrix(c(rep(1,7),2, rep(1,7), 0), ncol = 2, dimnames = list(NULL, c("specnb", "mass")))
#
# 3. Names of variables names and dimension of their Fod
info_r1 =matrix(c(1:3, 2,2,3), ncol = 2, dimnames = list(NULL, c("varnb", "size")) )
#
# The relation between e1, e2 and a patient p
r1 <-bcaRel(tt = tt_r1, spec = spec_r1, infovar = info_r1, infovarnames = c("D1", "D2", "D"), relnb = 1)
# show the relation
# "Relation between Experts and patient defined in the product space D1 x D2 x D"
bcaPrint(r1)
#> r1 specnb mass
#> 1 M C M + M C C + M T M + M T T + T C T + T C C + T T T 1 1
#> 2 frame 2 0
We now have all the elements of a small network made of one relation (r1) between three variables: Disease (D), Diagnosis1 (D1), Diagnosis2 (D2), and two pieces of evidence coming from Expert one (e1) and Expert two (e2).
The three variables are the nodes of the graph: Disease (D), Diagnosis1 (D1), Diagnosis2 (D2). The edges (hyperedges) are given by the relation r1 and the two pieces of evidence (Expert one and Expert two). Using the igraph package, 5 a bipartite graph corresponding to the hypergraph is obtained.
Our goal is the calculation of a belief function “p” attached to the variable of interest “D” (Disease of the patient). At the beginning of the process, this function is simply the vacuous specification:
\(p(\{M, T, C\}) = 1\).
We proceed by successive elimination of variables until only the variable D remains, and the result of the function p is obtained.
The calculations involved are following the principles of the valuation language of Shenoy 6; see also 7. The variables are linked to functions (called valuations). A function can be a piece of evidence attached to a variable or a relation between two or more variables.
Three kinds of operations are involved in the calculations: a) the minimal (vacuous) extension of a mass function to a larger Fod; b) the combination of two mass functions by Dempster’s rule; c) the marginalization of a mass function, i.e. eliminating a variable to reduce the function to a smaller Fod. Let’s do it.
First step: eliminate variable D1 (Diagnosis1). Using function extmin, we extend the mass function e1 to the space \(\prod(D1, D2, D)\); then we combine e1 extended with r1, using functions dsrwon and nzdsr (normalization); finally, we use function elim to eliminate D1 by marginalizing to \(\prod(D2, D)\). The mass function obtained is named rel_2.
# some settings to begin
# variables numbers: D1=1, D2=2, D=3
N <- 1:3
# Elimination order of the variables. The goal: Disease (variable 3)
elim_order <- c(1,2,3)
var_to_elim <- rownames(meddiag_hgm)[order(elim_order)]
#
# 1: first step
# "first variable to eliminate"
cat(var_to_elim[1]) # Diagnosis1 (1)
#> D1
irel_to_elim<- meddiag_hgm["D1",]*1:ncol(meddiag_hgm)
rels_nb <- irel_to_elim[irel_to_elim>0]
#
# 1 Fusion of Expert1 and Relation 1 and eliminate variable Diagnosis1
# 1.1 Extension of e1 (Expert1)
Expert1_ext <- extmin(get(meddiag_data_names[1]), get(meddiag_data_names[3]))
# "Evidence of Expert 1 extended to the product space D1 x D2 x D"
bcaPrint(Expert1_ext)
#> Expert1_ext specnb mass
#> 1 M C M + M C T + M C C + M T M + M T T + M T C 1 0.99
#> 2 T C M + T C T + T C C + T T M + T T T + T T C 2 0.01
#> 3 frame 3 0
#
# 1.2 Combination of Expert1 and r1
r2 <- nzdsr(dsrwon(Expert1_ext, get(meddiag_data_names[3]), relnb = 1 + length(meddiag_data_names)))
# "Subsets resulting from the combination of Expert 1 extended and r1"
bcaPrint(r2)
#> r2 specnb mass
#> 1 M C M + M C C + M T M + M T T 1 0.99
#> 2 T C T + T C C + T T T 2 0.01
#> 3 M C M + M C C + M T M + M T T + T C T + T C C + T T T 3 0
#> 4 M C M + M C T + M C C + M T M + M T T + M T C 4 0
#> 5 T C M + T C T + T C C + T T M + T T T + T T C 5 0
#> 6 frame 6 0
#
# 1.3 Marginalization (eliminate variable D1)
rel_2 <- elim(r2, xnb = order(elim_order)[1])
# "subsets in the product space D2 x D (after elimination of D1"
bcaPrint(rel_2)
#> rel_2 specnb mass
#> 1 C M + C C + T M + T T 1 0.99
#> 2 C T + C C + T T 2 0.01
#> 3 C M + C T + C C + T M + T T 3 0
#> 4 frame 4 0
After this first step, the graph is updated.
Second step: Eliminate variable D2 (Diagnosis2). We extend e2 to the space \(\prod(D2, D)\); we combine e2 extended with rel_2 and marginalize to D to get the final result.
# 2. Second step
# Fusion of Expert2 and Relation 2 and eliminate diagnosis2
#
# 2.1 Extension of Expert2 (e2)
# here, use rel_2 as the relation of reference for extmin
Expert2_ext <- extmin(e2, rel_2)
temp1 <- as.data.frame(cbind(rownames(Expert2_ext$tt), Expert2_ext$spec))
colnames(temp1)[1] <- "Expert2_ext"
# "Evidence of Expert 2 extended to the product space D2 x D"
print(temp1)
#> Expert2_ext specnb mass
#> 1 C M + C T + C C 1 0.99
#> 2 T M + T T + T C 2 0.01
#> 3 frame 3 0
#
# 2.2 combinaison pf Expert2 and rel_2
r3 <- nzdsr(dsrwon(Expert2_ext, rel_2, relnb = 3))
temp1 <- as.data.frame(cbind(rownames(r3$tt), r3$spec))
colnames(temp1)[1] <- "r3"
# "Subsets of the space D2 x D resulting from the combination of Expert 2 extended and r2"
print(temp1)
#> r3 specnb mass
#> 1 C M + C C 1 0.9801
#> 2 T M + T T 2 0.0099
#> 3 C M + C C + T M + T T 3 0
#> 4 C T + C C 4 0.0099
#> 5 T T 5 1e-04
#> 6 C T + C C + T T 6 0
#> 7 C M + C T + C C 7 0
#> 8 C M + C T + C C + T M + T T 8 0
#> 9 T M + T T + T C 9 0
#> 10 frame 10 0
#
# 2.3 Final result: the belief function p attached to variable D)
p <- elim(r3, xnb = order(elim_order)[2])
#
# add singletons with 0 mass to show all singletons in the results
p_sing <- addTobca(p, f = matrix(c(1,0,0,0,0,1), ncol=3))
# "The final result after elimination of variable D2"
tabresul(p_sing)
#> $mbp
#> M T C mass Belief Plausibility Plty Ratio
#> M 1 0 0 0.0000 0.0000 0.9900 0.99000000
#> C 0 0 1 0.0000 0.0000 0.9900 0.99000000
#> M + C 1 0 1 0.9801 0.9801 0.9999 50.24623116
#> M + T 1 1 0 0.0099 0.0100 1.0000 1.01010101
#> frame 1 1 1 0.0000 1.0000 1.0000 Inf
#> T + C 0 1 1 0.0099 0.0100 1.0000 1.01010101
#> T 0 1 0 0.0001 0.0001 0.0199 0.01990199
#>
#> $Conflict
#> [1] 0
The plausibility ratio column shows that the odds for \(M \vee C\) against T are 50:1. hence, the disease of the patient must be M or C. We also see that each single hypothesis, M, or, C, remains highly plausible (0.99). Although there is some support for T, its plausibility is very weak at 0.019. Finally, we use the plausibility transformation 8 to look at the probability distribution over the Fod {M, T, C} of the patient that we obtain
Now we have very high odds for M against T or for C against T: 495:1.
L. A. Zadeh. A mathematical theory of evidence (book review). AI Magazine, 55(81—83), 1984↩
R. Haenni. Shedding New Light on Zadeh’s Criticism of Dempster’s Rule of Combination. Conference: Information Fusion, 2005 8th International Conference↩
P. Smets (1993). Belief Functions: The Disjunctive Rule of Combination and the Generalized Bayesian Theorem. IRIDIA - Université Libre de Bruxelles, Brussels, Belgium↩
Csardi G, Nepusz T: The igraph software package for complex network research, InterJournal, Complex Systems 1695. 2006. http://igraph.org↩
P. P. Shenoy. A Valuation-Based Language for Expert systems. lnternational Journal of Approximate Reasoning 1989, 3 383–411↩
P. P. Shenoy. Valuation-Based Systems. Third School on Belief Functions and Their Applications, Stella Plage, France. September 30, 2015↩
Cobb, B. R. and Shenoy, P.P. (2006). On the plausibility transformation method for translating belief function models to probability models. Journal of Approximate Reasoning, 41(3), April 2006, 314–330↩