
Very Very Very Brief Description of MRMC

Issei Tsunoda

2020-07-02

Conventional Notation

In the following, we use the conventional likelihood notation;

f(y|θ),

where y denotes data and θ is a model parameter.

Data y

1 reader, 1 modality, and 3 confidence levels.

Confidence Level Modality ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1 1 H3 F3
2 = equivocal 1 1 H2 F2
1 = questionable 1 1 H1 F1

where each component H_c and F_c is a non-negative integer. For example, H_3 denotes the number of hits made by the reader, over all images taken by the modality, with confidence level 3.

So, in conventional notation we may write

y = (H_0, H_1, H_2, H_3; F_1, F_2, F_3; N_L, N_I),

where we set H_0 := N_L − (H_1 + H_2 + H_3), with N_L the number of lesions and N_I the number of images.
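
For concreteness, a toy data object of this form might be written in R as follows; the element names and numbers are illustrative only, not the package's required data format.

# Toy single-reader, single-modality FROC data in the notation above.
NL <- 100                             # number of lesions
NI <- 60                              # number of images
h  <- c(H3 = 30, H2 = 20, H1 = 10)    # hits H_c per confidence level
f  <- c(F3 = 5,  F2 = 15, F1 = 30)    # false alarms F_c per confidence level
H0 <- NL - sum(h)                     # misses: H0 := NL - (H1 + H2 + H3)
y  <- list(H0 = H0, h = h, f = f, NL = NL, NI = NI)
str(y)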

Likelihood f(y|θ)

For the model with 1 reader, 1 modality, and 3 confidence levels.

Define the model by

{H_c; c = 0, 1, 2, …, C} ~ Multinomial({p_c(θ); c = 0, 1, 2, …, C}, N_L),   F_c ~ Poisson(q_c(θ) N_I),

where

p_c(θ) := ∫_{θ_c}^{θ_{c+1}} Gaussian(x | μ, σ) dx,   q_c(θ) := ∫_{θ_c}^{θ_{c+1}} (d/dz) log Φ(z) dz.

From this model, we can calculate the most important characteristic of observer performance, the area under the ROC curve (AUC), as

AUC := Φ( (μ/σ) / √( (1/σ)² + 1 ) ).

Note that the model parameter is θ = (θ_1, θ_2, θ_3, …, θ_C; μ, σ), which is to be estimated, and Φ denotes the cumulative distribution function of the standard Gaussian. Note that θ_{C+1} = ∞ and θ_0 = −∞.
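
As a small sketch, these rates and the AUC can be written directly in R (variable names and values are illustrative):

# Rates p_c, q_c and the AUC of the model above.
# theta: thresholds theta_1 < ... < theta_C; mu, sigma: signal distribution.
p_rates <- function(theta, mu, sigma) {
  b <- c(-Inf, theta, Inf)       # theta_0 = -Inf, theta_{C+1} = Inf
  diff(pnorm(b, mu, sigma))      # p_0, ..., p_C
}
q_rates <- function(theta) {
  b <- c(theta, Inf)             # log Phi(Inf) = 0
  diff(pnorm(b, log.p = TRUE))   # q_c = log Phi(theta_{c+1}) - log Phi(theta_c)
}
auc <- function(mu, sigma) pnorm((mu / sigma) / sqrt((1 / sigma)^2 + 1))

theta <- c(-0.5, 0.5, 1.5); mu <- 1; sigma <- 1.5
p_rates(theta, mu, sigma)   # multinomial rates, sum to 1
q_rates(theta)              # expected false alarms per image, per confidence level
auc(mu, sigma)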

The definition of the rates p, q is based on the theory of a latent decision variable, as in signal detection theory.

Prior

dz_c := z_{c+1} − z_c,   dz_c, σ_{m,r} ~ Uniform(0, ∞),   z_c ~ Uniform(−∞, 100000),   A_m ~ Uniform(0, 1),

where z_c denotes the threshold θ_c; for the single-reader, single-modality model the subscripts m, r are dropped.

Example codes to fit the above model
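
The package fits such models with Stan via MCMC. As a self-contained illustration only (this sketch is written for this note and is not the package's own Stan program; the priors, names, and toy numbers are assumptions), the single-reader model above could be fitted with rstan as follows.

# Fit the 1-reader, 1-modality, C = 3 model with rstan (illustrative sketch).
library(rstan)

stan_code <- "
data {
  int<lower=1> C;          // number of confidence levels
  int<lower=0> h[C + 1];   // h[1] = H0 (misses), h[2], ..., h[C+1] = H1, ..., HC
  int<lower=0> f[C];       // f[1], ..., f[C] = F1, ..., FC
  int<lower=1> NL;         // number of lesions
  int<lower=1> NI;         // number of images
}
parameters {
  ordered[C] theta;        // thresholds theta_1 < ... < theta_C
  real mu;
  real<lower=0> sigma;
}
transformed parameters {
  vector[C + 1] p;         // multinomial rates p_0, ..., p_C
  vector[C] q;             // Poisson rates q_1, ..., q_C
  p[1] = Phi((theta[1] - mu) / sigma);
  for (c in 2:C)
    p[c] = Phi((theta[c] - mu) / sigma) - Phi((theta[c - 1] - mu) / sigma);
  p[C + 1] = 1 - Phi((theta[C] - mu) / sigma);
  for (c in 1:(C - 1))
    q[c] = log(Phi(theta[c + 1])) - log(Phi(theta[c]));
  q[C] = -log(Phi(theta[C]));   // theta_{C+1} = +infinity
}
model {
  h ~ multinomial(p);
  for (c in 1:C)
    f[c] ~ poisson(q[c] * NI);
  // weakly informative priors for this sketch (the text above uses uniform priors)
  mu ~ normal(0, 5);
  sigma ~ normal(0, 5);
  theta ~ normal(0, 5);
}
generated quantities {
  real AUC = Phi((mu / sigma) / sqrt(1 + (1 / sigma)^2));
}
"

# Toy data in the notation y = (H0, H1, H2, H3; F1, F2, F3; NL, NI).
y <- list(C = 3, h = c(40, 10, 20, 30), f = c(30, 15, 5), NL = 100, NI = 60)

fit <- stan(model_code = stan_code, data = y, chains = 2, iter = 2000)
print(fit, pars = c("mu", "sigma", "theta", "AUC"))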

In the next section, we consider the comparison of imaging modalities, such as MRI, CT, PET, ….

The author implements this because the Bayesian framework is well suited to modeling individual differences among readers; the author thinks this is a benefit of the Bayesian approach.

Data y

2 readers, 2 modalities and 3 confidence levels.

Confidence Level Modality ID Reader ID Number of Hits Number of False alarms
3 = definitely present 1 1 H3,1,1 F3,1,1
2 = equivocal 1 1 H2,1,1 F2,1,1
1 = questionable 1 1 H1,1,1 F1,1,1
3 = definitely present 1 2 H3,1,2 F3,1,2
2 = equivocal 1 2 H2,1,2 F2,1,2
1 = questionable 1 2 H1,1,2 F1,1,2
3 = definitely present 2 1 H3,2,1 F3,2,1
2 = equivocal 2 1 H2,2,1 F2,2,1
1 = questionable 2 1 H1,2,1 F1,2,1
3 = definitely present 2 2 H3,2,2 F3,2,2
2 = equivocal 2 2 H2,2,2 F2,2,2
1 = questionable 2 2 H1,2,2 F1,2,2

where each component H and F is a non-negative integer. In this multi-index notation, for example, H_{3,2,1} denotes the number of hits made by the 1st reader, over all images taken by the 2nd modality, with confidence level 3.

So, in conventional notation we may write

y = (H_{c,m,r}, F_{c,m,r}; N_L, N_I).
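
For concreteness, such a data set might be laid out in R in long format as follows; all values are illustrative toys, not the package's required input format.

# Toy MRMC data: one row per (confidence level, modality, reader) cell.
mrmc <- data.frame(
  ConfidenceLevel = rep(3:1, times = 4),
  ModalityID      = rep(c(1, 1, 2, 2), each = 3),
  ReaderID        = rep(c(1, 2, 1, 2), each = 3),
  Hits            = c(30, 20, 10, 25, 18, 12, 33, 22, 8, 27, 15, 14),
  FalseAlarms     = c(4, 12, 25, 6, 14, 28, 3, 10, 22, 5, 11, 30)
)
NL <- 100   # number of lesions
NI <- 60    # number of images
mrmc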

Likelihood f(y|θ)

The following model with a multinomial distribution is not implemented yet in the package.

{H_{c,m,r}; c = 0, 1, …, C} ~ Multinomial({p_{c,m,r}(θ); c = 0, 1, …, C}, N_L),   F_{c,m,r} ~ Poisson(q_c(θ) N_I),

p_{c,m,r}(θ) := ∫_{θ_c}^{θ_{c+1}} Gaussian(x | μ_{m,r}, σ_{m,r}) dx,   q_c(θ) := ∫_{θ_c}^{θ_{c+1}} (d/dz) log Φ(z) dz,

A_{m,r} := Φ( (μ_{m,r}/σ_{m,r}) / √( (1/σ_{m,r})² + 1 ) ),   A_{m,r} ~ Normal(A_m, σ_r²),

where the model parameter is θ = (θ_1, θ_2, θ_3, …, θ_C; μ_{m,r}, σ_{m,r}), which is to be estimated, and Φ denotes the cumulative distribution function of the standard Gaussian. Note that θ_{C+1} = ∞ and θ_0 = −∞.
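
To make the hierarchical structure concrete, the following R sketch simulates toy data from this (not yet implemented) model. The parameter values, the fixed σ_{m,r}, and all names are illustrative assumptions, not package defaults.

# Simulate toy MRMC data from the hierarchical model sketched above.
set.seed(123)
theta   <- c(-0.5, 0.5, 1.5)   # common thresholds; theta_0 = -Inf, theta_{C+1} = Inf
A_m     <- c(0.80, 0.85)       # modality-level AUCs A_m
sigma_r <- 0.02                # sd of A_{m,r} around A_m
NL <- 100; NI <- 60
for (m in 1:2) for (r in 1:2) {
  A_mr  <- rnorm(1, A_m[m], sigma_r)                      # A_{m,r} ~ Normal(A_m, sigma_r^2)
  sigma <- 1.5                                            # fix sigma_{m,r} for simplicity
  mu    <- sigma * qnorm(A_mr) * sqrt((1 / sigma)^2 + 1)  # invert the AUC formula for mu
  p <- diff(pnorm(c(-Inf, theta, Inf), mu, sigma))        # p_0, ..., p_C
  q <- diff(pnorm(c(theta, Inf), log.p = TRUE))           # q_1, ..., q_C
  hits   <- as.vector(rmultinom(1, NL, p))                # H_0, H_1, ..., H_C
  falses <- rpois(length(q), q * NI)                      # F_1, ..., F_C
  cat("modality", m, "reader", r, "| hits:", hits[-1], "| false alarms:", falses, "\n")
}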

One might consider that this is not a suitable model because it does not include full heterogeneity. However, in the MCMC algorithm such a full model led the author to non-convergence issues. The author started R and Stan two to three years ago and first implemented the full-heterogeneity model, but it never converged. The individual differences were therefore reduced until a model that converges was found, which is the model above.

Prior

dz_c := z_{c+1} − z_c,   dz_c, σ_{m,r} ~ Uniform(0, ∞),   z_c ~ Uniform(−∞, 100000),   A_m ~ Uniform(0, 1).

This is only an example; in the package, proper priors are implemented. The author regards the above prior as intuitively the simplest non-informative choice, although it is not coordinate-free (it is not invariant under reparametrization).

R script of MRMC

My apologies

On 30 June 2020, the author implemented the FROC model using the multinomial distribution, which is the most traditional formulation.

The author had misunderstood the traditional FROC model. The author's own model is not wrong, but the traditional model is simpler, so the current version implements the traditional one.

As far as the author is concerned, the traditional model and the author's new model are not equivalent in MCMC sampling. The author compared the two by fitting the new and the classical model to datasets with many zero cells.

The author also compared both models by WAIC, but no clear difference was detected from the author's point of view.

Conjecture on prior

The Poisson rates and the multinomial (Bernoulli) rates should be contained in the regular interval [ϵ, 1 − ϵ] for some fixed small ϵ, i.e., p_c, q_c ∈ [ϵ, 1 − ϵ].

Monotonicity, p_1 < p_2 < ⋯ and q_1 > q_2 > ⋯, is also reasonable, where the subscript denotes the rating and a higher number indicates a higher confidence level.

The author found that cells with zero hits or zero false alarms cause a bias in MCMC sampling. If the prior is unsuitable or non-informative, this phenomenon occurs, and simulation-based calibration (SBC) detects it.

So, we have to find a prior that satisfies the monotonicity condition and the regular-interval condition.
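
The following R sketch performs a crude prior-predictive check of these two conditions; the candidate prior used here is an illustrative assumption, not the package's prior.

# Draw (theta, mu, sigma) from a candidate prior and record how often the
# induced rates satisfy monotonicity and the regular-interval condition.
set.seed(1)
eps <- 1e-4
ok <- replicate(5000, {
  theta <- sort(rnorm(3, 0, 3))    # candidate prior on the three thresholds
  mu    <- rnorm(1, 0, 3)
  sigma <- abs(rnorm(1, 0, 3))
  p <- diff(pnorm(c(-Inf, theta, Inf), mu, sigma))[-1]  # hit rates p_1, ..., p_C
  q <- diff(pnorm(c(theta, Inf), log.p = TRUE))         # false-alarm rates q_1, ..., q_C
  all(diff(p) > 0) && all(diff(q) < 0) &&
    all(p > eps & p < 1 - eps) && all(q > eps & q < 1 - eps)
})
mean(ok)   # fraction of prior draws satisfying the conjectured conditions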

The following SBC result shows that our prior is not good, because for some parameters the rank statistic is not uniformly distributed.

Appendix:

Why a Bayesian approach now

In the following, the author points out why the frequentist p-value is problematic. Of course, under some conditions the Bayesian p-value coincides with the frequentist p-value, so the statistical-testing scheme itself is problematic. We shall show the reason in the following simple example.

To tell the truth, the author would prefer to phrase this in an epsilon-delta manner, but does not do so here, to keep the argument accessible to non-mathematicians.

  • The next section proves the monotonicity of the p-value for the simplest statistical test.

The author wanted to publish this proof, but reviewers were against it. Since there is space to speak freely here, please enjoy the logic; the author is fond of it because the author's heart is in mathematics.

Monotonicity issues on p value

The methods of statistical testing are widely used in medical research. However, there is a well-known problem, which is that a large sample size gives a small p-value. In this section, we will provide an explicit explanation of this phenomenon with respect to simple hypothesis tests.

Consider the following null hypothesis H_0 and its alternative hypothesis H_1: H_0: E[X_i] = m_0, H_1: E[X_i] > m_0, where E[X_i] denotes the expectation of random samples X_i from a normal distribution whose variance σ_0² is known. In this situation, the test statistic is given by Z_test := (X̄_n − m_0) / √(σ_0²/n), where X̄_n := (Σ_{i=1,…,n} X_i)/n is normally distributed with mean E[X_i] and standard deviation σ_0/√n. Under the null hypothesis, Z_test is normally distributed with mean 0 and standard deviation 1 (the standard normal distribution). The null hypothesis is rejected if Z_test > z_α, where z_α is the upper α percentile point of the standard normal distribution, e.g., z_{0.025} = 1.96 and z_{0.05} ≈ 1.64.

Suppose that the true distribution of X_1, …, X_n is a normal distribution with mean m_0 + ϵ and variance σ_0², where ϵ is an arbitrary fixed positive number. Then Z_test = (X̄_n − (m_0 + ϵ) + ϵ) / √(σ_0²/n) = Z_Truth + (ϵ/σ_0)√n, where Z_Truth := (X̄_n − (m_0 + ϵ)) / √(σ_0²/n) follows the standard normal distribution.

In the following, we calculate the probability with which we reject the null hypothesis H_0 at significance level α: Prob(Z_test > z_α) = Prob(Z_Truth + (ϵ/σ_0)√n > z_α) = Prob(Z_Truth > z_α − (ϵ/σ_0)√n). Note that ϵ/σ_0 is called the effect size.

Thus, if z_α − (ϵ/σ_0)√n < −z_β, i.e., if n > (z_α + z_β)² σ_0²/ϵ², then the probability that the null hypothesis is rejected is greater than 1 − β.

For example, consider the case σ_0 = 1, α = 0.05, and 1 − β = 0.95, so that z_α = z_β ≈ 1.64 and (z_α + z_β)² ≈ 10.8. Then, for every ϵ > 0, if n > 11/ϵ², the probability that the above hypothesis test concludes that the difference of the observed mean from the hypothesized mean is significant is greater than 0.95. This means that the p-value is almost always less than 0.05. Thus a large sample size induces a small p-value.

For example,

  • if ϵ = 1, then by taking a sample size n > 11, almost every test will conclude that the observed difference is statistically significant. Similarly,

  • if ϵ = 0.1, then by taking a sample size n > 1100, almost all tests will reach the conclusion that the difference is significant; and

  • if ϵ = 0.01, then by taking a sample size n > 110000, the same problem will arise.

This phenomenon also means that in large samples statistical tests will detect very small differences between populations.
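
A quick simulation sketch (illustrative R code, independent of the package) confirms this numerically: with true mean m_0 + ϵ and n just above 11/ϵ², the one-sided z-test at α = 0.05 rejects H_0 in about 95% of replications.

# Empirical rejection probability of the one-sided z-test at alpha = 0.05
# when the true mean is m0 + eps; the sample mean is simulated directly.
set.seed(1)
m0 <- 0; sigma0 <- 1; alpha <- 0.05
reject_rate <- function(eps, n, reps = 10000) {
  x_bar  <- rnorm(reps, mean = m0 + eps, sd = sigma0 / sqrt(n))  # sampling distribution of the mean
  z_test <- (x_bar - m0) / (sigma0 / sqrt(n))
  mean(z_test > qnorm(1 - alpha))
}
reject_rate(eps = 1,    n = 11)       # about 0.95
reject_rate(eps = 0.1,  n = 1100)     # about 0.95
reject_rate(eps = 0.01, n = 110000)   # about 0.95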

By the above consideration, we can obtain a "significant difference" for any tiny difference ϵ by collecting a large enough sample n, and thus we must not use such statistical tests.