[Mitarbeiter.zoologie] Code 2.8 from Statistical Rethinking

Belinda Kahnt belinda.kahnt at zoologie.uni-halle.de
Mon Feb 14 10:41:53 CET 2022


Dear Hassan, all,

great; thanks for explaining the code to us again! Regarding your comment about BEAST (or MrBayes, the other best-known program in phylogenetics using a Bayesian approach): it always depends on what you define as the so-called "burn-in", i.e. how much of the initial samples will be discarded. Usually people use a burn-in of 10% (also the default in MrBayes) or even 25% of the total number of samples collected. For example, if you have collected 100,000 samples, then a 10% burn-in indeed means discarding the first 10,000 samples. The most important thing to know, though, is why: the initial/first samples usually have a low likelihood (because you start at a random point which is often far from the maximum, i.e. the samples with the highest likelihood), so you discard them. Hope that helps?!
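
In R this could look like the lines below, e.g. for a chain stored in a vector p as in Hassan's code (the 10% is just the MrBayes-style default, not a fixed rule):

# Discard the first 10% of the chain as burn-in and keep the rest.
burnin <- ceiling( 0.1 * length(p) )
p_post <- p[ -(1:burnin) ]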

Best,
Belinda





>>> Hassan Shafiey <h.shafiey at gmail.com> 02/11/22 5:19 PM >>>
Dear Statistical rethinkers, 
I've thought a bit more about code 2.8 and want to share my understanding with you. Please correct me if I am wrong.

The aim of the code: to reconstruct the posterior probability distribution of the proportion of the Earth's surface covered with water, given 6 water observations out of 9 tosses. This is done by repeatedly generating random numbers in the light of the observations (6 W out of 9). Each random number depends on the previous one, hence the sequence is a Markov chain.

# You decide on the size of the Markov chain. 
n_samples <- 1000
# Memory allocation. The vector p will hold our Markov chain. 
p <- rep( NA , n_samples )

# Setting the first value. It could be any valid value as long as you collect enough samples afterwards. But let's be unbiased and start with 0.5. 
p[1] <- 0.5

# These are our observations. 
W <- 6
L <- 3

# In each step of this loop we generate a new random number; if it is accepted it is stored in the vector p, otherwise the previous value is kept. You start at 2 because you already set the first one.  
for ( i in 2:n_samples ) {

# You generate a random number from a normal distribution whose mean equals your previous (old) random number and call it p_new. You can use other distributions (I think) for generating the proposal as long as it depends on the previous random number. 
p_new <- rnorm( 1 , p[i-1] , 0.1 )

# Since p_new is a probability by nature, it must be between 0 and 1. Here you check for this, and if it falls outside this interval you reflect it back inside. This code, however, is not perfect. Suppose you got 8: based on the second checkpoint you subtract it from 2, which gives -6, which is still outside the valid interval. You could use a line like this instead:
# while( p_new < 0 | p_new > 1 ) p_new <- rnorm( 1 , p[i-1] , 0.1 )
if ( p_new < 0 ) p_new <- abs( p_new )
if ( p_new > 1 ) p_new <- 2 - p_new

# Now you have a valid new random number and you need to decide whether to accept it or stay with the old one. This is done based on the likelihoods of the old random number (q0) and the new one (q1). The likelihoods are calculated in the light of the observations. Here is the meaning of q0, for example (suppose the old value is still 0.5): if 50% of the Earth's surface is covered by water, what is the probability that out of 9 tosses you get 6 waters? 
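# For example, dbinom( 6 , 9 , 0.5 ) = choose(9,6) * 0.5^9 ≈ 0.164, i.e. roughly a 16% chance of seeing 6 W in 9 tosses when half the surface is water.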
q0 <- dbinom( W , W+L , p[i-1] )
q1 <- dbinom( W , W+L , p_new )

# Once you have q0 and q1, you accept or reject the new random number in proportion to q1/q0. If q1/q0 is 1 or larger (the new value is at least as plausible as the old one), runif(1) is always below it, so you definitely accept the new value. If q1/q0 is, say, 0.3, you still accept the new, less plausible value about 30% of the time. You must always leave some room for less plausible values as well, otherwise you don't get a distribution. 
p[i] <- ifelse( runif(1) < q1/q0 , p_new , p[i-1] )
}
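
Not part of code 2.8 itself, but as a quick sketch for checking the chain afterwards (assuming the vector p from the loop above), you could look at how often a proposal was actually accepted and at a trace plot:

# Fraction of steps where the chain moved = rough acceptance rate of the Metropolis proposals.
accept_rate <- mean( diff(p) != 0 )
# Trace plot: the chain should wander over the whole 0-1 interval.
plot( p , type="l" , xlab="step" , ylab="p" )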


PS1: Antonella noticed that upon increasing the size of the Markov chain, you get a better approximation to the analytical solution. 
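
If I remember correctly, this comparison is more or less code 2.9 in the book; dens() comes from the rethinking package (base R's density() works too), and dbeta( x , W+1 , L+1 ) is the analytical posterior under a flat prior:

# Density of the Markov chain samples vs. the analytical posterior (dashed).
dens( p , xlim=c(0,1) )
curve( dbeta( x , W+1 , L+1 ) , lty=2 , add=TRUE )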

PS2: If you are not sure that you started the Markov chain at an unbiased value, it's better to discard some of the first random numbers before making the plot. BEAST discards the first 10,000 numbers by default, I think. Belinda! Is that correct? 

This is my understanding. Hope it helps a bit. 

All the best 
Hassan 

 


