PBIO 294 HW 13 Sep 2017

Reinforcing concepts in K4, L1 and developing R skills.

Lavine. 1.10. Problem 2. Simulating Dice Rolls.

  • (a) simulate 6000 dice rolls. Count the number of 1’s, 2’s, . . . , 6’s.
diceRoll <- sample(1:6, size = 6000, replace = TRUE) # simulate 6000 rolls of a fair die: sample() draws 6000 values from 1:6 with replacement
table(diceRoll) # let's see what the counts look like
## diceRoll
##    1    2    3    4    5    6 
##  985 1045  997 1000  981  992
### This bit of code runs interactively in R but won't knit in R Markdown at the moment, most likely because the package that provides count() (e.g., plyr) isn't loaded in the .Rmd. ###

#count(diceRoll == 1) # The count function produces a data frame with the frequency of TRUE/FALSE for the condition
#count(diceRoll == 2)
#count(diceRoll == 3)
#count(diceRoll == 4)
#count(diceRoll == 5)
#count(diceRoll == 6)
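A base-R alternative that would knit without any extra packages (just a sketch of one option; the table(diceRoll) call above already reports the same counts):

sapply(1:6, function(face) sum(diceRoll == face)) # number of 1's, 2's, ..., 6's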
  • (b) You expect about 1000 of each number. How close was your result to what you expected?
table(diceRoll) # the counts are close to 1000, with some variability (each is within about 50 of the expected value)
## diceRoll
##    1    2    3    4    5    6 
##  985 1045  997 1000  981  992
table(diceRoll)/length(diceRoll)*100 # convert counts to percentages (expected: about 16.7% each)
## diceRoll
##        1        2        3        4        5        6 
## 16.41667 17.41667 16.61667 16.66667 16.35000 16.53333
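One way to make "how close" more concrete (a sketch, not something the problem requires) is to look at the deviations from the expected count of 1000 and run a chi-squared goodness-of-fit test against a fair die:

table(diceRoll) - 1000      # deviation of each face's count from the expected 1000
chisq.test(table(diceRoll)) # default null hypothesis: all six faces are equally likely

A large p-value here would be consistent with the variation being ordinary sampling noise.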
  • (c) About how often would you expect to get more than 1030 1’s? Run an R simulation to estimate the answer.
# Tough question. I referenced Lavine's Section 1.7 on simulation, where he simulates craps

sim.diceRolls <- function() { # one simulation: roll a die 6000 times and check the count of 1's
  dice <- sample(1:6, 6000, replace = TRUE) # 6000 rolls of a fair die
  if (sum(dice == 1) > 1029) # if side 1 (chosen arbitrarily) came up at least 1030 times...
    Results <- 1 # ...record a 1
  else
    Results <- 0 # otherwise record a 0
  return(Results)
}

n <- 1000 # now let's run the simulation 1000 times
Results <- 0 # start at 0 so we can accumulate the outcome of each iteration
for (i in 1:n) # a for loop over n iterations
  Results <- Results + sim.diceRolls() # add each simulation's 0/1 result to the running total
Results/n # proportion of simulations with at least 1030 ones
Results/n*100 # the same answer as a percentage
## [1] 0.162
## [1] 16.2
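The simulation can also be sanity-checked analytically. The number of 1's in 6000 fair rolls is Binomial(6000, 1/6), so (reading "more than 1030" the same way the code does, i.e., at least 1030, matching the > 1029 condition) the exact probability is:

pbinom(1029, size = 6000, prob = 1/6, lower.tail = FALSE) # P(count of 1's >= 1030)

The result should land close to the simulated estimate above.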

Lavine 1.10 Problem 37. Solve analytically or by simulation.

Ecologists are studying salamanders in a forest. There are two types of forest. Type A is conducive to salamanders while type B is not. They are studying one forest but don’t know which type it is. Types A and B are equally likely. During the study, they randomly sample quadrats. (A quadrat is a square-meter plot.) In each quadrat they count the number of salamanders. Some quadrats have poor salamander habitat. In those quadrats the number of salamanders is 0. Other quadrats have good salamander habitat. In those quadrats the number of salamanders is either 0, 1, 2, or 3, with probabilities 0.1, 0.3, 0.4, and 0.2, respectively. (Yes, there might be no salamanders in a quadrat with good habitat.) In a type A forest, the probability that a quadrat is good is 0.8 and the probability that it is poor is 0.2. In a type B forest the probability that a quadrat is good is 0.3 and the probability that it is poor is 0.7.

What we know:

\[p(A) = 0.5\] \[p(B) = 0.5\]

\[p(G|A) = 0.8\] \[p(G|B) = 0.3\] \[p(P|A) = 0.2\] \[p(P|B) = 0.7\]

\[p(S=0|P) = 1\] \[p(S=1|P) = 0\] \[p(S=2|P) = 0\] \[p(S=3|P) = 0\]

\[p(S=0|G) = 0.1\] \[p(S=1|G) = 0.3\] \[p(S=2|G) = 0.4\] \[p(S=3|G) = 0.2\]

where: \[S = \text{number of salamanders in a quadrat}\] \[A = \text{forest type A}\] \[B = \text{forest type B}\] \[G = \text{good habitat}\] \[P = \text{poor habitat}\]

Now we create a table of joint probabilities, where each cell is p(forest, habitat) and the margins are the row and column sums:

         |  Good Habitat  |  Poor Habitat  | Marginal
---------|----------------|----------------|----------------
Forest A |  0.8*0.5 = 0.4 |  0.2*0.5 = 0.1 |  0.4+0.1 = 0.5
Forest B | 0.3*0.5 = 0.15 | 0.7*0.5 = 0.35 | 0.15+0.35 = 0.5
Marginal | 0.4+0.15 = 0.55| 0.1+0.35 = 0.45|    0.5+0.5 = 1
  • (a) On average, what is the probability that a quadrat is good?

See the marginal (column sum) for Good Habitat: p(G) = 0.4 + 0.15 = 0.55.
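As a quick check, the same table can be built in R; joint is a name introduced here just for illustration:

joint <- rbind(A = 0.5*c(Good = 0.8, Poor = 0.2),  # p(habitat, forest A) = p(habitat|A)*p(A)
               B = 0.5*c(Good = 0.3, Poor = 0.7))  # p(habitat, forest B) = p(habitat|B)*p(B)
colSums(joint) # marginal habitat probabilities: p(G) = 0.55, p(P) = 0.45
rowSums(joint) # marginal forest probabilities: p(A) = 0.5, p(B) = 0.5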

  • (b) On average, what is the probability that a quadrat has 0 salamanders, 1 salamander, 2 salamanders, 3 salamanders?

See the R code below for the numerical answers.

\[p(S=0) = p(S=0|G)*p(G) + p(S=0|P)*p(P) = 0.505\] \[p(S=1) = p(S=1|G)*p(G) + p(S=1|P)*p(P) = 0.165\] \[p(S=2) = p(S=2|G)*p(G) + p(S=2|P)*p(P) = 0.22\] \[p(S=3) = p(S=3|G)*p(G) + p(S=3|P)*p(P) = 0.11\]

pA = 0.5    # p(forest A)
pB = 0.5    # p(forest B)
pGA = 0.8   # p(good habitat | forest A)
pGB = 0.3   # p(good habitat | forest B)
pPA = 0.2   # p(poor habitat | forest A)
pPB = 0.7   # p(poor habitat | forest B)
pS0P = 1    # p(0 salamanders | poor habitat)
pS1P = 0    # p(1 salamander  | poor habitat)
pS2P = 0    # p(2 salamanders | poor habitat)
pS3P = 0    # p(3 salamanders | poor habitat)
pS0G = 0.1  # p(0 salamanders | good habitat)
pS1G = 0.3  # p(1 salamander  | good habitat)
pS2G = 0.4  # p(2 salamanders | good habitat)
pS3G = 0.2  # p(3 salamanders | good habitat)
pP = 0.45   # marginal p(poor habitat), from the table above
pG = 0.55   # marginal p(good habitat), from the table above

pS0 = (pS0G*pG) + (pS0P*pP) # law of total probability: p(S=0)
pS1 = (pS1G*pG) + (pS1P*pP) # p(S=1)
pS2 = (pS2G*pG) + (pS2P*pP) # p(S=2)
pS3 = (pS3G*pG) + (pS3P*pP) # p(S=3)

pS0
## [1] 0.505
pS1
## [1] 0.165
pS2
## [1] 0.22
pS3
## [1] 0.11
  • (c) The ecologists sample the first quadrat. It has 0 salamanders. What is the probability that the quadrat is good?

Use Bayes' theorem (written here with generic events X and Y so the letters don't clash with the forest labels): \[p(X|Y) = p(Y|X)*p(X)/p(Y)\]

Substituting for this question: \[p(G|S=0) = (p(S=0|G)*p(G))/p(S=0) = 0.109\]

(pS0G*pG)/pS0 # p(G | S=0) by Bayes' theorem
## [1] 0.1089109
  • (d) Given that the quadrat had 0 salamanders, what is the probability that the forest is type A?

what we want is \[p(A|S=0)\]

so use Bayes to solve…

\[p(S=0|A)*p(A) / p(S=0)\]

…however need to solve for

\[p(S=0|A)\]

which is:

\[p(S=0|A) = p(S=0|G)*p(G|A) + p(S=0|P)*p(P|A)\]

so

\[p(A|S=0) = (p(S=0|G)*p(G|A) + p(S=0|P)*p(P|A)) * p(A) / p(S=0) = 0.2772277\]

pAS0 <- (((pS0G*pGA)+(pS0P*pPA))*pA)/pS0 # p(A | S=0) = p(S=0|A)*p(A)/p(S=0)
pAS0
## [1] 0.2772277
  • (e) Now the ecologists prepare to sample the second quadrat. Given the results from the first quadrat, what is the probability that the second quadrat is good?

Updating our priors! MIND BLOWN

So now we have new information from part (d): given that the first quadrat had 0 salamanders, there is about a 28% chance the forest is type A. We therefore replace the prior p(A) with the posterior p(A|S=0).

\[p(A) = p(A|S=0)\]

AND correspondingly we must update the probability for Forest B

\[p(B) = 1-p(A|S=0)\]

so

\[p(Q2=G) = p(G|A)*p(A|S=0) + p(G|B)*(1-p(A|S=0)) = 0.439\]

pQ2G = (pGA*pAS0)+(pGB*(1-pAS0)) # p(second quadrat is good | first quadrat had 0 salamanders)
pQ2G
## [1] 0.4386139
  • (f) Given the results from the first quadrat, what is the probability that they find no salamanders in the second quadrat?

Similar to the process above, we take our initial equation

\[p(S=0) = p(S=0|G)*p(G) + p(S=0|P)*p(P)\]

but now we update p(G) to reflect the results of (e), so

\[p(Q2S=0) = p(S=0|G)*p(Q2=G) + p(S=0|P)*(1-p(Q2=G)) = 0.605\]

pQ2S0 = (pS0G*pQ2G)+(pS0P*(1-pQ2G)) # p(second quadrat has 0 salamanders | first had 0)
pQ2S0
## [1] 0.6052475
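Since the problem says "solve analytically or by simulation," here is a minimal simulation sketch to cross-check the analytic answers above. It reuses the probability variables defined earlier; n.sim, draw.sal, good1, s1, etc. are names introduced here just for illustration. Each simulated replicate draws a forest type and two quadrats from that forest, so the two-quadrat questions (e) and (f) can be checked as well.

set.seed(1)                                  # any seed; just for reproducibility
n.sim  <- 100000                             # number of simulated forests
forest <- sample(c("A", "B"), n.sim, replace = TRUE) # forest type, p(A) = p(B) = 0.5
p.good <- ifelse(forest == "A", pGA, pGB)    # p(good habitat) given the forest type
good1  <- runif(n.sim) < p.good              # is quadrat 1 good habitat?
good2  <- runif(n.sim) < p.good              # is quadrat 2 good habitat? (same forest)
draw.sal <- function(good)                   # salamander count: 0 in poor habitat,
  ifelse(good,                               # 0-3 with the given probabilities if good
         sample(0:3, length(good), replace = TRUE, prob = c(pS0G, pS1G, pS2G, pS3G)),
         0)
s1 <- draw.sal(good1)                        # salamanders in quadrat 1
s2 <- draw.sal(good2)                        # salamanders in quadrat 2

mean(good1)                                  # (a) should be near 0.55
table(s1)/n.sim                              # (b) should be near 0.505, 0.165, 0.22, 0.11
mean(good1[s1 == 0])                         # (c) should be near 0.109
mean(forest[s1 == 0] == "A")                 # (d) should be near 0.277
mean(good2[s1 == 0])                         # (e) should be near 0.439
mean(s2[s1 == 0] == 0)                       # (f) should be near 0.605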

Lavine 1.10 Problem 44.

This exercise is based on a computer lab that another professor uses to teach the Central Limit Theorem. It was originally written in MATLAB but here it’s translated into R. Enter the following R commands:

u <- matrix ( runif(250000), 1000, 250 )
y <- apply ( u, 2, mean )

These create a 1000 x 250 (a thousand rows and two hundred fifty columns) matrix of random draws, called u and a 250-dimensional vector y which contains the means of each column of U. Now enter the command hist(u[,1]). This command takes the first column of u (a column vector with 1000 entries) and makes a histogram. Print out this histogram and describe what it looks like. What distribution is the runif command drawing from? Now enter the command hist(y). This command makes a histogram from the vector y. Print out this histogram. Describe what it looks like and how it differs from the one above. Based on the histogram, what distribution do you think y follows? You generated y and u with the same random draws, so how can they have different distributions? What’s going on here?

u <- matrix(runif(250000), 1000, 250) # 1000 x 250 matrix of Uniform(0,1) draws
y <- apply(u, 2, mean)                # mean of each of the 250 columns

hist(u[,1]) # histogram of the first column: 1000 raw Uniform(0,1) draws

hist(y) # histogram of the 250 column means

The runif function draws from the Uniform(0, 1) distribution. The first histogram looks like a classic uniform distribution: all the bins have approximately equal counts. The second histogram looks approximately normal (bell-shaped) rather than uniform. The apply call computes the mean of each column of u, so each entry of y is the average of 1000 uniform draws. The histogram of y looks normal because of the Central Limit Theorem: the distribution of sample means approaches a normal distribution as the sample size grows, even when the underlying data are not normal. So although y and u come from the same random draws, u shows the raw uniform values while y shows averages of 1000 of them; those averages cluster tightly around the population mean of 0.5 and are approximately normally distributed.
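A quick numeric check of this claim (a sketch, beyond what the problem asks): for Uniform(0, 1) draws the mean is 1/2 and the standard deviation is 1/sqrt(12) ≈ 0.289, so the CLT predicts the column means in y should be centered near 0.5 with standard deviation about 0.289/sqrt(1000) ≈ 0.009 and should look roughly normal on a Q-Q plot.

mean(y)              # should be close to 0.5
sd(y)                # should be close to (1/sqrt(12))/sqrt(1000), about 0.009
qqnorm(y); qqline(y) # points falling near the straight line => y is approximately normal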