Lavine. 1.10. Problem 2. Simulating Dice Rolls.
diceRoll <- sample(1:6, size = 6000, replace = TRUE) # Create a dice and roll it 6000 times. Sample function takes a random sample from each roll
table(diceRoll) # lets see what it looks like
## diceRoll
## 1 2 3 4 5 6
## 985 1045 997 1000 981 992
### this bit of code runs in R but won't Knit in RMarkdown at the moment..###
#count(diceRoll == 1) # Count funtion produces a datarame with the frequency of T/F for the parameters
#count(diceRoll == 2)
#count(diceRoll == 3)
#count(diceRoll == 4)
#count(diceRoll == 5)
#count(diceRoll == 6)
table(diceRoll) # they were close but there is some variabily, ~100
## diceRoll
## 1 2 3 4 5 6
## 985 1045 997 1000 981 992
table(diceRoll)/length(diceRoll)*100 # convert to percent
## diceRoll
## 1 2 3 4 5 6
## 16.41667 17.41667 16.61667 16.66667 16.35000 16.53333
#Tough question. I referenced Lavine 1.7 Simulation chapter where he simulates Craps
sim.diceRolls <- function() { # create a funtion called sim.diceRolls that rolls the dice 6000x and gives a result
dice <- sample(1:6, 6000, replace = T) # create a dice, just like above. Roll it 6000 times and take a random sample from each roll, adding the results to the variable dice
if (sum(dice == 1) >1029) # if the sum of the number of occurences of dice side number 1 (arbitrary) is greater than 1029, then...
Results <- 1 #... add a 1 to the results
else
Results <- 0 # if it isn't >1030, than add 0, or nothing
return(Results)
}
n <- 1000 # now lets simulate our function 1000x
Results = 0 # start with 0 value in results, because we want to recursively popuale this value with each iteration
for (i in 1:n) # a forloop for n iterations
Results <- Results + sim.diceRolls() #recursively add to results each time we the previous value of results plus that which the function sim.diceRolls produces
print(Results/n)*100 # percent of time that you expect to get more than 1030 1s
## [1] 0.162
## [1] 16.2
Lavine 1.10 Problem 37. Solve analytically or by simulation.
Ecologists are studying salamanders in a forest. There are two types of forest. Type A is conducive to salamanders while type B is not. They are studying one forest but don’t know which type it is. Types A and B are equally likely. During the study, they randomly sample quadrats. (A quadrat is a square-meter plot.) In each quadrat they count the number of salamanders. Some quadrats have poor salamander habitat. In those quadrats the number of salamanders is 0. Other quadrats have good salamander habitat. In those quadrats the number of salamanders is either 0, 1, 2, or 3, with probabilities 0.1, 0.3, 0.4, and 0.2, respectively. (Yes, there might be no salamanders in a quadrat with good habitat.) In a type A forest, the probability that a quadrat is good is 0.8 and the probability that it is poor is 0.2. In a type B forest the probability that a quadrat is good is 0.3 and the probability that it is poor is 0.7.
What we know:
\[p(A) = 0.5\] \[p(B) = 0.5\]
\[p(G|A) = 0.8\] \[p(G|B) = 0.3\] \[p(P|A) = 0.2\] \[p(P|A) = 0.7\]
\[p(S=0|P) = 1\] \[p(S=1|P) = 0\] \[p(S=2|P) = 0\] \[p(S=3|P) = 0\]
\[p(S=0|G) = 0.1\] \[p(S=1|G) = 0.3\] \[p(S=2|G) = 0.4\] \[p(S=3|G) = 0.2\]
where: \[S = Salamanders\] \[A = Forest A\] \[B = Forest B\] \[G = Good Habitat\] \[P = Poor Habitat\]
Now we create a table of conditional probability
| Good Habitat | Poor Habitat | Marginal
---------|---------------|---------------|--------------
Forest A | 0.8*0.5 = 0.4| 0.2*0.5 = =0.1| 0.4+0.1 = 0.5
Forest B | 0.3*0.5 = 0.15| 0.7*0.5 = 0.35|0.15+0.35 = 0.5
Marginal |0.4+0.15 = 0.55|0.1+0.35 = 0.45| 0.5+0.5 = 1
see marginal (sum) of Good Habitat 0.55
See r script below for answers
\[p(S=0) = (p(S=0|G)*p(G) + (p(S=0|P)*p(P) = 0.505\] \[p(S=1) = (p(S=1|G)*p(G) + (p(S=1|P)*p(P) = 0.165\] \[p(S=2) = (p(S=2|G)*p(G) + (p(S=2|P)*p(P) = 0.22\] \[p(S=3) = (p(S=3|G)*p(G) + (p(S=3|P)*p(P) = 0.11\]
pA = 0.5
pB = 0.5
pGA = 0.8
pGB = 0.3
pPA = 0.2
pPB = 0.7
pS0P = 1
pS1P = 0
pS2P = 0
pS3P = 0
pS0G = 0.1
pS1G = 0.3
pS2G = 0.4
pS3G = 0.2
pP = 0.45
pG = 0.55
pS0 = (pS0G*pG) + (pS0P*pP)
pS1 = (pS1G*pG) + (pS1P*pP)
pS2 = (pS2G*pG) + (pS2P*pP)
pS3 = (pS3G*pG) + (pS3P*pP)
pS0
## [1] 0.505
pS1
## [1] 0.165
pS2
## [1] 0.22
pS3
## [1] 0.11
Use Bayes Theorem: \[p(B|A) = (p(A|B)*p(B))/p(A)\]
so to substitute for this question: \[p(G|S=0) = (p(S=0|G)*p(G))/p(S=0) = 0.109\]
(pS0G*pG)/pS0
## [1] 0.1089109
what we want is \[p(A|S=0)\]
so use Bayes to solve…
\[p(S=0|A)*p(A) / p(S=0)\]
…however need to solve for
\[p(S=0|A)\]
which is:
\[p(S=0|A) = p(S=0|G)*p(G|A) + p(S=0|P)*p(P|A)\]
so
\[p(A|S=0) = (p(S=0|G)*p(G|A) + p(S=0|P)*p(P|A)) * p(A) / p(S=0) = 0.27772277\]
pAS0 <- (((pS0G*pGA)+(pS0P*pPA))*pA)/pS0
pAS0
## [1] 0.2772277
Updating our priors! MIND BLOWN
So, we now we have new information from part d that says there is a 27% chance we are now in Forest A. So we can take pA but update it’s probability with p(S=0|A).
\[p(A) = P(A|S=0)\]
AND correspondingly we must update the probability for Forest B
\[p(B) = 1-P(A|S=0)\]
so
\[p(Q2=G) = p(G|A)*p(A|S=0) + (p(G|B)*(1-p(A|S=0)) = 0.438\]
pQ2G = (pGA*pAS0)+(pGB*(1-pAS0))
pQ2G
## [1] 0.4386139
Similar to the process above, we take our initial equation
\[p(S=0) = (p(S=0|G)*p(G) + (p(S=0|P)*p(P)\]
but now we update p(G) to reflect the results of (e), so
\[p(Q2S=0) = (p(S=0|G)*p(Q2=G) + ((p(S=0|P)*(1-p(Q2=G)) = 0.605\]
pQ2S0 = (pS0G*pQ2G)+(pS0P*(1-pQ2G))
pQ2S0
## [1] 0.6052475
Lavine 1.10 Problem 44.
This exercise is based on a computer lab that another professor uses to teach the Central Limit Theorem. It was originally written in MATLAB but here it’s translated into R. Enter the following R commands:
u <- matrix ( runif(250000), 1000, 250 )
y <- apply ( u, 2, mean )
These create a 1000 x 250 (a thousand rows and two hundred fifty columns) matrix of random draws, called u and a 250-dimensional vector y which contains the means of each column of U. Now enter the command hist(u[,1]). This command takes the first column of u (a column vector with 1000 entries) and makes a histogram. Print out this histogram and describe what it looks like. What distribution is the runif command drawing from? Now enter the command hist(y). This command makes a histogram from the vector y. Print out this histogram. Describe what it looks like and how it differs from the one above. Based on the histogram, what distribution do you think y follows? You generated y and u with the same random draws, so how can they have different distributions? What’s going on here?
u <- matrix(runif(250000),1000, 250)
y <- apply(u,2,mean)
hist(u[,1])
hist(y)
The runif function produces a random uniform distibution of numbers from 0-1. The first histogram is a classic uniform fistribution, where all occurences, or bins, occur approximtley equally. The second histogram is a normal, or binomial, distribution. The apply function samples the mean value from each column of u. The histogram is normal because the Central Limit Theorem establishes that the normalized sum of “u” will (often) be normally distributed even if the original data are not normal. Given a large sample size from a population, the mean of the samples will be approximately equal to the mean of the original population, and will tend to be a normal distribution.