If you’ve opened the accompanying R file, please close it now :)
Our main example in this lab will be a collection of coin flips, modeled using the binomial distribution:
\[y \sim \text{Binomial}(N, \theta)\]
Here \(y\) is the number of heads, \(N\) is the number of flips, and \(\theta\) is the probability of heads. We’d like to estimate \(\theta\). Kruschke Chapter 6 is a good reference for the math here (though his example uses the Bernoulli distribution).
A good prior distribution for \(\theta\) in the binomial distribution is a beta distribution. One reason is that the beta distribution is continuous and bounded between 0 and 1, as \(\theta\) should be. There’s another reason, which we’ll introduce below.
```r
# hyperparameters
a_prior <- 1
b_prior <- 1

# plot prior from dbeta
```
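One possible way to fill in that plotting step (a sketch; the grid resolution and base-graphics choices are ours):

```r
# Visualize the Beta(1, 1) prior over theta (a sketch)
a_prior <- 1
b_prior <- 1
theta <- seq(0, 1, length.out = 200)
prior_density <- dbeta(theta, a_prior, b_prior)
plot(theta, prior_density, type = "l",
     xlab = expression(theta), ylab = "Density",
     main = "Beta(1, 1) prior")
```

Note that Beta(1, 1) is flat: the density is 1 everywhere on [0, 1].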
Let’s say we flip the coin 10 times and get 5 heads:
```r
# data
trials <- 10
successes <- 5

# plot likelihood from dbinom
```
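Here is one way that likelihood plot could look (a sketch; we evaluate the binomial likelihood over a grid of candidate \(\theta\) values):

```r
# Likelihood of theta given 5 heads in 10 flips (a sketch)
trials <- 10
successes <- 5
theta <- seq(0, 1, length.out = 200)
likelihood <- dbinom(successes, size = trials, prob = theta)
plot(theta, likelihood, type = "l",
     xlab = expression(theta), ylab = "Likelihood")
```

The curve peaks at \(\theta = 0.5\), the observed proportion of heads.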
For simple problems, one way to obtain the posterior is to approximate its shape over a grid of parameter values.
```r
# grid approximation ----

# set up a probability grid
# generate prior values at points on the grid
# generate likelihood values at points on the grid
# multiply the likelihood by the prior at each point on the grid
# standardize the posterior to sum to 1
# plot the standardized posterior
```
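If you get stuck, the commented steps above could be filled in like this (one possible sketch; try it yourself before reading on):

```r
# A sketch of the grid approximation (grid size of 1000 is our choice)
a_prior <- 1; b_prior <- 1
trials <- 10; successes <- 5

# set up a probability grid
theta_grid <- seq(0, 1, length.out = 1000)
# prior values at each grid point
prior <- dbeta(theta_grid, a_prior, b_prior)
# likelihood values at each grid point
likelihood <- dbinom(successes, size = trials, prob = theta_grid)
# multiply the likelihood by the prior at each point
unstd_posterior <- likelihood * prior
# standardize the posterior to sum to 1
posterior <- unstd_posterior / sum(unstd_posterior)
# plot the standardized posterior
plot(theta_grid, posterior, type = "l",
     xlab = expression(theta), ylab = "Posterior probability")
```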
You can sample from the grid (with replacement!), weighting by the posterior probabilities, and then compute summary quantities from the samples.
```r
# sample from posterior
set.seed(20190425)
nsims <- 1e4
```
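A sketch of what that sampling could look like (self-contained, so it rebuilds the grid posterior first):

```r
# Sample from a grid-approximated posterior (a sketch)
set.seed(20190425)
nsims <- 1e4

theta_grid <- seq(0, 1, length.out = 1000)
posterior <- dbeta(theta_grid, 1, 1) * dbinom(5, size = 10, prob = theta_grid)
posterior <- posterior / sum(posterior)

# sample grid points with replacement, weighted by posterior probability
samples <- sample(theta_grid, size = nsims, replace = TRUE, prob = posterior)

mean(samples)                       # posterior mean
quantile(samples, c(0.025, 0.975))  # 95% interval
```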
Stop! Don’t look below until we’ve done the grid approximation.
The problem above turns out to be something we can solve with math. We can use math because the beta distribution is the conjugate prior for the binomial likelihood.
What does that mean? A prior and likelihood are conjugate distributions if the posterior distribution comes from the same family as the prior.
In this case, the posterior distribution is also a beta distribution!
Remember the parameter values for our prior:
```r
# analytic solution ----
a_prior <- 1
b_prior <- 1
```
We can think of the prior in terms of previously observed data: 1 head, 1 tail.
When we see new data, we update those values accordingly. These were our new data:
```r
trials <- 10
successes <- 5
```
We add the number of successes to `a`, and the number of failures to `b`:

```r
failures <- trials - successes
a_posterior <- a_prior + successes
b_posterior <- b_prior + failures
```
Then, we can plot the new beta distribution:
```r
# plot the analytic posterior distribution
```
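One way to draw it (a sketch): with a Beta(1, 1) prior and 5 heads in 10 flips, the posterior is Beta(6, 6).

```r
# Plot the analytic Beta(6, 6) posterior (a sketch)
a_posterior <- 1 + 5  # a_prior + successes
b_posterior <- 1 + 5  # b_prior + failures
theta <- seq(0, 1, length.out = 200)
plot(theta, dbeta(theta, a_posterior, b_posterior), type = "l",
     xlab = expression(theta), ylab = "Density")
```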
What happens if we do a second set of coin flips? We can take our previous data into account by starting with our old posterior as a new prior.
```r
# make some new data
# update the parameters
# plot the new posterior
```
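A sketch of that sequential update, using made-up numbers for the second set of flips (20 flips, 12 heads are our invented data):

```r
# Sequential updating: the old posterior becomes the new prior (a sketch)
a_prior <- 6; b_prior <- 6             # Beta(6, 6) posterior from the first 10 flips
new_trials <- 20; new_successes <- 12  # hypothetical second data set
a_posterior <- a_prior + new_successes
b_posterior <- b_prior + (new_trials - new_successes)
# a_posterior = 18, b_posterior = 14
theta <- seq(0, 1, length.out = 200)
plot(theta, dbeta(theta, a_posterior, b_posterior), type = "l",
     xlab = expression(theta), ylab = "Density")
```

This gives the same answer as analyzing all 30 flips at once with the original Beta(1, 1) prior; with conjugate updating, batch and sequential analyses agree.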
The `jags/` and `stan/` folders in this project contain models you’ve seen before. These are models for a Bernoulli distribution: repeated observations of a single coin flip, not a collection of coin flips.
What we want to do is make changes to those models so we can use them on the binomial data introduced above (N = 10, y = 5). Ultimately, the following code should work:
```r
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

m_binomial <- stan_model("stan/binomial.stan")

d <- list(
  N = 10,
  y = 5,
  a = 1,
  b = 1
)

fit_binomial <- sampling(m_binomial, data = d)
```
Think back to Problem 4 of Assignment 1: Flat priors aren’t really uninformative. Can we use the logit transformation to assign a weakly informative prior somehow? How might we go about doing that?
Here’s a hint: what’s different about the `binomial` and `binomial_logit` functions in Stan?
```r
rstan::lookup("^binomial")
## StanFunction Arguments ReturnType Page
## 52 binomial_cdf (ints n, ints N, reals theta) real 97
## 53 binomial_coefficient_log (real x, real y) real 35
## 54 binomial_lccdf (ints n | ints N, reals theta) real 98
## 55 binomial_lcdf (ints n | ints N, reals theta) real 98
## 56 binomial_logit_lpmf (ints n | ints N, reals alpha) real 99
## 57 binomial_logit ~ real 99
## 58 binomial_lpmf (ints n | ints N, reals theta) real 97
## 59 binomial ~ real 97
## 60 binomial_rng (ints N, reals theta) R 98
```
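The key is that `binomial` takes a probability \(\theta \in [0, 1]\), while `binomial_logit` takes a parameter on the log-odds scale. As a nudge in that direction, here is a hedged sketch of what a logit-parameterized model could look like (this is not the assignment's solution file, and the prior scale of 1.5 is our illustrative choice):

```stan
// A sketch: put a weakly informative prior on the log-odds of heads.
data {
  int<lower=1> N;           // number of flips
  int<lower=0, upper=N> y;  // number of heads
}
parameters {
  real alpha;               // log-odds of heads
}
model {
  alpha ~ normal(0, 1.5);        // weakly informative on the logit scale
  y ~ binomial_logit(N, alpha);  // same as y ~ binomial(N, inv_logit(alpha))
}
generated quantities {
  real theta = inv_logit(alpha);
}
```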
If they’re so convenient, why don’t we use analytic solutions and conjugate priors all the time? To answer that question, think about the normal distribution.
Remember, the normal distribution can be parameterized in three ways:
With the standard deviation (aka scale):
\[y \sim \text{Normal}(\mu, \sigma)\]
With the variance:
\[y \sim \text{Normal}(\mu, \sigma^2)\]
With the precision, \(\tau = \frac{1}{\sigma^2}\):
\[y \sim \text{Normal}(\mu, \tau)\]
The key difference from the binomial or Bernoulli distribution is that the normal distribution has two parameters we might want to estimate.
If you assume the scale/variance/precision is known, then the only parameter you have to worry about is the mean. In that case, the conjugate prior is also a normal distribution! You can think about the posterior mean as a weighted average of the mean of the data and the prior mean. (Weighted how? Using the precisions of the prior and the sample!)
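That precision-weighted average can be sketched in a few lines of R (the prior settings and observations here are made-up numbers, just for illustration):

```r
# Normal-normal conjugate update with known data sd (a sketch)
mu_prior <- 0
tau_prior <- 1 / 2^2                 # prior precision: prior sd of 2
sigma <- 1                           # known data sd
y <- c(1.2, 0.8, 1.5, 0.9, 1.1)      # made-up observations
n <- length(y)

tau_data <- n / sigma^2              # precision of the sample mean
# posterior mean: precision-weighted average of prior mean and sample mean
mu_post <- (tau_prior * mu_prior + tau_data * mean(y)) / (tau_prior + tau_data)
tau_post <- tau_prior + tau_data     # precisions add
```

Because the data's precision (5) far outweighs the prior's (0.25), the posterior mean lands much closer to the sample mean than to the prior mean.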
See this page for different conjugate priors for the normal distribution, depending on which parameters you hold fixed or let vary: https://en.wikipedia.org/wiki/Conjugate_prior#When_likelihood_function_is_a_continuous_distribution
The complete code for this lab is contained in `03-priors.R`. Try to do the activities first before you look at it!
```r
sessionInfo()
## R version 3.5.3 (2019-03-11)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 compiler_3.5.3 pillar_1.2.3
## [4] plyr_1.8.4 bindr_0.1.1 prettyunits_1.0.2
## [7] tools_3.5.3 digest_0.6.18 pkgbuild_1.0.3
## [10] evaluate_0.10.1 tibble_1.4.2 gtable_0.2.0
## [13] pkgconfig_2.0.1 rlang_0.3.0.1 cli_1.0.0
## [16] parallel_3.5.3 yaml_2.1.19 loo_2.1.0
## [19] bindrcpp_0.2.2 gridExtra_2.3 stringr_1.3.1
## [22] dplyr_0.7.5 knitr_1.20 tidyselect_0.2.4
## [25] stats4_3.5.3 rprojroot_1.3-2 grid_3.5.3
## [28] glue_1.3.1 inline_0.3.15 R6_2.3.0
## [31] processx_3.1.0 rmarkdown_1.10 rstan_2.18.2
## [34] purrr_0.2.5 callr_2.0.4 ggplot2_3.1.0
## [37] magrittr_1.5 matrixStats_0.53.1 backports_1.1.2
## [40] scales_1.0.0 StanHeaders_2.18.1 htmltools_0.3.6
## [43] assertthat_0.2.0 colorspace_1.4-0 stringi_1.2.3
## [46] lazyeval_0.2.1 munsell_0.5.0 crayon_1.3.4
```