helpful tips on sampling from beta distributions (getting from a to b to means and modes)….

beta_mod95_alp40

beta_mod95_alp20

beta_mu95_alp40

I’ve been playing around with using R for simulations recently, and I ran into a use case that required I sample from a skewed distribution with min and max values of 0 and 1.  A beta distribution was a nice representative form (http://en.wikipedia.org/wiki/Beta_distribution#Mode), and base R has a built in rbeta function; however, I needed to be able to feed into rbeta the proper alpha and beta parameters to produce the required distribution.  So, I created two simple R functions that allowed me to start with an alpha value (take your pick) and either a target mean or mode, and the function produced the appropriate beta that I could then use to produce a beta distribution and my random sample.

# first by mean
a1 <- 20
mu1 <- .95
solmean <- function(alpha, mean) (alpha-(mean*alpha))/mean
beta <- solmean(a1, mu1)
x1 <- (rbeta(1000, shape1=a1, shape2=beta)) # sample of n=1000 from the beta dist
head(x1)
x1 <- round(x1,3)
hist(x1, breaks=100)# let's give it a look
mean(x1)
x2 <- table(x1)
which.max(x2) # let's see what the mode is
quantile(x1) # more descriptives

#  Now by mode
a2 <- 20
mo1 <- .95
solmode <- function(alpha, mode) (alpha-1-(mode*alpha)+(2*mode))/mode
beta <- solmode(a2, mo1)
x1 <- (rbeta(1000, shape1=a2, shape2=beta)) # sample of n=1000 from the beta dist
head(x1)
x1 <- round(x1,3)
hist(x1, breaks=100)# let's give it a look
mean(x1)
x2 <- table(x1)
which.max(x2) # let's see what the mode is
quantile(x1) # more descriptives


## Here's the code I used to make the plots that are above:
set.seed(123)
#  Now by mode
a2 <- 20
mo1 <- .95
solmode <- function(alpha, mode) (alpha-1-(mode*alpha)+(2*mode))/mode
beta <- solmode(a2, mo1)
x1 <- (rbeta(1000, shape1=a2, shape2=beta)) # sample of n=1000 from the beta dist
head(x1)
x1 <- round(x1,3)
xlimits <- pretty(range(x1))[c(1,length(pretty(range(x1))))]
hist(x1, breaks=100, xlim=xlimits, main="")# let's give it a look
test <- (x1, breaks=100, xlim=xlimits, main="")
u <- par("usr")
rect(u[1], u[3], u[2], u[4], col = "gray88", border = "black")
abline(v = pretty((range(x1))), col='white') ## draw h lines
abline(h = pretty(test$counts), col='white') ## draw h lines
par(new=T)
hist(x1, breaks=100, xlim=xlimits, col='darkgrey', main=paste('Beta Distr. with mode=', mo1, 'and alpha=', a2))# let's give it a look

#  Now by mode
a2 <- 40
mo1 <- .95
solmode <- function(alpha, mode) (alpha-1-(mode*alpha)+(2*mode))/mode
beta <- solmode(a2, mo1)
x2 <- (rbeta(1000, shape1=a2, shape2=beta)) # sample of n=1000 from the beta dist
head(x2)
x2 <- round(x2,3)
hist(x2, breaks=50, xlim=xlimits, main="")# let's give it a look
test <- hist(x2, breaks=50, xlim=xlimits, main="")
u <- par("usr")
rect(u[1], u[3], u[2], u[4], col = "gray88", border = "black")
abline(v = pretty((range(x1))), col='white') ## draw h lines
abline(h = pretty(test$counts), col='white') ## draw h lines
par(new=T)
hist(x2, breaks=50, xlim=xlimits, col='darkgrey', main=paste('Beta Distr. with mode=', mo1, 'and alpha=', a2))# let's give it a look

#  Now by mean
a3 <- 40
mu1 <- .95
solmean <- function(alpha, mean) (alpha-(mean*alpha))/mean
beta <- solmean(a3, mu1)
x3 <- (rbeta(1000, shape1=a1, shape2=beta)) # sample of n=1000 from the beta dist
head(x3)
mean(x3)
x3 <- round(x3,3)
hist(x3, breaks=50, xlim=xlimits, main="")# let's give it a look
test <- hist(x3, breaks=50, xlim=xlimits, main="")
u <- par("usr")
rect(u[1], u[3], u[2], u[4], col = "gray88", border = "black")
abline(v = pretty((range(x1))), col='white') ## draw h lines
abline(h = pretty(test$counts), col='white') ## draw h lines
par(new=T)
hist(x3, breaks=50, xlim=xlimits, col='darkgrey', main=paste('Beta Distr. with mean=', mu1, 'and alpha=', a3))# let's give it a look

That’s it. I’ve noticed that changing your prespecified alpha does some interesting things to the kurtosis of the resulting beta distribution, but it’s up to you to play around with things until you get a distribution that suits your needs!

One last thing… if you are really into fat-tailed distributions (let’s say your a black swan fan), and you really don’t require a “curvy” form.  One great way to sample from a skewed distribution that is realtively easy to specify (in terms of a mode–that is), I’ve found that the triangle package (http://cran.r-project.org/web/packages/triangle/index.html) a nice way to go…. and the code is super easy to run.

 require(triange)
 tri <- round(rtriangle(10000, 0, 1, .85),3)  # just feed it the sample size, min bound, max bound, and mode... and voila!
 hist(tri)
 

ENJOY!

Advertisements

Some quick tips on upgrading R

I recently took the plunge to upgrade R to 2.15.  As I was pondering the best way to install all my favorite packages, I stumbled upon this very helpful post from HLPlab

http://hlplab.wordpress.com/2012/06/01/transferring-installed-packages-between-different-installations-of-r/

The post describes a very handy way of ensuring that all your favorite packages make it to your new install.  Basically this is done via a handy set of code, and involves: 1) saving the names of your packages to a location, and then 2) calling R to then install those packages by name into your new version.

The code with descriptions can be found below:

1. Run the script “store_packages.R” in your current version of R.

# store_packages.R # # stores a list of your currently installed packages

tmp = installed.packages()

installedpackages = as.vector(tmp[is.na(tmp[,"Priority"]), "Package"])

save(installedpackages, file="packagenames.rda")

Make sure that all the quotation marks in the script are straight.  The scripts will generate an error if they include any curly quotation marks.  For some reason, when I saved this blog entry, some quotation marks changed to curly ones.  WordPress is probably to blame for this problem, which I have not been able to fix.)

2. Close R.  Open the installation of R that you want the packages to be installed in.

3. Run the script “restore_packages.R”.

# restore_packages.R # # installs each package from the stored list of packages

load(“~/packagenames.rda”)

for (count in 1:length(installedpackages)) install.packages(installedpackages[count])

Note that if you want to install the list of packages in an installation of R on a different computer, you should transfer the .rda file that is created by the store_packages script to that computer, and make sure that the path for the “load” command in the restore_packages script is set to the right location.

However, for me, this all may change with my next upgrade–perhaps, as I chose the option to save my packages to a personal library (when 2.15 gave me the option) and my packages are now stored in a different location:

The downloaded binary packages are in
C:\Users\Chris\AppData\Local\Temp\RtmpiCX9N6\downloaded_packages