This page is about drawing a density histogram and overlaying a theoretical density. We will see how to do these things by doing an example in which we:
Here it is in just 4 lines – one line to load the MASS package (to be able to use truehist
, and then one line for each of the 3 steps listed above:
library(MASS)
mysample <- rexp(n = 100000, rate = 3)
truehist(mysample)
curve(3*exp(-3*x), from = 0, to = 3, add=TRUE, col='red', lwd=1.5)
We generated 100,000 points from the Exp(3) distribution, drew a density histogram in blue, and overlaid the pdf for the Exp(3) distribution in red.
Probably the most convenient way to plot a function of a single variable is with R’s curve
function. You give it a formula to plot, as a function of a variable “x”, and a range to plot over. For example, to plot \(f(x)=\sin(3x+5)\) from \(x=-2\) to \(x=12\):
curve(sin(3*x+5), -2, 12)
Note you need to write it as above even if you are thinking of another variable in your head, like \(f(t)=\sin(3t+5)\), so the following won’t work:
curve(sin(3*t+5), -2, 12)
## Error in curve(sin(3 * t + 5), -2, 12): 'expr' must be a function, or a call or an expression containing 'x'
See, we got an error. curve
is set up so it needs a function of “x”, not “t” or any other letter!
So we can plot the Exp(3) density this way:
curve(3 * exp(-3*x), 0, 5)
We’ll use the rexp
function to sample random draws from an exponential density.
mysample <- rexp(n = 100000, rate = 3)
OK, that was easy. Now we are ready to draw a histogram of the data in mysample
.
The truehist
function in the MASS library gives a nice density histogram:
truehist(mysample)
We use the curve
command with argument add=TRUE
to say that we want to add the curve to the current plot already drawn (the histogram), not start a new plot.
truehist(mysample)
curve(3*exp(-3*x), from = 0, to = 3, add=TRUE)
That’s nice agreement! The density histogram of the randomly distributed data indeed does match the theoretical probability density function closely.
To be able to see the f
curve better, let’s make it red, and also a bit thicker with the argument lwd=1.5
(lwd is for “line width”):
truehist(mysample)
curve(3*exp(-3*x), from = 0, to = 3, add=TRUE, col='red', lwd=1.5)