The vegan package provides tools for descriptive community ecology. It has most basic functions of:
In this tutorial, we will briefly explore the breadth of the program as well as dive into basic diversity analysis explore ordination of multivariate datasets.
If you haven’t done so already, please install vegan
#install.packages("vegan")
Consider visiting the vegan documentation to learn about this package.
We will using a few datasets native to vegan
library(vegan)
## Warning: package 'vegan' was built under R version 3.3.3
## Loading required package: permute
## Warning: package 'permute' was built under R version 3.3.3
## Loading required package: lattice
## This is vegan 2.4-3
data(package = "vegan") ## names of data sets in the package
data(dune) # Vegetation and Environment in Dutch Dune Meadows
str(dune) #a data frame of observations of 30 species at 20 sites
## 'data.frame': 20 obs. of 30 variables:
## $ Achimill: num 1 3 0 0 2 2 2 0 0 4 ...
## $ Agrostol: num 0 0 4 8 0 0 0 4 3 0 ...
## $ Airaprae: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Alopgeni: num 0 2 7 2 0 0 0 5 3 0 ...
## $ Anthodor: num 0 0 0 0 4 3 2 0 0 4 ...
## $ Bellpere: num 0 3 2 2 2 0 0 0 0 2 ...
## $ Bromhord: num 0 4 0 3 2 0 2 0 0 4 ...
## $ Chenalbu: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Cirsarve: num 0 0 0 2 0 0 0 0 0 0 ...
## $ Comapalu: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Eleopalu: num 0 0 0 0 0 0 0 4 0 0 ...
## $ Elymrepe: num 4 4 4 4 4 0 0 0 6 0 ...
## $ Empenigr: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Hyporadi: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Juncarti: num 0 0 0 0 0 0 0 4 4 0 ...
## $ Juncbufo: num 0 0 0 0 0 0 2 0 4 0 ...
## $ Lolipere: num 7 5 6 5 2 6 6 4 2 6 ...
## $ Planlanc: num 0 0 0 0 5 5 5 0 0 3 ...
## $ Poaprat : num 4 4 5 4 2 3 4 4 4 4 ...
## $ Poatriv : num 2 7 6 5 6 4 5 4 5 4 ...
## $ Ranuflam: num 0 0 0 0 0 0 0 2 0 0 ...
## $ Rumeacet: num 0 0 0 0 5 6 3 0 2 0 ...
## $ Sagiproc: num 0 0 0 5 0 0 0 2 2 0 ...
## $ Salirepe: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Scorautu: num 0 5 2 2 3 3 3 3 2 3 ...
## $ Trifprat: num 0 0 0 0 2 5 2 0 0 0 ...
## $ Trifrepe: num 0 5 2 1 2 5 2 2 3 6 ...
## $ Vicilath: num 0 0 0 0 0 0 0 0 0 1 ...
## $ Bracruta: num 0 0 2 2 2 6 2 2 2 2 ...
## $ Callcusp: num 0 0 0 0 0 0 0 0 0 0 ...
Explore the utility of the diversity function
diversity(dune,index = "simpson") # calculate Simpson's 1-D Index of Diversity for each site. # closer to 1 = greater diversity
## 1 2 3 4 5 6 7
## 0.7345679 0.8900227 0.8787500 0.9007407 0.9140076 0.9001736 0.9075000
## 8 9 10 11 12 13 14
## 0.9087500 0.9115646 0.9031909 0.8671875 0.8685714 0.8521579 0.8333333
## 15 16 17 18 19 20
## 0.8506616 0.8429752 0.8355556 0.8614540 0.8740895 0.8678460
simpson <- diversity(dune, "simpson") # or assign to var.
simpson
## 1 2 3 4 5 6 7
## 0.7345679 0.8900227 0.8787500 0.9007407 0.9140076 0.9001736 0.9075000
## 8 9 10 11 12 13 14
## 0.9087500 0.9115646 0.9031909 0.8671875 0.8685714 0.8521579 0.8333333
## 15 16 17 18 19 20
## 0.8506616 0.8429752 0.8355556 0.8614540 0.8740895 0.8678460
shannon <- diversity(dune) # note that Shannon's is default
shannon #Typically ranges from 1.5 - 3.4, higher = more diverse
## 1 2 3 4 5 6 7 8
## 1.440482 2.252516 2.193749 2.426779 2.544421 2.345946 2.471733 2.434898
## 9 10 11 12 13 14 15 16
## 2.493568 2.398613 2.106065 2.114495 2.099638 1.863680 1.979309 1.959795
## 17 18 19 20
## 1.876274 2.079387 2.134024 2.048270
# lets compare the two
par(mfrow = c(1, 2)) # use par to generate panels with 1 row of 2 graphs
hist(simpson)
hist(shannon)
Next we can calcuate all of the pair-wise dissimilarity (distance) measures between sites based on their species composition using the function vegdist
Vegdist computes dissimilarity indices. We are using gower and bray-curtis which are good in detecting underlying ecological gradients
Both indices are used to quantify the compositional dissimilarity between two different sites. They are bounded between 0 and 1, where 0 = same composition, 1 = maximally dissimilar.
par(mfrow = c(1, 2))
bray = vegdist(dune, "bray")
gower = vegdist(dune, "gower")
hist(bray, xlim = range(0.0,1.0))
hist(gower, xlim = range(0.0,1.0))
# r allows for multiple iterations of each dissimilarity index to examine #freqeuncy of differences
Dissimilarity analysis is a good way to explore variability in community composition. The next steps would be to do some sort of cluster analysis to see where community associations exist, however we’re going to switch gear now.
Rarefaction is a technique to assess expected species richness.
Rarefaction allows the calculation of species richness for a given number of individual samples, based on the construction of rarefaction curves.
The issue that occurs when sampling various species in a community is that the larger the number of individuals sampled, the more species that will be found. Rarefaction curves are created by randomly re-sampling the pool of N samples multiple times and then plotting the average number of species found in each sample (1,2, … N). “Thus rarefaction generates the expected number of species in a small collection of n individuals (or n samples) drawn at random from the large pool of N samples.”. Rarefaction curves generally grow rapidly at first, as the most common species are found, but the curves plateau as only the rarest species remain to be sampled.
To try some rarefaction, we use the rarefy and rarecurve functions.
spAbund <- rowSums(dune) #gives the number of individuals found in each plot
spAbund # view observations per plot
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 18 42 40 45 43 48 40 40 42 43 32 35 33 24 23 33 15 27 31 31
raremin <- min(rowSums(dune)) #rarefaction uses the smallest number of observations per sample to extrapolate the expected number if all other samples only had that number of observations
raremin # view smallest # of obs (site 17)
## [1] 15
sRare <- rarefy(dune, raremin) # now use function rarefy
sRare #gives an "expected"rarefied" number of species (not obs) if only 15 individuals were present
## 1 2 3 4 5 6 7 8
## 4.813725 8.294747 7.985897 9.105309 9.787891 8.688685 9.496061 9.317495
## 9 10 11 12 13 14 15 16
## 9.575586 9.027570 7.734350 7.719151 7.782660 6.572498 7.271063 6.961173
## 17 18 19 20
## 7.000000 7.752880 7.891114 7.334569
## attr(,"Subsample")
## [1] 15
rarecurve(dune, col = "blue") # produces rarefaction curves # squares are site numbers positioned at observed space. To "rarefy" a larger site, follow the rarefaction curve until the curve corresponds with the lesser site obs. This gives you rarefied species richness
Now let’s explore some ordination techniques.
Many ordination techniques exist, including principal components analysis (PCA), non-metric multidimensional scaling (NMDS), and correspondence analysis (CA), among others.
Vegan is especially good at NMDS. This tutorial explores this in most detail.
Let’s lay some groundwork:
The goal of NMDS is to collapse information from multiple dimensions (e.g, from multiple communities, sites, etc.) into just a few, so that they can be visualized and interpreted. NMDS does not produce a statistical output - however we could do so (more on this later)
The goal of NMDS is to represent the position of communities in multidimensional space as accurately as possible using a reduced number of dimensions that can be easily visualized (and to spare your brain box).
NMDS does not use the absolute abundances of species in communities, but rather their rank orders and as a result is a flexible technique that accepts a variety of types of data. (It’s also where the “non-metric” part of the name comes from.)
To run the NMDS, we will use the function metaMDS. The function requires only a community-by-species matrix (which we will create randomly).
set.seed(2) # random no. generator / way to specify seeds, 2=no. of integers?
community_matrix=matrix(
sample(1:100,300,replace=T),nrow=10, # counts up to 100, 300 cells
dimnames=list(paste("community",1:10,sep=""),paste("sp",1:30,sep="")))
head(community_matrix)
## sp1 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9 sp10 sp11 sp12 sp13 sp14
## community1 19 56 67 2 99 1 78 28 36 98 21 3 4 21
## community2 71 24 39 17 30 2 89 32 68 40 43 74 25 92
## community3 58 77 84 82 12 69 63 5 3 38 99 38 98 3
## community4 17 19 16 87 17 93 27 19 41 57 83 58 89 97
## community5 95 41 35 52 95 28 86 19 20 47 29 83 25 32
## community6 95 86 49 63 80 82 44 76 86 20 60 82 76 67
## sp15 sp16 sp17 sp18 sp19 sp20 sp21 sp22 sp23 sp24 sp25 sp26
## community1 18 19 85 97 95 11 86 98 16 68 90 71
## community2 29 37 95 11 58 41 65 53 94 34 31 9
## community3 63 91 5 26 96 98 61 83 41 90 56 26
## community4 31 40 76 90 79 17 98 68 49 53 76 6
## community5 45 78 30 39 12 51 38 98 69 88 96 74
## community6 74 29 66 80 84 21 82 6 86 26 96 2
## sp27 sp28 sp29 sp30
## community1 76 89 28 26
## community2 6 92 98 32
## community3 63 85 66 75
## community4 36 30 68 50
## community5 81 79 21 71
## community6 68 50 100 42
example_NMDS=metaMDS(community_matrix, # Our community-by-species matrix
k=2) # The number of reduced dimensions. Increase if high stress is problem.
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1280709
## Run 1 stress 0.1280709
## ... Procrustes: rmse 3.168062e-05 max resid 5.925658e-05
## ... Similar to previous best
## Run 2 stress 0.2630959
## Run 3 stress 0.1424254
## Run 4 stress 0.1778004
## Run 5 stress 0.1280709
## ... New best solution
## ... Procrustes: rmse 8.669277e-06 max resid 1.508905e-05
## ... Similar to previous best
## Run 6 stress 0.1752059
## Run 7 stress 0.1624619
## Run 8 stress 0.1280709
## ... Procrustes: rmse 3.499232e-05 max resid 6.594091e-05
## ... Similar to previous best
## Run 9 stress 0.1624619
## Run 10 stress 0.128071
## ... Procrustes: rmse 7.130361e-05 max resid 0.0001196693
## ... Similar to previous best
## Run 11 stress 0.206671
## Run 12 stress 0.1424254
## Run 13 stress 0.157268
## Run 14 stress 0.2023676
## Run 15 stress 0.1461475
## Run 16 stress 0.1280709
## ... Procrustes: rmse 4.588009e-06 max resid 8.034253e-06
## ... Similar to previous best
## Run 17 stress 0.1280709
## ... New best solution
## ... Procrustes: rmse 6.914407e-06 max resid 1.306718e-05
## ... Similar to previous best
## Run 18 stress 0.2028469
## Run 19 stress 0.1624619
## Run 20 stress 0.1778003
## *** Solution reached
#"The stress, or the disagreement between 2-D configuration and predicted values from the regression"
#A good rule of thumb: stress > 0.05 provides an excellent representation in reduced dimensions, > 0.1 is great, >0.2 is good/ok, and stress > 0.3 provides a poor representation
You should see each iteration of the NMDS until a solution is reached (i.e., stress was minimized after some number of reconfigurations of the points in 2 dimensions).
Now we can plot the NMDS. The plot shows us both the communities (“sites”, open circles) and species (red crosses), but we don’t know which circle corresponds to which site, and which species corresponds to which cross.
plot(example_NMDS)
We can use the function ordiplot and orditorp to add text to the plot in place of points to make some sense of this rather non-intuitive mess.
ordiplot(example_NMDS,type="n") #Ordination plot function especially for congested plots
orditorp(example_NMDS,display="species",col="red",air=0.01) #The function adds text or points to ordination plots
orditorp(example_NMDS,display="sites",cex=1.25,air=0.01)
Let’s suppose that communities 1-5 had some treatment applied, and communities 6-10 a different treatment. Using ordihull we can draw convex hulls connecting the vertices of the points made by these communities on the plot.
This is an intuitive way to understand how communities and species cluster based on treatments.
treat=c(rep("Treatment1",5),rep("Treatment2",5))
ordiplot(example_NMDS,type="n")
ordihull(example_NMDS,groups=treat,draw="polygon",col="grey90",label=F)
orditorp(example_NMDS,display="species",col="red",air=0.01)
orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)),
air=0.01,cex=1.25)
One can also plot “spider graphs” using the function orderspider, ellipses using the function ordiellipse, or a minimum spanning tree (MST) using ordicluster which connects similar communities (useful to see if treatments are effective in controlling community structure).
#spider plot
ordiplot(example_NMDS,type="n")
ordispider(example_NMDS,groups=treat)
orditorp(example_NMDS,display="species",col="red",air=0.01)
orditorp(example_NMDS,display="sites",col=c(rep("green",5),rep("blue",5)),
air=0.01,cex=1.25)
Great for visualization, but no statistical output?
If the treatment is continuous, such as an environmental gradient, then it might be useful to plot contour lines rather than convex hulls. We can simply make up some, say, elevation data for our original community matrix and overlay them onto the NMDS plot using ordisurf:
# Define random elevations for previous example
elevation=runif(10,0.5,1.5)
# Use the function ordisurf to plot contour lines
ordisurf(example_NMDS,elevation,main="",col="forestgreen")
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x1, x2, k = 10, bs = "tp", fx = FALSE)
##
## Estimated degrees of freedom:
## 3.18 total = 4.18
##
## REML score: 4.430011
# Finally, display species on plot
orditorp(example_NMDS,display="species",col="grey30",air=0.1,
cex=1)
Consider a single axis representing the abundance of a single species. Along this axis, we can plot the communities in which this species appears, based on its abundance within each.
plot(0:10,0:10,type="n",axes=F,xlab="Abundance of Species 1",ylab="")
axis(1)
points(5,0); text(5.5,0.5,labels="community A")
points(3,0); text(3.2,0.5,labels="community B")
points(0,0); text(0.8,0.5,labels="community C")
Now consider a second axis of abundance, representing another species. We can now plot each community along the two axes (Species 1 and Species 2).
plot(0:10,0:10,type="n",xlab="Abundance of Species 1",
ylab="Abundance of Species 2")
points(5,5); text(5,4.5,labels="community A")
points(3,3); text(3,3.5,labels="community B")
points(0,5); text(0.8,5.5,labels="community C")
Now consider a third axis of abundance representing yet another species.
# install.packages("scatterplot3d")
library(scatterplot3d) # an R package for the visualization of multivariate data in a three dimensional space.
d<-scatterplot3d(0:10,0:10,0:10,type="n",xlab="Abundance of Species 1",
ylab="Abundance of Species 2",zlab="Abundance of Species 3"); d
## $xyz.convert
## function (x, y = NULL, z = NULL)
## {
## xyz <- xyz.coords(x, y, z)
## if (angle > 2) {
## temp <- xyz$x
## xyz$x <- xyz$y
## xyz$y <- temp
## }
## y <- (xyz$y - y.add)/y.scal
## return(list(x = xyz$x/x.scal + yx.f * y, y = xyz$z/z.scal +
## yz.f * y))
## }
## <environment: 0x000000002bbd1658>
##
## $points3d
## function (x, y = NULL, z = NULL, type = "p", ...)
## {
## xyz <- xyz.coords(x, y, z)
## if (angle > 2) {
## temp <- xyz$x
## xyz$x <- xyz$y
## xyz$y <- temp
## }
## y2 <- (xyz$y - y.add)/y.scal
## x <- xyz$x/x.scal + yx.f * y2
## y <- xyz$z/z.scal + yz.f * y2
## mem.par <- par(mar = mar, usr = usr)
## if (type == "h") {
## y2 <- z.min + yz.f * y2
## segments(x, y, x, y2, ...)
## points(x, y, type = "p", ...)
## }
## else points(x, y, type = type, ...)
## }
## <environment: 0x000000002bbd1658>
##
## $plane3d
## function (Intercept, x.coef = NULL, y.coef = NULL, lty = "dashed",
## lty.box = NULL, draw_lines = TRUE, draw_polygon = FALSE,
## polygon_args = list(border = NA, col = rgb(0, 0, 0, 0.2)),
## ...)
## {
## if (!is.atomic(Intercept) && !is.null(coef(Intercept))) {
## Intercept <- coef(Intercept)
## if (!("(Intercept)" %in% names(Intercept)))
## Intercept <- c(0, Intercept)
## }
## if (is.null(lty.box))
## lty.box <- lty
## if (is.null(x.coef) && length(Intercept) == 3) {
## x.coef <- Intercept[if (angle > 2)
## 3
## else 2]
## y.coef <- Intercept[if (angle > 2)
## 2
## else 3]
## Intercept <- Intercept[1]
## }
## mem.par <- par(mar = mar, usr = usr)
## x <- x.min:x.max
## y <- 0:y.max
## ltya <- c(lty.box, rep(lty, length(x) - 2), lty.box)
## x.coef <- x.coef * x.scal
## z1 <- (Intercept + x * x.coef + y.add * y.coef)/z.scal
## z2 <- (Intercept + x * x.coef + (y.max * y.scal + y.add) *
## y.coef)/z.scal
## if (draw_polygon)
## do.call("polygon", c(list(c(x.min, x.min + y.max * yx.f,
## x.max + y.max * yx.f, x.max), c(z1[1], z2[1] + yz.f *
## y.max, z2[length(z2)] + yz.f * y.max, z1[length(z1)])),
## polygon_args))
## if (draw_lines)
## segments(x, z1, x + y.max * yx.f, z2 + yz.f * y.max,
## lty = ltya, ...)
## ltya <- c(lty.box, rep(lty, length(y) - 2), lty.box)
## y.coef <- (y * y.scal + y.add) * y.coef
## z1 <- (Intercept + x.min * x.coef + y.coef)/z.scal
## z2 <- (Intercept + x.max * x.coef + y.coef)/z.scal
## if (draw_lines)
## segments(x.min + y * yx.f, z1 + y * yz.f, x.max + y *
## yx.f, z2 + y * yz.f, lty = ltya, ...)
## }
## <environment: 0x000000002bbd1658>
##
## $box3d
## function (...)
## {
## mem.par <- par(mar = mar, usr = usr)
## lines(c(x.min, x.max), c(z.max, z.max), ...)
## lines(c(0, y.max * yx.f) + x.max, c(0, y.max * yz.f) + z.max,
## ...)
## lines(c(0, y.max * yx.f) + x.min, c(0, y.max * yz.f) + z.max,
## ...)
## lines(c(x.max, x.max), c(z.min, z.max), ...)
## lines(c(x.min, x.min), c(z.min, z.max), ...)
## lines(c(x.min, x.max), c(z.min, z.min), ...)
## }
## <environment: 0x000000002bbd1658>
##
## $par.mar
## $par.mar$mar
## [1] 5.1 4.1 4.1 2.1
d$points3d(5,5,0); text(d$xyz.convert(5,5,0.5),labels="community A")
d$points3d(3,3,3); text(d$xyz.convert(3,3,3.5),labels="community B")
d$points3d(0,5,5); text(d$xyz.convert(0,5,5.5),labels="community C")