Compiled as a final-year zoology student at the University of Edinburgh, based on the information given in lectures.
Types of distribution
Normal distribution ('bell curve') defined
by mean μ and standard deviation σ occurs when many factors contribute to a variable; used for
continuous variables such as body size. Sample
means are approximately normally distributed when samples are large enough (central limit theorem). Standard normal
distribution (z-scores) has μ = 0 and σ = 1.
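As a rough illustration of z-scores, the sketch below standardises a small made-up sample so that it has mean 0 and standard deviation 1, and checks that about 95% of a standard normal distribution lies within 1.96 standard deviations of the mean. The data and the function name z_scores are invented for the example.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Standardise values so they have mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

body_sizes = [4.1, 5.0, 5.6, 6.2, 7.3]  # made-up measurements
z = z_scores(body_sizes)

# Proportion of a standard normal distribution within +/- 1.96 of the mean
standard_normal = NormalDist(0, 1)
within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print(z, round(within, 3))
```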
Binomial distribution (the 'coin toss
distribution') is defined by p (probability of one outcome over the other) and sample
size n (the 'binomial denominator'); used for ratios/proportions.
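A minimal sketch of the coin-toss idea: the probability of exactly k outcomes of one kind in n trials, given p. The function name binomial_pmf is an assumption made for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 5 heads in 10 tosses of a fair coin
p_five_heads = binomial_pmf(5, 10, 0.5)
print(p_five_heads)
```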
Poisson distribution ('falling bombs') defined
entirely by mean; used for counts. In a Poisson process, probability of an
event is independent of probability of previous events. Can be used to work out
the number of unknown items.
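Because the Poisson distribution is defined by its mean alone, the probability of any count follows from that one number. A small sketch, with poisson_pmf as a made-up helper name:

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of observing exactly k events when the expected count is `mean`."""
    return exp(-mean) * mean**k / factorial(k)

# With a mean of 2 events per sampling unit, probability of seeing none at all
p_zero = poisson_pmf(0, 2.0)
print(p_zero)
```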
Negative binomial distribution defined
by mean and index of clumpiness; used for clumped counts.
Types of test
Assumptions of parametric tests (in
decreasing order of importance): random sampling, independence, homogeneity of
variance, normality. Assumptions of independence,
homogeneity of variance and normality apply to residuals.
If assumptions are met, no
trend should be obvious when residuals (or raw data) are plotted against order
of measurement, explanatory variables, or fitted/predicted values.
Heterogeneity of variance leads to
excessive type 1 errors (false positives), especially if there are abnormally
large variances in a group, or if sample sizes vary between groups.
Heterogeneity of variance is not a problem if it is just sensible variation, if
it is due to a single small variance, or if the effect is not significant.
Heterogeneity of variance is
tested for with: Bartlett's test, Cochran's test (preferred), or the Fowler and Cohen 'quick and
dirty' ratio of largest : smallest variance (tested
with Fmax tables).
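The 'quick and dirty' ratio is simple to compute by hand or in code. The sketch below only calculates the ratio for made-up groups; the result would still need to be compared against Fmax tables, which are not reproduced here. The name variance_ratio is invented for the example.

```python
from statistics import variance

def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

groups = [
    [1, 2, 3, 4, 5],    # variance 2.5
    [2, 4, 6, 8, 10],   # variance 10
    [3, 4, 5, 6, 7],    # variance 2.5
]
print(variance_ratio(groups))
```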
Assumptions of non-parametric tests:
random sampling, independence, samples come from the same distribution.
Non-parametric tests are safer
with small samples, and work with ranks, but offer limited ability to control
for confounding variables and deal with multiple factors, and do not allow
calculation of confidence intervals etc.
Observational studies: main aim is to
avoid bias. Sampling can be systematic or random.
Stratified sampling (random
sampling within subgroups) aims to take out noise (so strata should be as
homogeneous as possible). To generalise, it is necessary to know how population
and sample are distributed across strata.
Experimental manipulations of X can
prove causality, but are not always feasible, need to be set in an
observational context, and usually involve artificial situations from which it
is hard to generalise.
Key features of good
experiments: proper controls, proper randomisation, proper replication.
Measurements are the numbers recorded; units are the independent entities
being used; factors are the
variables being manipulated (which often have levels).
Fully factorial design tests every
possible combination of factors: good for testing interactions, and makes good
statistical analysis easy, but not always feasible. Alternatives include split-plot (nested) designs or Latin squares.
A factor is nested when its levels are not
comparable across levels of another factor (e.g. individual 1 in treatment
block 1 is not related to individual 1 in other blocks).
Repeated measures can be dealt with by calculating a summary measure for each
independent unit (e.g. difference score, mean result, final result, subsequent
test, rate of change, fitted curve).
Randomised block design helps eliminate
noise that can't be excluded from experiment. Treatments are randomly allocated
within blocks; comparisons are made between treatments within blocks.
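One way to picture the allocation step: every block receives every treatment once, in an independently shuffled order. This is only a sketch; the block and treatment names are invented, and the seed is fixed purely so the example is repeatable.

```python
import random

def randomised_block_allocation(blocks, treatments, seed=None):
    """Give each block every treatment once, in a random order within the block."""
    rng = random.Random(seed)
    allocation = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomise treatment order within this block only
        allocation[block] = order
    return allocation

blocks = ["block 1", "block 2", "block 3"]
treatments = ["control", "low dose", "high dose"]
plan = randomised_block_allocation(blocks, treatments, seed=1)
print(plan)
```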
Interactions: the importance of one
factor depending on another factor. Indicated by non-parallel lines on plots.
Replicates should be independent. Pseudoreplication
(non-independent replicates) can be due to: time, space, same stimulus, same
individual, genealogy or phylogeny. Can be spotted by ludicrously high n or
Interpretation of results
Correlation can be due to:
causation, reverse causation, third variables, or chance.
Causation investigated by: first
principles, controlling for confounding variables, experimental manipulation.
Standard error of the mean (s/√n) is a measure of the reliability of an estimate of the mean.
95% confidence limits = sample mean ±
1.96 × standard error of mean. For small samples, normal approximation breaks
down; Student's t-distribution used instead: 95% C.I. = sample mean ± t.05[d.f.] × S.E.M.
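A small worked sketch of the standard error and confidence limits, using made-up data. The critical value is passed in by hand: 1.96 for the normal approximation, or a value looked up in t tables for small samples (2.776 is the usual two-tailed 5% value for 4 d.f.). The function names are assumptions for the example.

```python
from math import sqrt
from statistics import mean, stdev

def sem(values):
    """Standard error of the mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def confidence_limits(values, critical=1.96):
    """Sample mean +/- critical value x standard error of the mean."""
    m = mean(values)
    half_width = critical * sem(values)
    return m - half_width, m + half_width

data = [10, 12, 14, 16, 18]  # made-up sample, n = 5 so d.f. = 4
lower, upper = confidence_limits(data, critical=2.776)  # t value taken from tables
print(sem(data), lower, upper)
```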
A test is one-tailed if a 'significant' value in one direction is treated as
non-significant (only deviations in the predicted direction count); the two-tailed p-value is then halved.
Degrees of freedom = number of
measurements minus number of parameters estimated from data = n - p
Statistical significance is not
the same as biological significance.
Transformation of data
Transformation of data may deal
with heterogeneity of variances and non-normality.
Standard error and confidence
intervals can be back-transformed after analysis if appropriate.
Arcsin transformation (angular
transformation) is used on proportions (bounded at 0 and 1); it finds sin⁻¹(√p).
Square root transformation (√X) used on counts close to 0, or Poisson variables.
Log transformation used when data skewed
to right (e.g. growth, size), when variance increases with mean (e.g. large
counts, Poisson variables), ratios (range from 0 to infinity). Log(0) doesn't exist, so it may be necessary to use log(X+1).
Log10 and loge have identical effects.
Box-Cox transformation is a family of
curves providing a general catch-all.
Quick and dirty transformations:
1/√X, √X, log(X) or 1/X for right skews; X²,
X³, etc. for left skews.
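The arcsin and log(X+1) transformations, together with their back-transformations, can be sketched as below. The function names are invented, and base 10 is an arbitrary choice for the log.

```python
from math import asin, sin, sqrt, log10

def arcsine_transform(p):
    """Angular transformation of a proportion p (0 <= p <= 1), in radians."""
    return asin(sqrt(p))

def back_arcsine(x):
    """Back-transform an arcsine-transformed value to a proportion."""
    return sin(x) ** 2

def log_plus_one(x):
    """log10(X + 1), usable when the data contain zeros."""
    return log10(x + 1)

def back_log_plus_one(y):
    """Back-transform a log10(X + 1) value to the original scale."""
    return 10 ** y - 1

print(arcsine_transform(0.25), log_plus_one(0), back_log_plus_one(2.0))
```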
Testing if two samples differ
T-tests can be used on small
data sets (n < 30). Null hypothesis is that there is no difference.
2-sample t-test compares means of two groups:
d.f. = n1 + n2 - 2, t = mean
difference / S.E. of mean difference.
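A minimal sketch of the 2-sample t statistic. It assumes the pooled-variance (equal variances) form of the standard error, which matches d.f. = n1 + n2 - 2; the data and function name are made up. The resulting t would be compared against t tables.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """t = mean difference / S.E. of mean difference, using a pooled variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se_diff, df

group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = two_sample_t(group_a, group_b)
print(t, df)
```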
Paired t-test compares paired data, d.f. = n - 1.
1-sample t-test compares sample to
predicted mean: t = (sample mean - μ) / S.E.M., d.f. = n - 1.
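The paired and 1-sample tests are closely related: a paired t-test can be treated as a 1-sample test on the differences, with a predicted mean of 0. A sketch with made-up data and invented function names:

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, predicted_mean):
    """t = (sample mean - predicted mean) / S.E.M., with d.f. = n - 1."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - predicted_mean) / standard_error, n - 1

def paired_t(first, second):
    """Paired t-test as a 1-sample test on the pairwise differences."""
    differences = [s - f for f, s in zip(first, second)]
    return one_sample_t(differences, 0.0)

t_one, df_one = one_sample_t([10, 12, 14, 16, 18], 12)
t_pair, df_pair = paired_t([5, 6, 7, 8], [6, 8, 8, 10])
print(t_one, df_one, t_pair, df_pair)
```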
Mann-Whitney U test converts data into
ranks and compares medians of two groups: null hypothesis = no difference, test
statistic is U (or W).
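To show what 'converts data into ranks' means in practice, here is a sketch of the U statistic. Tied values share their average rank, and the smaller of the two possible U values is reported; conventions differ between textbooks and packages, so treat this as one possible version. The result would be compared against tables of U.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Smaller of the two U values, from the rank sum of the first group."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)

print(mann_whitney_u([1, 4, 5], [2, 3, 6]))
```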
Wilcoxon Matched Pairs test converts
paired data into ranks and compares medians of pairs: test statistic is T.
Sign test looks at sign of differences
(+ or -), null hypothesis = centred on zero.
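The sign test reduces to a binomial calculation with p = 0.5. The sketch below gives a two-tailed p-value; dropping zero differences is an assumption of this version, and the function name is made up.

```python
from math import comb

def sign_test_p(differences):
    """Two-tailed p-value for the sign test, ignoring zero differences."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    positives = sum(1 for d in nonzero if d > 0)
    tail = min(positives, n - positives)
    # Probability of a split at least this uneven in one direction when p = 0.5
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * one_tail)

differences = [1, 2, 3, 1, 2, -1, 4, 2]  # made-up paired differences
p_value = sign_test_p(differences)
print(p_value)
```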
Testing if two variables are related
Correlation - parametric (Pearson's) or
non-parametric (Spearman's rank). Correlation
coefficient ranges from r = 1 (perfect positive correlation) to r = -1 (perfect
negative correlation); p-value gives probability of a correlation at least this strong arising
by chance if the variables are actually unrelated (d.f. = n - 2).
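A sketch contrasting the two coefficients on made-up data: Spearman's rank correlation is computed here as Pearson's correlation on the ranks. The simple ranking helper assumes there are no tied values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """Rank values from 1 upward (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(x, y):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not in a straight line
print(pearson_r(x, y), spearman_r(x, y))
```

Here the ranks agree perfectly, so Spearman's coefficient is 1, while Pearson's is slightly lower because the relationship is curved.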
Least squares regression - parametric;
gives a line of best fit allowing dependent variable to be predicted from
independent one; t-values and p-values tell us if m (gradient of line) and c
(intercept) differ significantly from 0.
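The line of best fit itself can be sketched as follows, using m for the gradient and c for the intercept as in the notes. The data are made up and the t-values and p-values are not calculated here.

```python
def least_squares(x, y):
    """Gradient m and intercept c of the least squares line y = m*x + c."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

m, c = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
predicted = m * 5 + c  # predict the dependent variable for x = 5
print(m, c, predicted)
```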
Testing if two variables are independent
These tests are used for
analysing frequencies or counts, or comparing observed against expected results
('goodness of fit' tests).
Chi-squared test: χ² = Σ( (Oᵢ − Eᵢ)² / Eᵢ ),
d.f. = (rows-1)×(columns-1), expected values should be >5.
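A sketch of the chi-squared calculation for a table of counts. Expected values are taken as row total × column total / grand total (the usual test of independence); the table is made up, and the statistic would then be compared against chi-squared tables.

```python
def chi_squared(table):
    """Chi-squared statistic and d.f. for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

observed_counts = [[10, 20],
                   [20, 10]]
statistic, df = chi_squared(observed_counts)
print(statistic, df)
```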
G-test: better, especially with small
Fisher's exact test: better for expected values between 0 and 5.
Analysis of Variance (ANOVA)
Sum of squares (sum of squared
deviations from a mean) is partitioned into Sample SS (based on deviations of
group means from grand mean) and Error SS (based on deviations around separate
group means).
Total d.f. = n - 1 = sample d.f. + error d.f.
Sample d.f. = number of groups - 1.
Mean square (SS/d.f.) is independent of
n. Mean squares are not additive
(Sample MS + Error MS ≠ Total MS).
Test statistic: F-ratio = Sample MS / Error MS; d.f. = numerator, denominator.
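The partitioning for a one-way ANOVA can be followed through on a small made-up data set. The sketch uses the notes' terms (Sample SS, Error SS) and stops at the F-ratio, which would then be looked up in F tables.

```python
def one_way_anova(groups):
    """Partition the sum of squares and return the F-ratio with its d.f."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Sample SS: deviations of group means from the grand mean
    sample_ss = sum(len(g) * (gm - grand_mean) ** 2
                    for g, gm in zip(groups, group_means))
    # Error SS: deviations of values around their own group mean
    error_ss = sum((v - gm) ** 2
                   for g, gm in zip(groups, group_means) for v in g)
    sample_df = len(groups) - 1
    error_df = n - len(groups)
    f_ratio = (sample_ss / sample_df) / (error_ss / error_df)
    return {"sample_ss": sample_ss, "error_ss": error_ss,
            "sample_df": sample_df, "error_df": error_df, "F": f_ratio}

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(groups)
print(result)
```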
Two-way ANOVA: Total SS =
Sample1 SS + Sample2 SS + Interaction SS + Error SS.
If interaction is significant,
both factors must be important, even if main effects are not significant on
their own. If interaction is not significant, it may be dropped from model.
Interactions can only be tested with replication.
Adjusted SS, unlike sequential SS, does not depend upon order in which
terms are added to model. It is important when experimental design is not
orthogonal (fully cross-factored and balanced).
ANOVA is a parametric test: it
assumes random sampling, independence, homogeneity of variance, normality.
Directional heterogeneity tests have
alternative hypothesis μA < μB <
μC etc. rather than μA ≠ μB ≠ μC;
based on rsPc test statistic where rs is rank
correlation between observed and expected and Pc is p-value from
Response variable is (in decreasing
order of importance): variable to be predicted, variable theory explains,
variable with most error (if correlation is all that's of interest).
Choice of response variable
does not affect statistical significance, correlation coefficient, or direction
of slope, but does affect slope and intercept of regression line.
Least squares regression minimises
the sum of squared vertical distances of points from the regression line (hence 'least squares'). It assumes variation
is in y axis only.
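Because only vertical distances are minimised, swapping the response and explanatory variables gives a different line. A rough check with made-up data: the two slopes are not reciprocals of each other, although they share a sign and their product equals r squared, so the correlation is unaffected.

```python
def slope(explanatory, response):
    """Least squares slope of response regressed on explanatory."""
    n = len(explanatory)
    mean_e, mean_r = sum(explanatory) / n, sum(response) / n
    cross = sum((e - mean_e) * (r - mean_r)
                for e, r in zip(explanatory, response))
    spread = sum((e - mean_e) ** 2 for e in explanatory)
    return cross / spread

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 10]
slope_y_on_x = slope(x, y)   # y treated as the response variable
slope_x_on_y = slope(y, x)   # x treated as the response variable
r_squared = slope_y_on_x * slope_x_on_y
print(slope_y_on_x, slope_x_on_y, r_squared)
```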
Analysis of covariance (ANCOVA) seeks to
explain a dependent variable in terms of a discrete and a continuous variable.
It fits separate lines to each group; non-parallel lines indicate an interaction.
General Linear Models
ANOVAs are a particular case of
General Linear Model. Other GLMs can handle: unbalanced designs, more complex
designs, combinations of continuous and discrete
predictors, other error structures.
To construct a GLM: start with
maximal model (within reason), test assumptions, throw out non-significant
highest-order interactions, reanalyse, throw out
non-significant lower-order terms (if they aren't part of significant higher
order terms), reanalyse, until minimal model has been reached. Re-test assumptions.
Type I error (false positive): null
hypothesis wrongly rejected. Probability of Type I error = α, the significance level against which the p-value is compared.
Type II error (false negative): null
hypothesis wrongly accepted. Probability of Type II error = β, which can't be
specified in advance. α and β trade off against each other.
Liberal tests have high Type I
error rates; conservative tests have high Type II error rates.
Power = 1 - β = probability of detecting
an effect when it is actually there (not
making a Type II error).
Power determined by: size of
the effect, variability in the data (due to natural variability or sampling
Power calculations can be used
in planning, to determine sample size required for given level of power.
Confidence intervals can tell
us how big an effect could be and still generate a null result.
Multiplicity is the problem that the
chance of making a mistake is heightened when multiple tests are run:
probability of at least one mistake = 1 - (1 - α)^n.
Multiplicity can be avoided by:
asking focused questions, using large models where possible (e.g. ANOVA not
multiple t-tests), avoiding fishing expeditions, distinguishing between a priori tests (inspired by theory) and post hoc tests (inspired by data), using
statistical procedures such as Bonferroni adjustment (α = 0.05/n).
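The size of the multiplicity problem, and the effect of a Bonferroni adjustment, can be checked with a couple of lines. This sketch writes the significance level as alpha and assumes the n tests are independent; the function names are made up.

```python
def chance_of_any_false_positive(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Significance level to apply to each individual test."""
    return alpha / n_tests

unadjusted = chance_of_any_false_positive(0.05, 10)
per_test = bonferroni_alpha(0.05, 10)
adjusted = chance_of_any_false_positive(per_test, 10)
print(unadjusted, per_test, adjusted)
```

With ten tests at 0.05 each, the chance of at least one false positive is roughly 40%; testing each at 0.005 brings the overall chance back to just under 5%.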