[Pkg-octave-devel] One- and two-way ANOVA
Pascal A. Dupuis
Pascal.Dupuis at worldonline.be
Thu Mar 6 21:03:50 UTC 2008
Hello,
I'm getting ready to analyse a measurement campaign involving two
factors ... and noticed the Octave implementation of ANOVA can only
deal with one factor. I modified it a bit, to perform one-way of
two-way ANOVA according to the input:
- call with one argument => it must be a matrix, rows are repetitions,
each column is associated with one level of a factor. One-way ANOVA.
- call with two arguments
- both are vectors => data and one set of group. One-way.
- first is a matrix, the second is either a vector whose length is
equal to the number of rows of the matrix, either a matrix of the
same size => each colum is associated with one level of the 'column'
factor, traditionnaly called 'B'. Rows are aggregated according to
levels of the 'row' factor, called 'A'. Two-way ANOVA.
For one-way, computes the Within-group, Bewteen-group and Total
sum-of-squares, with are further scaled to compute corresponding
variances and F test. For two-way, the within-group is split as row
factor, colum factor and interaction. This produces three variances
and F test.
The target is different from MANOVA, which rests upon the hypothesis
that data in one group are samples of a normal distribution with
VECTOR mean and a square variance-covariance matrix.
The working was verified against data published on
1)http://www.itl.nist.gov/div898/handbook/prc/section4/prc438.htm
2)http://faculty.vassar.edu/lowry/ch16pt2.html
Note that in the second exemple given in
http://faculty.vassar.edu/lowry/ch16pt3.html¸there is a small mistake
in the computations, the SST is slightly incorrect.
Would it be possible to test the enclosed file, and replace
/usr/share/octave/3.0.0/m/statistics/tests/anova.m
with it ? As the old behaviour, in the case of one-way, is unmodified,
I guess there shouldn't be a name change. If its OK on Debian, it
should latter go upstream.
Regards
Pascal
--
-------------- next part --------------
## Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2002, 2005, 2006,
## 2007 Kurt Hornik. 2008 Pascal Dupuis
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING. If not, see
## <http://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {Function File} {[@var{pval}, @var{f}, @var{df_b}, @var{df_w}] =} anova (@var{y}, @var{g})
## Perform a one-way or two-way analysis of variance (ANOVA). The goal
## is to test whether the population means of data taken from @var{k}
## different groups are all equal.
##
## Data may be given in a single vector @var{y} with groups specified by
## a corresponding vector of group labels @var{g} (e.g., numbers from 1
## to @var{k}). This is the general form which does not impose any
## restriction on the number of data in each group or the group labels.
##
## If @var{y} is a matrix and @var{g} is omitted, each column of @var{y}
## is treated as a group. This form is only appropriate for balanced
## ANOVA in which the numbers of samples from each group are all equal.
##
## Another option is to have @var{y} as a matrix, and @var{g} as a
## vector or matrix of group labels : each column belongs to one level
## of the column factor, and group of rows belongs to one level of the
## row factor. In this case, the "Between Groups" variance is split into
## three components: one for the row factor, one for the column factor
## s, and one for the interactions between the two factors.
##
## Under the null of constant means, the statistic @var{f} follows an F
## distribution with @var{df_b} and @var{df_w} degrees of freedom.
##
## The p-value (1 minus the CDF of this distribution at @var{f}) is
## returned in @var{pval}. For two-way ANOVA, the p-value is a three
## elements vector, with row, column and interaction components.
##
## If no output argument is given, the standard ANOVA table is printed.
## @end deftypefn
## Author: KH <Kurt.Hornik at wu-wien.ac.at>
## Modified: Pascal Dupuis <Pascal.Dupuis at worldonline.be> for two ways ANOVA
## Description: One-way and Tow-Way analysis of variance (ANOVA)
function [pval, f, df_b, df_w] = anova (y, g)
if ((nargin < 1) || (nargin > 2))
print_usage ();
elseif (nargin == 1)
if (isvector (y))
error ("anova: for `anova (y)', y must not be a vector");
endif
[group_count, k] = size (y);
n = group_count * k;
group_mean = mean (y);
anova_type = 1;
else
if (! isvector (y))
anova_type = 2;
c = columns(y);
else
anova_type = 1;
endif
n = numel(y); m = numel(g); h = isvector(g);
if (m != n) && (rows(g(:)) != rows(y)),
error ("anova: g must be at least a vector with length equal to rows(y)");
endif
s = sort (g(:));
i = find (s (2 : m) > s(1 : (m-1)));
k = length (i) + 1;
if (k == 1)
error ("anova: there should be at least 2 groups");
else
group_label = s ([1, (reshape (i, 1, k-1) + 1)]);
endif
if 1 == anova_type,
for i = k:-1:1,
v = y (find (g == group_label (i)));
group_count (i) = length (v);
group_mean (i) = mean (v);
endfor
else
for i = k:-1:1,
if h,
v = y (find (g == group_label (i)), :);
else
v = y (find (g == group_label (i)));
endif
v = reshape(v, numel(v)/c, c);
group_count(i, 1:c) = rows (v);
group_mean (i, :) = mean (v);
row_mean (i, 1) = mean (v(:));
endfor
column_mean = mean(y);
column_count = sum(group_count, 1);
row_count = sum(group_count, 2);
endif
endif
total_mean = mean (y(:));
SST = sumsq (y(:) - total_mean);
SSB = sum(sum (group_count .* (group_mean - total_mean) .^ 2));
SSW = SST - SSB;
if 1 == anova_type,
df_b = k - 1;
df_w = n - k;
v_b = SSB / df_b;
v_w = SSW / df_w;
f = v_b / v_w;
pval = 1 - f_cdf (f, df_b, df_w);
if (nargout == 0)
## This eventually needs to be done more cleanly ...
printf ("\n");
printf ("One-way ANOVA Table:\n");
printf ("\n");
printf ("Source of Variation Sum of Squares df Empirical Var\n");
printf ("*********************************************************\n");
printf ("Between Groups %15.4f %4d %13.4f\n", SSB, df_b, v_b);
printf ("Within Groups %15.4f %4d %13.4f\n", SSW, df_w, v_w);
printf ("---------------------------------------------------------\n");
printf ("Total %15.4f %4d\n", SST, n - 1);
printf ("\n");
printf ("Test Statistic f %15.4f\n", f);
printf ("p-value %15.4f\n", pval);
printf ("\n");
endif
else
SSR = sum (row_count .* (row_mean - total_mean) .^ 2);
SSC = sum (column_count .* (column_mean - total_mean) .^ 2);
SSRC= SSB - SSR - SSC;
df_b = k*c - 1;
df_w = n - k*c;
df_r = k - 1;
df_c = c - 1;
df_rc = (k - 1) * (c - 1);
v_w = SSW / df_w;
v_b = SSB / df_b;
v_r = SSR / df_r;
v_c = SSC / df_c;
v_rc= SSRC/ df_rc;
f = [v_r / v_w; v_c / v_w; v_rc / v_w];
pval = 1 - f_cdf (f, [df_r; df_c; df_rc], df_w);
if (nargout == 0)
## This eventually needs to be done more cleanly ...
printf ("\n");
printf ("Two-way ANOVA Table:\n");
printf ("\n");
printf ("Source of Variation Sum of Squares df Empirical Var\n");
printf ("*********************************************************\n");
printf ("Between Groups %15.4f %4d %13.4f\n", SSB, df_b, v_b);
printf (" Rows %15.4f %4d %13.4f\n", SSR, df_r, v_r);
printf (" Columns %15.4f %4d %13.4f\n", SSC, df_c, v_c);
printf (" Interactions %15.4f %4d %13.4f\n", SSRC,df_rc,v_rc);
printf ("Within Groups %15.4f %4d %13.4f\n", SSW, df_w, v_w);
printf ("---------------------------------------------------------\n");
printf ("Total %15.4f %4d\n", SST, n - 1);
printf ("\n");
printf ("Test Statistic f (rows) %15.4f\n", f(1));
printf ("p-value %15.4f\n", pval(1));
printf ("Test Statistic f (columns) %15.4f\n", f(2));
printf ("p-value %15.4f\n", pval(2));
printf ("Test Statistic f (interactions) %15.4f\n", f(3));
printf ("p-value %15.4f\n", pval(3));
printf ("\n");
else
df_b = [df_r; df_c; df_rc]
endif
endif
endfunction
More information about the Pkg-octave-devel
mailing list