[Pkg-octave-devel] One- and two-way ANOVA

Pascal A. Dupuis Pascal.Dupuis at worldonline.be
Thu Mar 6 21:03:50 UTC 2008


Hello,

I'm getting ready to analyse a measurement campaign involving two
factors ... and noticed the Octave implementation of ANOVA can only
deal with one factor. I modified it a bit, to perform one-way of
two-way ANOVA according to the input:
- call with one argument => it must be a matrix, rows are repetitions,
each column is associated with one level of a factor. One-way ANOVA.
- call with two arguments
  - both are vectors => data and one set of group. One-way.
  - first is a matrix, the second is either a vector whose length is
  equal to the number of rows of the matrix, either a matrix of the
  same size => each colum is associated with one level of the 'column'
  factor, traditionnaly called 'B'. Rows are aggregated according to
  levels of the 'row' factor, called 'A'. Two-way ANOVA.

For one-way, computes the Within-group, Bewteen-group and Total
sum-of-squares, with are further scaled to compute corresponding
variances and F test. For two-way, the within-group is split as row
factor, colum factor and interaction. This produces three variances
and F test.

The target is different from MANOVA, which rests upon the hypothesis
that data in one group are samples of a normal distribution with
VECTOR mean and a square variance-covariance matrix.

The working was verified against data published on
1)http://www.itl.nist.gov/div898/handbook/prc/section4/prc438.htm
2)http://faculty.vassar.edu/lowry/ch16pt2.html
Note that in the second exemple given in
http://faculty.vassar.edu/lowry/ch16pt3.html¸there is a small mistake
in the computations, the SST is slightly incorrect.

Would it be possible to test the enclosed file, and replace    
/usr/share/octave/3.0.0/m/statistics/tests/anova.m 
with it ? As the old behaviour, in the case of one-way, is unmodified,
I guess there shouldn't be a name change. If its OK on Debian, it
should latter go upstream.

Regards

Pascal 

-- 
-------------- next part --------------
## Copyright (C) 1995, 1996, 1997, 1998, 1999, 2000, 2002, 2005, 2006,
##               2007 Kurt Hornik. 2008 Pascal Dupuis
##
## This file is part of Octave.
##
## Octave is free software; you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 3 of the License, or (at
## your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <http://www.gnu.org/licenses/>.

## -*- texinfo -*-
## @deftypefn {Function File} {[@var{pval}, @var{f}, @var{df_b}, @var{df_w}] =} anova (@var{y}, @var{g})

## Perform a one-way or two-way analysis of variance (ANOVA).  The goal
## is to test whether the population means of data taken from @var{k}
## different groups are all equal.
##
## Data may be given in a single vector @var{y} with groups specified by
## a corresponding vector of group labels @var{g} (e.g., numbers from 1
## to @var{k}). This is the general form which does not impose any
## restriction on the number of data in each group or the group labels.
##
## If @var{y} is a matrix and @var{g} is omitted, each column of @var{y}
## is treated as a group.  This form is only appropriate for balanced
## ANOVA in which the numbers of samples from each group are all equal.
##
## Another option is to have @var{y} as a matrix, and @var{g} as a
## vector or matrix of group labels : each column belongs to one level
## of the column factor, and group of rows belongs to one level of the
## row factor. In this case, the "Between Groups" variance is split into
## three components: one for the row factor, one for the column factor
## s, and one for the interactions between the two factors.
##
## Under the null of constant means, the statistic @var{f} follows an F
## distribution with @var{df_b} and @var{df_w} degrees of freedom.
##
## The p-value (1 minus the CDF of this distribution at @var{f}) is
## returned in @var{pval}. For two-way ANOVA, the p-value is a three
## elements vector, with row, column and interaction components.
##
## If no output argument is given, the standard ANOVA table is printed.
## @end deftypefn

## Author: KH <Kurt.Hornik at wu-wien.ac.at>
## Modified: Pascal Dupuis <Pascal.Dupuis at worldonline.be> for two ways ANOVA
## Description: One-way and Tow-Way analysis of variance (ANOVA)

function [pval, f, df_b, df_w] = anova (y, g)

  if ((nargin < 1) || (nargin > 2))
    print_usage ();
  elseif (nargin == 1)
    if (isvector (y))
      error ("anova: for `anova (y)', y must not be a vector");
    endif
    [group_count, k] = size (y);
    n = group_count * k;
    group_mean = mean (y);
    anova_type = 1;
  else
    if (! isvector (y))
      anova_type = 2;
      c = columns(y);
    else
      anova_type = 1;
    endif
    n = numel(y); m = numel(g); h = isvector(g);
    if (m != n) && (rows(g(:)) != rows(y)),
      error ("anova: g must be at least a vector with length equal to rows(y)");
    endif
    s = sort (g(:));
    i = find (s (2 : m) > s(1 : (m-1)));
    k = length (i) + 1;
    if (k == 1)
      error ("anova: there should be at least 2 groups");
    else
      group_label = s ([1, (reshape (i, 1, k-1) + 1)]);
    endif
    if 1 == anova_type,
      for i = k:-1:1,
	v = y (find (g == group_label (i)));
	group_count (i) = length (v);
	group_mean (i) = mean (v);
      endfor
    else
      for i = k:-1:1,
	if h,
	  v = y (find (g == group_label (i)), :);
	else
	  v = y (find (g == group_label (i)));
	endif
	v = reshape(v, numel(v)/c, c);
	group_count(i, 1:c) = rows (v);
	group_mean (i, :) = mean (v);
	row_mean (i, 1) = mean (v(:));
      endfor
      column_mean = mean(y); 
      column_count = sum(group_count, 1); 
      row_count = sum(group_count, 2);
    endif
  endif

  total_mean = mean (y(:));
  SST = sumsq (y(:) - total_mean);
  SSB = sum(sum (group_count .* (group_mean - total_mean) .^ 2));
  SSW = SST - SSB;
  
  if 1 == anova_type,
    df_b = k - 1;
    df_w = n - k;
    v_b = SSB / df_b;
    v_w = SSW / df_w;
    f = v_b / v_w;
    pval = 1 - f_cdf (f, df_b, df_w);
    
    if (nargout == 0)
      ## This eventually needs to be done more cleanly ...
      printf ("\n");
      printf ("One-way ANOVA Table:\n");
      printf ("\n");
      printf ("Source of Variation   Sum of Squares    df  Empirical Var\n");
      printf ("*********************************************************\n");
      printf ("Between Groups       %15.4f  %4d  %13.4f\n", SSB, df_b, v_b);
      printf ("Within Groups        %15.4f  %4d  %13.4f\n", SSW, df_w, v_w);
      printf ("---------------------------------------------------------\n");
      printf ("Total                %15.4f  %4d\n", SST, n - 1);
      printf ("\n");
      printf ("Test Statistic f     %15.4f\n", f);
      printf ("p-value              %15.4f\n", pval);
      printf ("\n");
    endif

  else
    
    SSR = sum (row_count .* (row_mean - total_mean) .^ 2);
    SSC = sum (column_count .* (column_mean - total_mean) .^ 2);
    SSRC= SSB - SSR - SSC;
    
    df_b = k*c - 1;
    df_w = n - k*c;
    df_r = k - 1;
    df_c = c - 1;
    df_rc = (k - 1) * (c - 1);
    v_w = SSW / df_w;
    v_b = SSB / df_b;
    v_r = SSR / df_r;
    v_c = SSC / df_c;
    v_rc= SSRC/ df_rc;
    f = [v_r / v_w; v_c / v_w; v_rc / v_w]; 
    pval = 1 - f_cdf (f, [df_r; df_c; df_rc], df_w);
  
    if (nargout == 0)
      ## This eventually needs to be done more cleanly ...
      printf ("\n");
      printf ("Two-way ANOVA Table:\n");
      printf ("\n");
      printf ("Source of Variation   Sum of Squares    df  Empirical Var\n");
      printf ("*********************************************************\n");
      printf ("Between Groups       %15.4f  %4d  %13.4f\n", SSB, df_b, v_b);
      printf ("    Rows             %15.4f  %4d  %13.4f\n", SSR, df_r, v_r);
      printf ("    Columns          %15.4f  %4d  %13.4f\n", SSC, df_c, v_c);
      printf ("    Interactions     %15.4f  %4d  %13.4f\n", SSRC,df_rc,v_rc);
      printf ("Within Groups        %15.4f  %4d  %13.4f\n", SSW, df_w, v_w);
      printf ("---------------------------------------------------------\n");
      printf ("Total                %15.4f  %4d\n", SST, n - 1);
      printf ("\n");
      printf ("Test Statistic f (rows)         %15.4f\n", f(1));
      printf ("p-value                         %15.4f\n", pval(1));
      printf ("Test Statistic f (columns)      %15.4f\n", f(2));
      printf ("p-value                         %15.4f\n", pval(2));
      printf ("Test Statistic f (interactions) %15.4f\n", f(3));
      printf ("p-value                         %15.4f\n", pval(3));
      printf ("\n");
    else
      df_b = [df_r; df_c; df_rc]
    endif
    
  endif
endfunction


More information about the Pkg-octave-devel mailing list