[Deb-scipy-devel] RE: Bug#322454: python-scipy: several scipy.stats failures

Perry, Alexander (GE Infrastructure) Alex.Perry at ge.com
Sat Aug 13 02:42:20 UTC 2005


Oh, I didn't know there was a Debian SciPy list!

From: Alexandre Fayolle [mailto:alexandre.fayolle at logilab.fr]
On Wed, Aug 10, 2005 at 02:39:54PM -0400, Perry, Alexander (GE
Infrastructure) wrote:
> > There are typographic bugs in stats.py that make two functions fail.
> > In addition, the calculation for nanstd() provides obviously wrong
> > answers. I have not done any formal checking for the correction
> > attached below.

> I'm having a few problems understanding your patch for nanstd, which
> does not seem to give the right results either, but I'm not sure of
> the expected semantics of the nanstd function. Is it expected that
> nanstd([nan, 2., 4.]) == std([2., 4.])
> or that
> nanstd([nan, 2., 4.]) == std([0., 2., 4.])
> your code produces neither. I'd go for the first option (given the
> definition of nanmean)

I don't know; as far as I can tell, it isn't documented what upstream
intends the function to return.  Feel free to pick which you think is
best and we can see what upstream does with my bug report.  Bear in mind
there is a third result option, which is what I attempted to achieve.  I
don't know whether it is the correct one, of course.  The third option
is that the nan-removed dataset is a sample of the larger dataset, so
that the standard deviation of the larger dataset has to be increased
beyond the standard deviation of the nan-removed dataset to account
for the increase in uncertainty.
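
To make the first two options concrete, here is a rough sketch in plain
Python.  It is not upstream's code: the helper names nanstd_drop and
nanstd_zero are made up for illustration, and I am assuming the
population standard deviation (divide by N); a divide-by-(N-1) version
would give different numbers.  The third option is left out because I
have not pinned down a formula for it.

```python
import math
import statistics


def nanstd_drop(values):
    """Option 1: ignore NaN entries, then take the standard deviation."""
    kept = [v for v in values if not math.isnan(v)]
    # Assumption: population standard deviation (divide by N).
    return statistics.pstdev(kept)


def nanstd_zero(values):
    """Option 2: treat NaN entries as 0.0, then take the standard deviation."""
    filled = [0.0 if math.isnan(v) else v for v in values]
    return statistics.pstdev(filled)


data = [float("nan"), 2.0, 4.0]
print(nanstd_drop(data))  # standard deviation of [2.0, 4.0]
print(nanstd_zero(data))  # standard deviation of [0.0, 2.0, 4.0]
```

The two helpers disagree on the same input, which is why the intended
semantics need to be settled before the result can be called right or
wrong.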

> I propose the following implementation of nanstd:

Fine by me.

> > In the same file, nnlf() method passes its own object twice to
> > _nnlf()

> I don't see this in my code.

Line 746 of
/usr/lib/python2.3/site-packages/scipy/stats/distributions.py
> > ! return self._nnlf(self, x, *args) + N*log(scale)

"_nnlf" is being called as a method of "self", so Python automatically
passes the object itself as the first argument of the call.  Writing
"self" explicitly as the first parameter means it gets passed a second
time.  That is sometimes useful, but not in this case; look at line 724.
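
The effect can be reproduced with a small stand-in class.  This is only
a sketch: the class Dist and the methods nnlf_buggy and nnlf_fixed are
hypothetical names for illustration, not the scipy code, and _nnlf here
just returns what it received so the argument shift is visible.

```python
class Dist:
    """Hypothetical stand-in for a distribution class; not scipy's code."""

    def _nnlf(self, x, *args):
        # Return the received arguments so the caller can inspect them.
        return (x, args)

    def nnlf_buggy(self, x, *args):
        # The bound call already supplies self; passing it again shifts
        # every other argument one position to the right.
        return self._nnlf(self, x, *args)

    def nnlf_fixed(self, x, *args):
        # The bound call supplies self exactly once.
        return self._nnlf(x, *args)


d = Dist()
buggy_x, buggy_args = d.nnlf_buggy([1.0, 2.0], 0.5)
fixed_x, fixed_args = d.nnlf_fixed([1.0, 2.0], 0.5)

print(buggy_x is d, buggy_args)  # x is the object itself; data lands in args
print(fixed_x, fixed_args)       # x is the data; args holds only the extras
```

In the buggy variant the data ends up among the extra arguments and "x"
is the distribution object, so any arithmetic on "x" goes wrong.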

> The diffs you sent me are strange; some of
> the chunks seem to be reversed.

Yeah, I notice now that part of my diffs ended up reversed.  Sorry.

> Do we agree that the bottom version is correct ? I.e.:
> return self._nnlf(x, *args) + N*log(scale) 

Yes.

> This is the code I had in my source tree. 

Really?  How odd.  My lines were quoted from a computer running
Testing; the file is in the binary package "python2.3-scipy" for i386,
and the unmodified installed version is 0.3.2-6.

> This is the final patch that I plan to upload.

Some of that looks reversed; for example:
> -        mu, mu2 = self.stats(*args,**{'moments':'mv'})
> -        muhat = st.nanmean(data)
> -        mu2hat = st.nanstd(data)
> +        mu, mu2, g1, g2 = self.stats(*args,**{'moments':'mv'})
> +        muhat = stats.nanmean(data)
> +        mu2hat = stats.nanstd(data)
Maybe you already have some of my fixes in your local tree?



