[Pkg-exppsy-maintainers] States Rock! and Question

Yaroslav Halchenko debian at onerussian.com
Thu Dec 6 17:02:50 UTC 2007


>    I started to stick the mahalanobisDistance function in with the
>    metrics, but then realized that this is not where mahalanobis would
>    ever be used.
Who knows... at least a more generic
mahalanobisDistance could easily be used with the searchlight right now, if
its aim were to define the neighbors for the searchlight, but as you
pointed out, that is not what you want.

BTW - I've merged your branch and made some trivial changes so that your
code is a bit more pylint-friendly. Michael and I agreed to use pylint to
enforce more or less uniform formatting of the code. We are
supposed to run pylint on everything we do... but it has been done only
sporadically, so we end up with some "Make pylint happier" commits from
time to time ;-)

There is also an agreement to use epydoc with reStructuredText for API
documentation, so we try to describe function parameters
the way I've done for mahalanobisDistance. You can
generate all the documentation simply with
make doc
or just
make apidoc
if you only want the epydoc-generated pages.

>    neighborhood information for a voxel, it's actually more like a
>    classifier.
I see ;-)

>    The other difference is that there are major
>    optimizations I've implemented for calculating the pairwise distances
>    on a whole set of vectors, not just two at a time.
I wonder whether we should make all distance functions able to
operate on lists of points instead of just a pair of points... although
as of now it might lead to a bit of code duplication inside them.
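
A rough sketch of what operating on lists of points could look like, using
squared Euclidean distance as a stand-in. The names pairwise_sq_euclidean
and sq_euclidean are hypothetical, not existing functions in the code base.
Making the two-point version a thin wrapper is one way to limit the
duplication.

```python
import numpy as np


def pairwise_sq_euclidean(x, y=None):
    """Squared Euclidean distances between all rows of x and all rows of y.

    If y is None, distances are computed among the rows of x.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = x if y is None else np.atleast_2d(np.asarray(y, dtype=float))
    # Broadcasting: (n, 1, d) - (1, m, d) gives all differences, shape (n, m, d)
    diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]
    return (diff ** 2).sum(axis=-1)


def sq_euclidean(a, b):
    """Two-point version, implemented on top of the list version."""
    return pairwise_sq_euclidean(a, b)[0, 0]
```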

>    And I have no secrets for how I want to use it.  I've been thinking
>    about a supervised mahalanobis distance kernel for classification
>    along with Francois Meyer.  The basic idea is that you would take into
>    account the underlying distributions of the labeled samples when
>    calculating the kernel distances at training and when determining the
>    distances for the test points.
Am I reading this right: you compute the kernel distances at training and
use that same distance metric later on when determining the distances for
the test points. Right? And not that you would also use the testing points
to determine the underlying distributions (which would bias the
generalization estimate, since the classifier would see the testing data).

If you provide training data in x and testing data in y, the covariance
gets computed using all of them... IMHO that is not acceptable if you want
an unbiased generalization estimate.
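
To make the concern concrete, a sketch that estimates the covariance from
the training samples only and then reuses it for the test points. The
function name mahalanobis_to_train and the use of a pseudo-inverse are
assumptions for illustration, not the code from the branch.

```python
import numpy as np


def mahalanobis_to_train(train, test):
    """Mahalanobis distances from each test point to each training point.

    The covariance is estimated from `train` alone, so the metric never
    sees the test data.
    """
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    cov = np.atleast_2d(np.cov(train, rowvar=False))
    # The pseudo-inverse also copes with a singular covariance estimate
    icov = np.linalg.pinv(cov)
    # All test-minus-train differences, shape (n_test, n_train, n_features)
    diff = test[:, np.newaxis, :] - train[np.newaxis, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diff, icov, diff)
    # Clip tiny negative values caused by round-off before the square root
    return np.sqrt(np.maximum(d2, 0.0))
```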

>    Then you can use these, supposedly
>    better, kernels in place of any other kernel in any of the kernel
>    methods such as SVM, kernel ridge regression, ...
I wonder whether something like that hasn't been tried by anyone yet... it
seems like an obvious thing to try ;-) although everything ingenious is
supposed to be simple ;-)

>    Given that you
>    need to have more samples than features for mahalanobis to make much
>    sense, I would like to run this within a searchlight.
Please correct me if I am wrong -- so the searchlight would actually operate
in the Cartesian coordinate system to select the neighbors, and then within
that set of neighbors you compute the corresponding covariance (or something
else), which provides the matrix for mahalanobis, which you would use inside
the classifier only (not to select voxels within a searchlight)?
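
A sketch of that reading, under stated assumptions: voxels are picked by
plain Euclidean distance in coordinate space, and the covariance is then
estimated only over the selected voxels. The function name and the argument
layout (samples as rows, one coordinate triple per voxel) are hypothetical.

```python
import numpy as np


def sphere_inverse_covariance(samples, coords, center, radius):
    """Select voxels within `radius` of `center`, then estimate the inverse
    covariance of the data restricted to those voxels.

    samples : (n_samples, n_voxels) data matrix
    coords  : (n_voxels, 3) voxel coordinates
    """
    samples = np.asarray(samples, dtype=float)
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    # Step 1: neighbors are chosen in the Cartesian coordinate system
    dist = np.sqrt(((coords - center) ** 2).sum(axis=1))
    idx = np.flatnonzero(dist <= radius)
    # Step 2: covariance over the neighborhood only; this matrix would be
    # handed to the classifier, not used for voxel selection
    cov = np.atleast_2d(np.cov(samples[:, idx], rowvar=False))
    return idx, np.linalg.pinv(cov)
```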

>    So, given all that, do you still think I should drop the
>    mahalanobisDistance function into the metric.py code?  I'll stick it
>    in there for now so that you can see it.
It would be great if you provided a test case for it. I think it can be
considerably reduced in size, which would improve readability, but I am
afraid of breaking it. Those silly unit tests help a bit ;-)
I think its place is in metric, but as I mentioned, it might be better to
reshape it: we could have a functor that is parametrized with x, y, w so
that the covariance is computed when it is initialized. Then, in __call__, it
can take x, y=None and return a matrix of distances, or a scalar if there
are only 2 points (i.e. x and y). This way it would satisfy the interface of
the other distance functions in there and would still allow you to use it as
you intended. Or am I wrong?
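
A rough sketch of such a functor. The class name is hypothetical, and it
assumes that w is an optional precomputed inverse covariance matrix; when w
is not given, the covariance is estimated from the points passed to the
constructor.

```python
import numpy as np


class MahalanobisDistance(object):
    """Distance functor: the metric is fixed at construction time."""

    def __init__(self, x, y=None, w=None):
        if w is None:
            # Estimate the covariance once, from the points given here
            data = np.asarray(x, dtype=float) if y is None else np.vstack((x, y))
            cov = np.atleast_2d(np.cov(data, rowvar=False))
            w = np.linalg.pinv(cov)
        self.w = np.asarray(w, dtype=float)

    def __call__(self, x, y=None):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = x if y is None else np.atleast_2d(np.asarray(y, dtype=float))
        # All pairwise differences, shape (n_x, n_y, n_features)
        diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]
        d2 = np.einsum('ijk,kl,ijl->ij', diff, self.w, diff)
        d = np.sqrt(np.maximum(d2, 0.0))
        # A single pair of points yields a scalar, otherwise a matrix
        return d[0, 0] if d.size == 1 else d
```

Constructing it from the training samples only and then calling it on test
points would keep the test data out of the covariance estimate.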
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


