[Pkg-exppsy-maintainers] States Rock! and Question

Per B. Sederberg persed at princeton.edu
Thu Dec 6 18:56:15 UTC 2007


On Dec 6, 2007 12:02 PM, Yaroslav Halchenko <debian at onerussian.com> wrote:

> >    I started to stick the mahalanobisDistance function in with the
> >    metrics, but then realized that this is not where mahalanobis would
> >    ever be used.
> who knows... who knows... at least a more generic
> mahalanobisDistance could easily be used with the searchlight at the
> moment, if its aim is to define the neighbors for the searchlight, but as
> you pointed out that is not what you want.
>
> BTW - I've merged your branch and made some small changes so your code
> became a bit more pylint-friendly; Michael and I agreed to use pylint
> to enforce more or less uniform formatting of the code. We are
> supposed to run pylint on everything we do... but it has been done only
> sporadically, thus we have some "Make pylint happier" commits from
> time to time ;-)
>

Sounds good, I'll try and be more pylint-friendly :)


> Also there is an agreement to use epydoc with reStructuredText for API
> documentation, thus we are trying to describe parameters to the
> functions the way I've done for mahalanobisDistance. You can
> generate all the documentation simply with
> make doc
> or just
> make apidoc
> if you only want the epydoc-generated pages
>

Yup, I'll add in the proper docs.



> >    neighborhood information for a voxel, it's actually more like a
> >    classifier.
> I see ;-)
>
> >    The other difference is that there are major
> >    optimizations I've implemented for calculating the pairwise distances
> >    on a whole set of vectors, not just two at a time.
> I wonder if maybe we should make all distance functions able to
> operate on lists of points instead of just a pair of points... although
> as of now it might lead to a bit of code duplication inside them
>

Yes, you can compute loads of distances at once much faster than looping over pairs.
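(For illustration only -- this is not the code from the branch; the function name and array shapes here are made up.) The kind of vectorized trick I mean, in NumPy, replaces a double Python loop over pairs with one broadcasted computation:

```python
import numpy as np

def pairwise_distances(x, y):
    """Euclidean distances between every row of x and every row of y.

    x: (n, d) array, y: (m, d) array -> (n, m) distance matrix.
    Broadcasting builds all n*m difference vectors at once instead of
    looping over pairs in Python.
    """
    diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]   # shape (n, m, d)
    return np.sqrt((diff ** 2).sum(axis=-1))

x = np.array([[0.0, 0.0], [3.0, 4.0]])
y = np.array([[0.0, 0.0]])
d = pairwise_distances(x, y)   # distances 0 and 5 to the origin
```

The same idea works for any distance whose per-pair formula can be written in array operations.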


> >    And I have no secrets for how I want to use it.  I've been thinking
> >    about a supervised mahalanobis distance kernel for classification
> >    along with Francois Meyer.  The basic idea is that you would take
> >    into account the underlying distributions of the labeled samples when
> >    calculating the kernel distances at training and when determining the
> >    distances for the test points.
> am I reading it right: you calculate the kernel distances at training and
> use that distance metric later on when determining the distances for
> the test points. Right? Not that you would use the testing points as well
> to determine the underlying distributions (which would bias the
> generalization estimate, since the classifier would see the testing data)
>
> if you provide training in x and testing in y, the covariance gets
> computed using all of them... imho that is not acceptable if you want an
> unbiased generalization estimate
>

Oh, no peeking here!  You only use the labeled points to calculate the
covariance matrix.  In fact, you calculate a separate covariance matrix for
each unique label!  Then you use the proper covariance matrix when doing a
comparison of an unknown sample to a known sample.
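A tiny sketch of that "no peeking" scheme, assuming NumPy; the helper names (`class_covariances`, `mahalanobis_to`) are mine for illustration, not anything in the codebase, and each class needs enough samples for an invertible covariance:

```python
import numpy as np

def class_covariances(samples, labels):
    """One inverse covariance matrix per unique label, estimated from the
    labeled (training) samples only -- test data never enters here."""
    return {lab: np.linalg.inv(np.cov(samples[labels == lab], rowvar=False))
            for lab in np.unique(labels)}

def mahalanobis_to(unknown, known, inv_cov):
    """Distance from an unknown sample to a known one, using the inverse
    covariance of the known sample's class."""
    diff = unknown - known
    return np.sqrt(np.dot(diff, np.dot(inv_cov, diff)))

# toy training set: four samples of class 'a' in two dimensions
samples = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
labels = np.array(['a', 'a', 'a', 'a'])
icovs = class_covariances(samples, labels)
d = mahalanobis_to(np.array([2., 0.]), samples[0], icovs['a'])
```

The point is simply that `class_covariances` sees only labeled data, so the test points cannot bias the metric.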



>
> > Then you can use these, supposedly
> >    better, kernels in place of any other kernel in any of the kernel
> >    methods such as SVM, kernel ridge regression, ...
> I wonder if something like that hasn't been tried by anyone already... it
> seems like an obvious thing to try ;-) although everything genius is
> supposed to be simple ;-)
>

Folks have used the mahalanobis distance for kNN before, sometimes with
good results.  It all depends on whether you actually have a skewed
covariance matrix.


>
> >    Given that you
> >    need to have more samples than features for mahalanobis to make much
> >    sense, I would like to run this within a searchlight.
> please correct me if I am wrong -- so the searchlight would actually
> operate within the Cartesian coordinate system to select the neighbors,
> but then within that set of neighbors you compute the corresponding
> covariance (or something else) which provides the matrix for mahalanobis,
> which you would use inside the classifier only (not to select voxels
> within a searchlight)
>

Yes, for example, I want to know the distance between two samples within
that searchlight.


>
> >    So, given all that, do you still think I should drop the
> >    mahalanobisDistance function into the metric.py code?  I'll stick it
> >    in there for now so that you can see it.
> it would be great if you provided a test case for it. I think it can be
> considerably reduced in size, thus increasing readability, but I am
> afraid to break it. Those silly unittests help a bit ;-)
> I think its place is in metric, but as I mentioned it might be better to
> reshape it: we can have a functor which is parametrized with x, y, w so
> that the covariance is computed while initializing it. Then, in __call__,
> it can take x, y=None and spit out a matrix of distances, or a scalar if
> there are only 2 points (i.e. x and y). This way it would satisfy the
> interface of the other distance functions in there and would allow you to
> use it as you intended. Or am I wrong?


I think that would work fine.
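Something like this, perhaps (a rough NumPy sketch of the functor idea, not the eventual metric.py interface -- and per the discussion above, for classification you would initialize it with training data only):

```python
import numpy as np

class MahalanobisDistance(object):
    """Functor following the interface sketched above: the inverse
    covariance is computed once from x, y and optional weights w at
    initialization; __call__ then measures distances with it."""

    def __init__(self, x, y=None, w=None):
        data = x if y is None else np.vstack((x, y))
        self._icov = np.linalg.inv(np.cov(data, rowvar=False, aweights=w))

    def __call__(self, x, y=None):
        x = np.atleast_2d(x)
        y = x if y is None else np.atleast_2d(y)
        diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]   # (n, m, d)
        sq = np.einsum('ijk,kl,ijl->ij', diff, self._icov, diff)
        d = np.sqrt(sq)
        # spit out a scalar for a single pair, a matrix otherwise
        return d[0, 0] if d.size == 1 else d

train = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
dist = MahalanobisDistance(train)
scalar = dist(np.array([0., 0.]), np.array([1., 0.]))  # single pair -> scalar
matrix = dist(train)                                   # all pairs -> 4x4 matrix
```

Called with one point per argument it returns a scalar like the other distance functions; called with arrays of points it returns the full distance matrix.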

Thanks for all the discussions on this :)  I'm off to go collect some more
fMRI datums...

P


>
> --
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student  Ph.D. @ CS Dept. NJIT
> Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
>        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW:     http://www.linkedin.com/in/yarik
>