[Pkg-exppsy-maintainers] States Rock! and Question
Per B. Sederberg
persed at princeton.edu
Thu Dec 6 18:56:15 UTC 2007
On Dec 6, 2007 12:02 PM, Yaroslav Halchenko <debian at onerussian.com> wrote:
> > I started to stick the mahalanobisDistance function in with the
> > metrics, but then realized that this is not where mahalanobis would
> > ever be used.
> who knows... who knows... at least some more generic
> mahalanobisDistance can easily be used with searchlight at the moment if
> its aim is to define the neighbors for the searchlight, but as you
> pointed out it is not what you want.
>
> BTW - I've merged your branch and did some silly changes so your code
> became a bit more pylint friendly, which Michael and I agreed to
> use to enforce more or less uniform formatting of the code. We are
> supposed to run pylint for everything we do... but it had been done just
> sporadically thus we are having some "Make pylint happier" commits from
> time to time ;-)
>
Sounds good, I'll try and be more pylint-friendly :)
> Also there is an agreement to use epydoc with restructuredtext for API
> documentation, thus we are trying to describe parameters to the
> functions the way I've done for the mahalanobisDistance. You can
> generate all documentation simply by
> make doc
> or just
> make apidoc
> if you only want the epydoc-generated pages
>
Yup, I'll add in the proper docs.
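For concreteness, a rough sketch of what such an epydoc/reStructuredText-style docstring could look like, on a hypothetical helper (the function name and field wording here are made up for illustration):

```python
import numpy as np

def mahalanobis_distance(x, y, w):
    """Mahalanobis distance between two samples.

    :Parameters:
      x : 1D array
        First sample.
      y : 1D array
        Second sample.
      w : 2D array
        Inverse covariance matrix to use as the metric.

    :Returns:
      The distance as a float.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ w @ diff))
```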
> > neighborhood information for a voxel, it's actually more like a
> > classifier.
> I see ;-)
>
> > The other difference is that there are major
> > optimizations I've implemented for calculating the pairwise distances
> > on a whole set of vectors, not just two at a time.
> I wonder if maybe we should make all distance functions able to
> operate on lists of points instead of just a pair of points... although
> as of now it might lead to a bit of code duplication inside of them
>
Yes, you can do loads at once much faster than looping.
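Roughly what I mean, as a small numpy sketch (names made up; squared Euclidean shown for simplicity, but the same broadcasting trick applies to Mahalanobis):

```python
import numpy as np

def pairwise_sq_dists(x, y):
    """All pairwise squared Euclidean distances between rows of x and y.

    Broadcasting produces the whole (n, m) distance matrix in one shot
    instead of an n*m Python loop.
    """
    # (n, 1, f) - (1, m, f) -> (n, m, f); then sum over the feature axis
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=2)
```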
> > And I have no secrets for how I want to use it. I've been thinking
> > about a supervised mahalanobis distance kernel for classification
> > along with Francois Meyer. The basic idea is that you would take
> into
> > account the underlying distributions of the labeled samples when
> > calculating the kernel distances at training and when determining the
> > distances for the test points.
> am I reading it right: calculating the kernel distances at training and
> using that distance metric later on when determining the distances for
> the test points. Right? Not that you would use testing points as well to
> determine the underlying distributions (which would bias the generalization
> estimate, since the classifier would see the testing data)
>
> if you provide training in x and testing in y, the covariance gets computed
> using all of them... imho that is not acceptable if we want an unbiased
> generalization estimate
>
Oh, no peeking here! You only use the labeled points to calculate the
covariance matrix. In fact, you calculate a separate covariance matrix for
each unique label! Then you use the proper covariance matrix when doing a
comparison of an unknown sample to a known sample.
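In other words, something like this sketch (hypothetical names; note that only the labeled training samples ever reach np.cov, so the test data stays unseen):

```python
import numpy as np

def class_covariances(samples, labels):
    """One covariance matrix per unique label, from training data only.

    samples : (n_samples, n_features) array of labeled training data
    labels  : (n_samples,) array of class labels
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    # rowvar=False: rows are observations, columns are features
    return {lab: np.cov(samples[labels == lab], rowvar=False)
            for lab in np.unique(labels)}
```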
>
> > Then you can use these, supposedly
> > better, kernels in place of any other kernel in any of the kernel
> > methods such as SVM, kernel ridge regression, ...
> I wonder whether something like that hasn't been tried by anyone yet...
> seems like an obvious thing to give a try ;-) although everything genius
> is supposed to be simple ;-)
>
Folks have used Mahalanobis distance for KNN before, sometimes with
good results. It all depends on whether you actually have a skewed
covariance matrix.
>
> > Given that you
> > need to have more samples than features for mahalanobis to make much
> > sense, I would like to run this within a searchlight.
> please correct me if I am wrong -- so the searchlight would actually operate
> within a Cartesian coordinate system to select the neighbors, but then within
> that set of neighbors you compute the corresponding covariance (or something
> else), which provides the matrix for mahalanobis, which you would use inside
> the classifier only (not to select voxels within a searchlight)
>
Yes, for example, I want to know the distance between two samples within
that searchlight.
>
> > So, given all that, do you still think I should drop the
> > mahalanobisDistance function into the metric.py code? I'll stick it
> > in there for now so that you can see it.
> it would be great if you provided a testcase for it. I think it can be
> considerably reduced in size thus increasing readability but I am afraid
> to break it. Those silly unittests help a bit ;-)
> I think its place is in metric but as I mentioned it might be better to
> reshape it: we can have a functor, which is parametrized with x,y,w so
> that the covariance is computed while initializing it. Then, in __call__ it
> can take x,y=None and spit out a matrix of distances or a scalar if there
> are only two points (i.e. x and y). This way it would satisfy the interface
> of the other distance functions in there and would allow you to use it as
> you intended. Or am I wrong?
I think that would work fine.
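A rough sketch of that functor interface as I understand it (assuming numpy, and treating w as an optional precomputed inverse covariance, which may not be exactly what you meant):

```python
import numpy as np

class MahalanobisDistance:
    """Distance functor: metric fixed at init, distances computed in __call__."""

    def __init__(self, x, w=None):
        # Estimate the inverse covariance from the init samples unless a
        # precomputed inverse covariance w is handed in.  Needs more samples
        # than features for the covariance to be invertible.
        if w is None:
            self.icov = np.linalg.inv(np.cov(np.asarray(x, float), rowvar=False))
        else:
            self.icov = np.asarray(w, dtype=float)

    def __call__(self, x, y=None):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = x if y is None else np.atleast_2d(np.asarray(y, dtype=float))
        diff = x[:, None, :] - y[None, :, :]          # (n, m, f)
        d2 = np.einsum('ijk,kl,ijl->ij', diff, self.icov, diff)
        d = np.sqrt(d2)
        # scalar for a single pair of points, matrix otherwise
        return float(d[0, 0]) if d.size == 1 else d
```

With w set to an identity matrix this reduces to plain Euclidean distance, which makes for an easy sanity check.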
Thanks for all the discussions on this :) I'm off to go collect some more
fMRI datums...
P
>
> --
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student Ph.D. @ CS Dept. NJIT
> Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW: http://www.linkedin.com/in/yarik
>
> _______________________________________________
> Pkg-exppsy-maintainers mailing list
> Pkg-exppsy-maintainers at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers
>