[Pkg-exppsy-maintainers] First round of SMLR working!

Mon Mar 3 19:35:58 UTC 2008

OK, I will spit out the split/selection from the dataset that causes such
behavior.

This is an artificial dataset with a few univariate 'relevant' features;
the rest is noise. It has 512 features in total.

On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> Hi Yarik:

> Can you send me a dataset that causes it to fail (i.e., causes the
> values to grow really big in the C version)?  And why do you only have
> 512 features to start?  Have you already performed feature selection?

> Thanks,
> Per

> On Mon, Mar 3, 2008 at 11:56 AM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > I am about to push some minor changes into the SMLR code (mostly producing
> >  debug output, I think ;-))

> >  As for the overflow, just check out the log file
> >  http://www.onerussian.com/Linux/bugs/smlr/overflow3.gz

> >  You will see that some runs simply diverge, while others stop at huge
> >  weight values (which leads to an overflow in predict). I have yet to spit
> >  out a problematic train/test case of interest.

> >  BTW, it seems I didn't notice such a problem while running the Python
> >  implementation of SMLR...

> >  So, one of the unit tests to go into test_smlr should validate that
> >  the results of the C and Python implementations are close to each
> >  other.
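A minimal sketch of the comparison such a unit test could make, assuming the trained weights of each implementation can be pulled out as an (M-1) x F matrix (plain Python lists here). The helper name `weights_close` and its tolerances are hypothetical, not part of test_smlr. Non-finite weights count as a failure, so diverged runs are caught as well.

```python
import math


def weights_close(w_c, w_py, rel_tol=1e-4, abs_tol=1e-6):
    """Return True if two weight matrices (lists of rows) have the same shape,
    contain only finite values, and agree element-wise within tolerance."""
    if len(w_c) != len(w_py):
        return False
    for row_c, row_py in zip(w_c, w_py):
        if len(row_c) != len(row_py):
            return False
        for a, b in zip(row_c, row_py):
            # math.isclose(inf, inf) is True, so reject inf/nan explicitly:
            # a run that diverged in both implementations should still fail.
            if not (math.isfinite(a) and math.isfinite(b)):
                return False
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True
```

In a real test both weight matrices would come from training the C and the Python code on the same split; the tolerances would need tuning to the convergence_tol actually used.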

> >  Please let me know if anyone is working on the SMLR code as we speak, so
> >  we don't cross spears while merging ;-)

> >  On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> >  > On Mon, Mar 3, 2008 at 12:55 AM, Yaroslav Halchenko
> >  > <yoh at psychology.rutgers.edu> wrote:
> >  > > > I'm guessing it may be because you have no features left.  Can you
> >  > >  > narrow it down to a test case that we can reproduce?
> >  > >  Will try, although

> >  > Even if you just save out the weights from a run that gives you a
> >  > warning, that would help.  You could also save the training and
> >  > testing data that went into that classifier.

> >  > >  > >  [CLF] DBG{ 'Sun Mar  2 23:47:45 2008' '2.659 sec' 'VmSize:\t  147072 kB'}:      Predicting classifier SMLR(lm=0.100000, convergence_tol=0.001, maxiter=10000, implementation='C', enabled_states=['predictions', 'trained_labels']) on data (18, 442)
> >  > >  says that there are 442 features. My guess is not that I am
> >  > >  left without features, but that their inner product with the weights w
> >  > >  is too large, so it overflows in exp...
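A minimal sketch of the suspected failure and the standard fix, assuming predict turns the scores w·x into class probabilities with a softmax. Subtracting the largest score before exponentiating (the log-sum-exp trick) leaves the probabilities mathematically unchanged but keeps every exponent at or below zero. With NumPy arrays the naive form would yield inf and then inf/inf = nan, which would be consistent with the "overflow encountered in exp" and "invalid value encountered in divide" warnings quoted in this thread; plain `math.exp` raises OverflowError instead. The function names are illustrative only.

```python
import math


def softmax_naive(scores):
    """exp(s_k) / sum_j exp(s_j) computed directly; math.exp() overflows once
    a score exceeds roughly 709 (the limit for IEEE 754 doubles)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_stable(scores):
    """Same probabilities via the log-sum-exp trick: shift all scores by
    their maximum so every exponent is <= 0 and exp() cannot overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Scores of this size can appear once the weights w have grown huge.
scores = [1000.0, 998.0, 0.0]
try:
    softmax_naive(scores)
except OverflowError:
    print("naive softmax: OverflowError from math.exp")
print("stable softmax:", softmax_stable(scores))
```

The stable form only hides the symptom in predict; weights that diverge during training still point to a convergence problem upstream.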

> >  > The fact that it says 442 features means that was what was going into
> >  > training, right?  Not that after training there were 442 non-zero
> >  > weights.

> >  > Are you using smlr for feature selection and then using it for
> >  > classification as a separate step?  If so, it may be better to use
> >  > ridge or standard logistic regression with a Gaussian prior for the
> >  > classification because it has a better-suited regularization for
> >  > classifying when you know that the features are good.
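For reference, a zero-mean Gaussian prior on the weights makes the MAP estimate equivalent to L2 (ridge) penalized logistic regression, whereas SMLR's sparsity-promoting Laplace prior pushes many weights to exactly zero. The plain-Python sketch below illustrates the binary Gaussian-prior case under that assumption; it is not PyMVPA code, and `train_ridge_logreg`, `lam`, `lr` and `n_iter` are hypothetical names.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ridge_logreg(X, y, lam=1.0, lr=0.1, n_iter=500):
    """MAP binary logistic regression with a zero-mean Gaussian prior on w.

    Maximizes sum_i log p(y_i | x_i, w) - (lam / 2) * ||w||^2 by plain
    gradient ascent. X is a list of feature rows, y holds labels in {0, 1}.
    No separate bias is fitted; add a constant-1 feature if one is needed.
    """
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        # Gradient of the Gaussian log-prior: -lam * w
        grad = [-lam * wj for wj in w]
        # Gradient of the log-likelihood: sum_i (y_i - p_i) * x_i
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return the class (0 or 1) with the higher posterior probability."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0
```

For example, `train_ridge_logreg([[1.0, 2.0], [1.0, -2.0]], [1, 0])` uses the first column as a bias; raising `lam` shrinks all weights toward zero smoothly instead of zeroing some of them out. The fixed step size `lr` must be small relative to the data scale and `lam`, or gradient ascent can itself diverge.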

> >  > Let me know more details and I'll try to get to the bottom of this.

> >  > Thanks,
> >  > Per

> >  > > > >  Warning: overflow encountered in exp
> >  > >  > >  Warning: invalid value encountered in divide

> >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:

> >  > >  > >  > I didn't see any changes to test_smlr; did you push?

> >  > >  > >  > Another thing to keep in mind as we go forward with SMLR is that it
> >  > >  > >  > is a multi-class classifier (though I have not tested that).  When you
> >  > >  > >  > have multiple classes, the weight matrix will be M-1 by the number of
> >  > >  > >  > features where M is the number of classes.

> >  > >  > >  > I think we'll want to do something like sum over the M-1 classes to
> >  > >  > >  > get the sensitivity for each feature.
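A minimal sketch of the multi-class case, assuming the M-th class serves as the reference class with implicit zero weights (a common parametrization of multinomial logistic regression; whether the SMLR code does exactly this is an assumption). Summing absolute values over the M-1 rows is a choice introduced here so that weights of opposite sign do not cancel; `use_abs=False` gives the plain sum suggested above. Function names are hypothetical.

```python
import math


def class_probabilities(W, x):
    """Probabilities for M classes from an (M-1) x F weight matrix W.

    The M-th (reference) class has implicit zero weights, so its score is 0.
    """
    scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in W] + [0.0]
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_sensitivities(W, use_abs=True):
    """Collapse an (M-1) x F weight matrix to one value per feature by
    summing over its M-1 rows (absolute values by default)."""
    n_features = len(W[0])
    return [sum(abs(row[j]) if use_abs else row[j] for row in W)
            for j in range(n_features)]
```

With M = 2 the matrix has a single row and `class_probabilities` reduces to the ordinary logistic function of w·x.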

> >  > >  > >  > Best,
> >  > >  > >  > Per

> >  > >  > >  > On Sun, Mar 2, 2008 at 6:50 PM, Yaroslav Halchenko
> >  > >  > >  > <yoh at psychology.rutgers.edu> wrote:
> >  > >  > >  > > Hey, I wrote some notes in test_smlr; you might find them
> >  > >  > >  > >  useful ;-)

> >  > >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:

> >  > >  > >  > > > I have contacted the first author of the SMLR manuscript, and he sent
> >  > >  > >  > >  > me some "rough" matlab code that he said has no copyrights and that I
> >  > >  > >  > >  > could use any way that I wanted with no reference to him.  He was
> >  > >  > >  > >  > overjoyed that we were getting use out of it and that we could port it
> >  > >  > >  > >  > to include in any closed or open-source project we wanted.

> >  > >  > >  > >  > So, I used his code as inspiration for the python code, which I then
> >  > >  > >  > >  > ported to C.

> >  > >  > >  > >  > He was so awesome about it that we should be sure to cite him whenever
> >  > >  > >  > >  > we use it.

> >  > >  > >  > >  > So, we are in the clear :)

> >  > >  > >  > >  > P

> >  > >  > >  > >  > On Sun, Mar 2, 2008 at 5:45 PM, Yaroslav Halchenko
> >  > >  > >  > >  > <debian at onerussian.com> wrote:
> >  > >  > >  > >  > > BTW, just to be sure

> >  > >  > >  > >  > >  The authors of the original SMLR software release it under a
> >  > >  > >  > >  > >  non-commercial license:
> >  > >  > >  > >  > >  http://www.cs.duke.edu/~amink/software/smlr/
> >  > >  > >  > >  > >  Licensing Overview

> >  > >  > >  > >  > >  You may license SMLR either under a non-commercial use license or under
> >  > >  > >  > >  > >  ....

> >  > >  > >  > >  > >  but since Per coded it himself and there is no patent assigned to it
> >  > >  > >  > >  > >  (is there?), we are OK to release it within pymvpa, right?

> >  > >  > >  > >  > >  --

> >  > >  > >  > >  > > Yaroslav Halchenko
> >  > >  > >  > >  > >  Research Assistant, Psychology Department, Rutgers-Newark
> >  > >  > >  > >  > >  Student  Ph.D. @ CS Dept. NJIT
> >  > >  > >  > >  > >  Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> >  > >  > >  > >  > >         101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> >  > >  > >  > >  > >  WWW:     http://www.linkedin.com/in/yarik

> >  > >  > >  > >  > > _______________________________________________
> >  > >  > >  > >  > >  Pkg-exppsy-maintainers mailing list
> >  > >  > >  > >  > >  Pkg-exppsy-maintainers at lists.alioth.debian.org
> >  > >  > >  > >  > >  http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers
