[Pkg-exppsy-maintainers] First round of SMLR working!

Yaroslav Halchenko debian at onerussian.com
Mon Mar 3 20:34:55 UTC 2008


sorry -- I forgot that it loses dimensionality. But it is in the log
lines:
> >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:44:41 2008' '0.053 sec' 'VmSize:\t  145244 kB'}:       train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000

so it is 90x8 (90 samples, 8 features) for the spillover case

and (90, 329) for the overflow case
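
For reading the files back: .tofile writes raw values without shape or dtype, so both have to be supplied by hand. A minimal sketch, assuming the arrays were stored as float64 in C order (not confirmed in this thread); load_flat and the file name are hypothetical, and the demo uses random stand-in data rather than the real datasets.

```python
import os
import tempfile

import numpy as np


def load_flat(path, shape, dtype=np.float64):
    """Read a raw .tofile() dump and restore its shape (dtype is an assumption)."""
    flat = np.fromfile(path, dtype=dtype)
    expected = int(np.prod(shape))
    if flat.size != expected:
        raise ValueError("expected %d values, found %d" % (expected, flat.size))
    return flat.reshape(shape)


# Round-trip demo with a hypothetical file name and random stand-in data.
demo = np.random.randn(90, 8)
demo_path = os.path.join(tempfile.mkdtemp(), "spillover_tr.dat")
demo.tofile(demo_path)
restored = load_flat(demo_path, (90, 8))
```

The size check catches a wrong shape or dtype guess early, since a mismatch changes the number of values read.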

On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> fromfile loses dimensionality.  What are the dimensions?

> Are you not checking IRC?

> P

> On Mon, Mar 3, 2008 at 3:01 PM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > oops... sorry -- I stored the numpy arrays with .tofile, so you need
> >  numpy.fromfile



> >  On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> >  > What data format are those in (how do I read them in)?  And you say
> >  > they don't fail with the python version?

> >  > Thanks,
> >  > P

> >  > On Mon, Mar 3, 2008 at 2:54 PM, Yaroslav Halchenko
> >  > <debian at onerussian.com> wrote:
> >  > >  > > ok -- 2 datasets... in 1 we get a 'spillover' effect:
> >  > >  cycle=624 ; incr=0.00157436 ; non_zero=8 ; sum2_w_old=77745.9
> >  > >  cycle=625 ; incr=0.00157188 ; non_zero=8 ; sum2_w_old=77990.9
> >  > >  cycle=626 ; incr=0.0015694 ; non_zero=8 ; sum2_w_old=78236.2
> >  > >  cycle=627 ; incr=0.00156693 ; non_zero=8 ; sum2_w_old=78482
> >  > >  cycle=628 ; incr=0.00156447 ; non_zero=8 ; sum2_w_old=78728.1
> >  > >  cycle=629 ; incr=inf ; non_zero=7 ; sum2_w_old=78974.7
> >  > >  cycle=630 ; incr=nan ; non_zero=0 ; sum2_w_old=inf
> >  > >  cycle=631 ; incr=0 ; non_zero=0 ; sum2_w_old=0
> >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:44:41 2008' '0.053 sec' 'VmSize:\t  145244 kB'}:       train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000
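
In the log above incr goes to inf at cycle 629, to nan at 630, and the weights end up all zero. One way to make such a run fail loudly instead of finishing with empty weights is a finiteness guard inside the training loop. This is only a sketch: check_finite_step is a hypothetical helper, not part of the SMLR code.

```python
import numpy as np


def check_finite_step(cycle, incr, w):
    """Raise as soon as the increment or any weight stops being finite."""
    if not np.isfinite(incr) or not np.all(np.isfinite(w)):
        raise FloatingPointError(
            "non-finite value at cycle %d (incr=%r)" % (cycle, incr))


# A healthy step passes silently.
check_finite_step(628, 0.00156447, np.ones(8))
```

Calling it once per cycle would have stopped the first dataset at cycle 629.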

> >  > >  > >  and in the other, overflow of the exponent:
> >  > >  cycle=994 ; incr=0.00100595 ; non_zero=328 ; sum2_w_old=3.93982e+07
> >  > >  cycle=995 ; incr=0.00100494 ; non_zero=328 ; sum2_w_old=3.94775e+07
> >  > >  cycle=996 ; incr=0.00100393 ; non_zero=328 ; sum2_w_old=3.95569e+07
> >  > >  cycle=997 ; incr=0.00100292 ; non_zero=328 ; sum2_w_old=3.96364e+07
> >  > >  cycle=998 ; incr=0.00100192 ; non_zero=328 ; sum2_w_old=3.97159e+07
> >  > >  cycle=999 ; incr=0.00100092 ; non_zero=328 ; sum2_w_old=3.97956e+07
> >  > >  cycle=1000 ; incr=0.000999915 ; non_zero=328 ; sum2_w_old=3.98753e+07
> >  > >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:50:09 2008' '2.176 sec' 'VmSize:\t  145540 kB'}:       train finished in 1000 cycles on data.shape=(90, 329) min:max(data)=-4.433080:5.054645, got min:max(w)=-962.589603:941.041508


> >  > >  Warning: overflow encountered in exp
> >  > >  both sets are available from
> >  > >  http://www.onerussian.com/Linux/bugs/smlr/ds/

> >  > >  tr for training, te for testing ;)




> >  > >  On Mon, 03 Mar 2008, Yaroslav Halchenko wrote:

> >  > >  > >  > ok - I will spit out the split/selection from the dataset that causes such
> >  > >  > >  > behavior

> >  > >  > >  > this data is an artificial dataset with a few univariate 'relevant' features
> >  > >  > and the rest is noise... It has 512 features total...

> >  > >  > On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> >  > >  > > Hi Yarik:

> >  > >  > > Can you send me a dataset that causes it to fail (i.e., causes the
> >  > >  > > values to grow really big in the C version)?  And why do you only have
> >  > >  > > 512 features to start?  Have you already performed feature selection?

> >  > >  > > Thanks,
> >  > >  > > Per


> >  > >  > > On Mon, Mar 3, 2008 at 11:56 AM, Yaroslav Halchenko
> >  > >  > > <debian at onerussian.com> wrote:
> >  > >  > > > I am about to push some minor changes into SMLR code (mostly producing
> >  > >  > > >  debug output I think ;-))

> >  > >  > >  > > >  as for the overflow -- just check out the log file
> >  > >  > > >  http://www.onerussian.com/Linux/bugs/smlr/overflow3.gz

> >  > >  > > >  you will see that some runs simply diverge, some do stop at some huge
> >  > >  > > >  values of weights (which leads to overflow in predict). I am yet to spit
> >  > >  > > >  out a problematic train/test case of interest.

> >  > >  > >  > > >  BTW - it seems I didn't notice such a problem while running the python
> >  > >  > >  > > >  implementation of SMLR...

> >  > >  > >  > > >  so, one of the unit tests to go into test_smlr should validate that
> >  > >  > >  > > >  the results of runs of the C and python implementations are close to each
> >  > >  > >  > > >  other...
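
A minimal sketch of what such a check could look like, given the weight arrays from the two implementations. The helper name and the tolerances are assumptions for illustration; it takes plain arrays and does not call any SMLR or PyMVPA API.

```python
import numpy as np


def assert_implementations_agree(w_c, w_py, rtol=1e-3, atol=1e-6):
    """Fail if the weights from the two implementations differ beyond tolerance."""
    w_c = np.asarray(w_c, dtype=float)
    w_py = np.asarray(w_py, dtype=float)
    assert w_c.shape == w_py.shape, "weight shapes differ"
    # Raises AssertionError with a per-element report on mismatch.
    np.testing.assert_allclose(w_c, w_py, rtol=rtol, atol=atol)


# Stand-in weights that differ only by a tiny rounding-sized amount.
w_ref = np.array([[0.5, -1.25, 0.0]])
assert_implementations_agree(w_ref, w_ref + 1e-9)
```

Tolerances would need tuning, since the two implementations may legitimately differ by floating-point rounding.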

> >  > >  > >  > > >  Please let me know if anyone is working on the SMLR code right now so
> >  > >  > >  > > >  we don't cross spears while merging ;-)



> >  > >  > > >  On Mon, 03 Mar 2008, Per B. Sederberg wrote:

> >  > >  > > >  > On Mon, Mar 3, 2008 at 12:55 AM, Yaroslav Halchenko
> >  > >  > > >  > <yoh at psychology.rutgers.edu> wrote:
> >  > >  > > >  > > > I'm guessing it may be because you have no features left.  Can you
> >  > >  > > >  > >  > narrow it down to a testcase that we can reproduce?
> >  > >  > > >  > >  will try although


> >  > >  > > >  > Even if you just save out the weights from a run that gives you a
> >  > >  > > >  > warning, that would help.  You could also save the training and
> >  > >  > > >  > testing data that went into that classifier.


> >  > >  > > >  > >  > >  [CLF] DBG{ 'Sun Mar  2 23:47:45 2008' '2.659 sec' 'VmSize:\t  147072 kB'}:      Predicting classifier SMLR(lm=0.100000, convergence_tol=0.001, maxiter=10000, implementation='C', enabled_states=['predictions', 'trained_labels']) on data (18, 442)
> >  > >  > >  > > >  > >  says that there are 442 features. My guess is that the problem is not that I am
> >  > >  > >  > > >  > >  left without features but that their inner product with the weights w
> >  > >  > >  > > >  > >  becomes excessively large, so it overflows in exp...
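
To illustrate the guess above: float64 exp overflows once its argument exceeds roughly 709, which large weights times feature values can reach. A common general remedy (not necessarily what the SMLR code does or will do) is to subtract the row maximum before exponentiating, which leaves the probabilities unchanged. A sketch, with stable_softmax as a hypothetical helper:

```python
import numpy as np


def stable_softmax(scores):
    """Row-wise softmax that avoids overflow by shifting each row by its max."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max(axis=1, keepdims=True)  # largest entry becomes 0
    e = np.exp(shifted)                                   # all values in (0, 1]
    return e / e.sum(axis=1, keepdims=True)


# Scores this large would overflow a naive exp(scores).
probs = stable_softmax([[1000.0, 0.0], [0.0, 0.0]])
```

This only protects predict against overflow; it does not address why the weights grow so large during training.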

> >  > >  > > >  > The fact that it says 442 features means that was what was going into
> >  > >  > > >  > training, right?  Not that after training there were 442 non-zero
> >  > >  > > >  > weights.

> >  > >  > > >  > Are you using smlr for feature selection and then using it for
> >  > >  > > >  > classification as a separate step?  If so, it may be better to use
> >  > >  > > >  > ridge or standard logistic regression with a gaussian prior for the
> >  > >  > > >  > classification because it has a better-suited regularization for
> >  > >  > > >  > classifying when you know that the features are good.

> >  > >  > >  > > >  > Let me know more details and I'll try to get to the bottom of this.

> >  > >  > > >  > Thanks,
> >  > >  > > >  > Per



> >  > >  > > >  > > > >  Warning: overflow encountered in exp
> >  > >  > > >  > >  > >  Warning: invalid value encountered in divide




> >  > >  > > >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:

> >  > >  > >  > > >  > >  > >  > I didn't see any changes to test_smlr, did you push?

> >  > >  > > >  > >  > >  > Another thing to keep in mind as we go forward with SMLR, is that it
> >  > >  > > >  > >  > >  > is a multi-class classifier (though I have not tested it.)  When you
> >  > >  > > >  > >  > >  > have multiple classes, the weight matrix will be M-1 by the number of
> >  > >  > > >  > >  > >  > features where M is the number of classes.

> >  > >  > > >  > >  > >  > I think we'll want to do something like sum over the M-1 classes to
> >  > >  > > >  > >  > >  > get the sensitivity for each feature.
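
A sketch of the summing idea, assuming the weight matrix has shape (M-1, n_features) as described above. Taking absolute values before summing is my assumption (so that weights of opposite sign do not cancel); the thread only says "sum". feature_sensitivity is a hypothetical name.

```python
import numpy as np


def feature_sensitivity(w, use_abs=True):
    """Collapse an (M-1, n_features) weight matrix into one value per feature."""
    w = np.asarray(w, dtype=float)
    if use_abs:
        w = np.abs(w)  # assumption: magnitude matters, not sign
    return w.sum(axis=0)


# Three classes (M-1 = 2 rows), three features; the last feature is unused.
w_demo = np.array([[1.0, -2.0, 0.0],
                   [-1.0, 3.0, 0.0]])
sens_abs = feature_sensitivity(w_demo)
sens_raw = feature_sensitivity(w_demo, use_abs=False)
```

With the plain sum the first feature gets a sensitivity of 0 even though both classes use it, which is why the absolute value is the default here.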

> >  > >  > > >  > >  > >  > Best,
> >  > >  > > >  > >  > >  > Per


> >  > >  > > >  > >  > >  > On Sun, Mar 2, 2008 at 6:50 PM, Yaroslav Halchenko
> >  > >  > > >  > >  > >  > <yoh at psychology.rutgers.edu> wrote:
> >  > >  > > >  > >  > >  > > hey -- I wrote some notes in the test_smlr ... you might find them
> >  > >  > > >  > >  > >  > >  useful ;-)


> >  > >  > > >  > >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:



> >  > >  > > >  > >  > >  > > > I have contacted the first author of the SMLR manuscript, and he sent
> >  > >  > > >  > >  > >  > >  > me some "rough" matlab code that he said has no copyrights and that I
> >  > >  > > >  > >  > >  > >  > could use any way that I wanted with no reference to him.  He was
> >  > >  > > >  > >  > >  > >  > overjoyed that we were getting use out of it and that we could port it
> >  > >  > > >  > >  > >  > >  > to include in any closed or open-source project we wanted.

> >  > >  > > >  > >  > >  > >  > So, I used his code as inspiration for the python code, which I then
> >  > >  > > >  > >  > >  > >  > ported to C.

> >  > >  > > >  > >  > >  > >  > He was so awesome about it that we should be sure to cite him whenever
> >  > >  > > >  > >  > >  > >  > we use it.

> >  > >  > > >  > >  > >  > >  > So, we are in the clear :)

> >  > >  > > >  > >  > >  > >  > P

> >  > >  > > >  > >  > >  > >  > On Sun, Mar 2, 2008 at 5:45 PM, Yaroslav Halchenko
> >  > >  > > >  > >  > >  > >  > <debian at onerussian.com> wrote:
> >  > >  > > >  > >  > >  > >  > > btw -- just to be sure

> >  > >  > > >  > >  > >  > >  > >  authors of SMLR original software release it under non-commercial
> >  > >  > > >  > >  > >  > >  > >  license
> >  > >  > > >  > >  > >  > >  > >  http://www.cs.duke.edu/~amink/software/smlr/
> >  > >  > > >  > >  > >  > >  > >  Licensing Overview

> >  > >  > > >  > >  > >  > >  > >  You may license SMLR either under a non-commercial use license or under
> >  > >  > > >  > >  > >  > >  > >  ....

> >  > >  > > >  > >  > >  > >  > >  but since Per coded it himself and there is no patent assigned to it
> >  > >  > > >  > >  > >  > >  > >  (isn't there?) we are ok to release it within pymvpa, right?

> >  > >  > > >  > >  > >  > >  > >  --

> >  > >  > > >  > >  > >  > >  > > Yaroslav Halchenko
> >  > >  > > >  > >  > >  > >  > >  Research Assistant, Psychology Department, Rutgers-Newark
> >  > >  > > >  > >  > >  > >  > >  Student  Ph.D. @ CS Dept. NJIT
> >  > >  > > >  > >  > >  > >  > >  Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> >  > >  > > >  > >  > >  > >  > >         101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> >  > >  > > >  > >  > >  > >  > >  WWW:     http://www.linkedin.com/in/yarik



> >  > >  > > >  > >  > >  > >  > > _______________________________________________
> >  > >  > > >  > >  > >  > >  > >  Pkg-exppsy-maintainers mailing list
> >  > >  > > >  > >  > >  > >  > >  Pkg-exppsy-maintainers at lists.alioth.debian.org
> >  > >  > > >  > >  > >  > >  > >  http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers






-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


