[Pkg-exppsy-maintainers] First round of SMLR working!

Mon Mar 3 20:45:13 UTC 2008

I reverse engineered it based on the labels :)
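
For anyone else loading these files, here is a minimal sketch of restoring the 2-D shape after numpy.fromfile. It assumes the arrays were written as raw float64 in C order (the thread does not confirm the dtype) and that the number of samples is known from the labels. The helper name load_flat is hypothetical.

```python
import numpy as np


def load_flat(path, n_samples, dtype=np.float64):
    """Read an array saved with ndarray.tofile and restore (samples, features).

    tofile stores only the raw values, so the shape has to be supplied.
    Here the feature count is inferred from the total size and n_samples.
    The float64 default is an assumption, not something the files declare.
    """
    flat = np.fromfile(path, dtype=dtype)
    if n_samples <= 0 or flat.size % n_samples != 0:
        raise ValueError(
            "cannot split %d values into %d samples" % (flat.size, n_samples)
        )
    # tofile writes in C order, which is also reshape's default order.
    return flat.reshape(n_samples, flat.size // n_samples)
```

With 90 training labels, load_flat(path, 90) should come back as (90, 8) for the spillover set and (90, 329) for the overflow set.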

I've tested the overflow dataset and it works correctly on my machine.
The strange thing is that, although the weights at the end are
essentially the same, the Python version runs for 5 more cycles before
converging.  Interesting...
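
One way to make "essentially the same" concrete is a tolerance-based comparison of the two weight arrays. This is only a sketch: compare_weights and its default tolerances are illustrative choices, not anything taken from the SMLR code.

```python
import numpy as np


def compare_weights(w_a, w_b, rtol=1e-3, atol=1e-4):
    """Return (close, max_abs_diff) for two weight arrays of equal shape.

    close is True when every element agrees within the given tolerances.
    The tolerances are arbitrary illustrative defaults.
    """
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    if w_a.shape != w_b.shape:
        raise ValueError("shape mismatch: %s vs %s" % (w_a.shape, w_b.shape))
    max_abs_diff = float(np.max(np.abs(w_a - w_b))) if w_a.size else 0.0
    return bool(np.allclose(w_a, w_b, rtol=rtol, atol=atol)), max_abs_diff
```

A check like this could also serve as the basis for a unit test that the C and Python implementations end up close to each other on the same training data.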

So, we also need to find out what is different between our two
machines because it is definitely not overflowing on mine:

cycle=354 ; incr=0.00100808 ; non_zero=133 ; sum2_w_old=15.6454
cycle=355 ; incr=0.00100411 ; non_zero=133 ; sum2_w_old=15.6565
cycle=356 ; incr=0.00100256 ; non_zero=133 ; sum2_w_old=15.6675
cycle=357 ; incr=0.00100107 ; non_zero=133 ; sum2_w_old=15.6786
cycle=358 ; incr=0.000999623 ; non_zero=133 ; sum2_w_old=15.6896
[SMLR_] DBG:               train finished in 358 cycles on data.shape=(90, 329) min:max(data)=-4.433080:5.054645, got min:max(w)=-1.163411:0.734330
[SMLR_] DBG:               predict on data.shape=(18, 329) min:max(data)=-3.641074:4.151642 min:max(w)=-1.163411:0.734330 min:max(dot_prod)=-6.964815:10.797821 min:max(E)=0.000945:48914.125555
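
Here E = exp(dot_prod) stays small, but with weights near +/-960 (as in the failing run quoted below) the dot products can exceed roughly 709, which is where exp overflows in float64. A common guard is to subtract the per-row maximum before exponentiating. The sketch below shows that technique under assumptions: the function name predict_proba, the (M-1, n_features) weight layout, and the implicit all-zero reference class are illustrative, and this is not claimed to be what the SMLR predict code does.

```python
import numpy as np


def predict_proba(X, W):
    """Class probabilities from M-1 weight vectors, computed without overflow.

    X: (n_samples, n_features) data.
    W: (M-1, n_features) weights; the M-th class is taken as an implicit
       reference class with all-zero weights (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    W = np.atleast_2d(np.asarray(W, dtype=float))
    dot_prod = np.dot(X, W.T)                      # (n_samples, M-1)
    # Append the reference class, whose score is always 0.
    scores = np.hstack([dot_prod, np.zeros((X.shape[0], 1))])
    # Shifting each row by its maximum leaves the ratios unchanged,
    # but makes the largest exponent exactly 0.
    scores = scores - scores.max(axis=1, keepdims=True)
    E = np.exp(scores)
    return E / E.sum(axis=1, keepdims=True)
```

The shift does not change the resulting probabilities; it only keeps every argument to exp at or below zero, so the worst case is a harmless underflow to 0.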

Find me online when you get the chance.

Latro,
P

On Mon, Mar 3, 2008 at 3:34 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> sorry -- I forgot that it loses dimensionality. But it is in the log
>  lines:
>
> > >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:44:41 2008' '0.053 sec' 'VmSize:\t  145244 kB'}:       train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000
>
>  so it is 90x8 (90 samples, 8 features) for the spillover case
>
>  and (90, 329) for the overflow case
>
>
>
>  On Mon, 03 Mar 2008, Per B. Sederberg wrote:
>
>  > fromfile loses dimensionality.  What are the dimensions?
>
>  > Are you not checking IRC?
>
>  > P
>
>  > On Mon, Mar 3, 2008 at 3:01 PM, Yaroslav Halchenko
>  > <debian at onerussian.com> wrote:
>  > > oops... sorry -- I stored the numpy arrays with .tofile, so you need
>  > >  numpy.fromfile
>
>
>
>  > >  On Mon, 03 Mar 2008, Per B. Sederberg wrote:
>
>  > >  > What data format are those in (how do I read them in)?  And you say
>  > >  > they don't fail with the python version?
>
>  > >  > Thanks,
>  > >  > P
>
>  > >  > On Mon, Mar 3, 2008 at 2:54 PM, Yaroslav Halchenko
>  > >  > <debian at onerussian.com> wrote:
>  > >  > > ok -- 2 datasets... in the first we get a 'spillover' effect:
>  > >  > >  cycle=624 ; incr=0.00157436 ; non_zero=8 ; sum2_w_old=77745.9
>  > >  > >  cycle=625 ; incr=0.00157188 ; non_zero=8 ; sum2_w_old=77990.9
>  > >  > >  cycle=626 ; incr=0.0015694 ; non_zero=8 ; sum2_w_old=78236.2
>  > >  > >  cycle=627 ; incr=0.00156693 ; non_zero=8 ; sum2_w_old=78482
>  > >  > >  cycle=628 ; incr=0.00156447 ; non_zero=8 ; sum2_w_old=78728.1
>  > >  > >  cycle=629 ; incr=inf ; non_zero=7 ; sum2_w_old=78974.7
>  > >  > >  cycle=630 ; incr=nan ; non_zero=0 ; sum2_w_old=inf
>  > >  > >  cycle=631 ; incr=0 ; non_zero=0 ; sum2_w_old=0
>  > >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:44:41 2008' '0.053 sec' 'VmSize:\t  145244 kB'}:       train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000
>
>  > >  > >  and in the other, an overflow of the exponent:
>  > >  > >  cycle=994 ; incr=0.00100595 ; non_zero=328 ; sum2_w_old=3.93982e+07
>  > >  > >  cycle=995 ; incr=0.00100494 ; non_zero=328 ; sum2_w_old=3.94775e+07
>  > >  > >  cycle=996 ; incr=0.00100393 ; non_zero=328 ; sum2_w_old=3.95569e+07
>  > >  > >  cycle=997 ; incr=0.00100292 ; non_zero=328 ; sum2_w_old=3.96364e+07
>  > >  > >  cycle=998 ; incr=0.00100192 ; non_zero=328 ; sum2_w_old=3.97159e+07
>  > >  > >  cycle=999 ; incr=0.00100092 ; non_zero=328 ; sum2_w_old=3.97956e+07
>  > >  > >  cycle=1000 ; incr=0.000999915 ; non_zero=328 ; sum2_w_old=3.98753e+07
>  > >  > >  [SMLR_] DBG{ 'Mon Mar  3 14:50:09 2008' '2.176 sec' 'VmSize:\t  145540 kB'}:       train finished in 1000 cycles on data.shape=(90, 329) min:max(data)=-4.433080:5.054645, got min:max(w)=-962.589603:941.041508
>
>
>  > >  > >  Warning: overflow encountered in exp
>  > >  > >  both sets are available from
>  > >  > >  http://www.onerussian.com/Linux/bugs/smlr/ds/
>
>  > >  > >  tr for training, te for testing ;)
>
>
>
>
>  > >  > >  On Mon, 03 Mar 2008, Yaroslav Halchenko wrote:
>
>  > >  > >  > ok - I will spit out the split/selection of the dataset that causes
>  > >  > >  > such behavior
>
>  > >  > >  > this data is an artificial dataset with a few univariate 'relevant'
>  > >  > >  > features and the rest is noise... It has 512 features in total...
>
>  > >  > >  > On Mon, 03 Mar 2008, Per B. Sederberg wrote:
>
>  > >  > >  > > Hi Yarik:
>
>  > >  > >  > > Can you send me a dataset that causes it to fail (i.e., causes the
>  > >  > >  > > values to grow really big in the C version)?  And why do you only have
>  > >  > >  > > 512 features to start?  Have you already performed feature selection?
>
>  > >  > >  > > Thanks,
>  > >  > >  > > Per
>
>
>  > >  > >  > > On Mon, Mar 3, 2008 at 11:56 AM, Yaroslav Halchenko
>  > >  > >  > > <debian at onerussian.com> wrote:
>  > >  > >  > > > I am about to push some minor changes into SMLR code (mostly producing
>  > >  > >  > > >  debug output I think ;-))
>
>  > >  > >  > > >  as for overflow -- just check out the log file
>  > >  > >  > > >  http://www.onerussian.com/Linux/bugs/smlr/overflow3.gz
>
>  > >  > >  > > >  you will see that some runs simply diverge, some do stop at some huge
>  > >  > >  > > >  values of weights (which leads to overflow in predict). I am yet to spit
>  > >  > >  > > >  out a problematic train/test case of interest.
>
>  > >  > >  > > >  BTW - it seems I didn't notice such a problem while running the Python
>  > >  > >  > > >  implementation of SMLR...
>
>  > >  > >  > > >  so, one of the unit tests to go into test_smlr is actually to validate that
>  > >  > >  > > >  the results of runs of the C and Python implementations are close to each
>  > >  > >  > > >  other...
>
>  > >  > >  > > >  Please let me know if anyone is working on the SMLR code right now, so
>  > >  > >  > > >  we don't cross spears while merging ;-)
>
>
>
>  > >  > >  > > >  On Mon, 03 Mar 2008, Per B. Sederberg wrote:
>
>  > >  > >  > > >  > On Mon, Mar 3, 2008 at 12:55 AM, Yaroslav Halchenko
>  > >  > >  > > >  > <yoh at psychology.rutgers.edu> wrote:
>  > >  > >  > > >  > > > I'm guessing it may be because you have no features left.  Can you
>  > >  > >  > > >  > >  > narrow it down to a testcase that we can reproduce?
>  > >  > >  > > >  > >  will try although
>
>
>  > >  > >  > > >  > Even if you just save out the weights from a run that gives you a
>  > >  > >  > > >  > warning, that would help.  You could also save the training and
>  > >  > >  > > >  > testing data that went into that classifier.
>
>
>  > >  > >  > > >  > >  > >  [CLF] DBG{ 'Sun Mar  2 23:47:45 2008' '2.659 sec' 'VmSize:\t  147072 kB'}:      Predicting classifier SMLR(lm=0.100000, convergence_tol=0.001, maxiter=10000, implementation='C', enabled_states=['predictions', 'trained_labels']) on data (18, 442)
>  > >  > >  > > >  > >  says that there are 442 features. My guess is that it is not that I am
>  > >  > >  > > >  > >  left without features, but that their inner product with the weights w
>  > >  > >  > > >  > >  gets too large, and thus it overflows in exp...
>
>  > >  > >  > > >  > The fact that it says 442 features means that was what was going into
>  > >  > >  > > >  > training, right?  Not that after training there were 442 non-zero
>  > >  > >  > > >  > weights.
>
>  > >  > >  > > >  > Are you using smlr for feature selection and then using it for
>  > >  > >  > > >  > classification as a separate step?  If so, it may be better to use
>  > >  > >  > > >  > ridge or standard logistic regression with a gaussian prior for the
>  > >  > >  > > >  > classification because it has a better-suited regularization for
>  > >  > >  > > >  > classifying when you know that the features are good.
>
>  > >  > >  > > >  > Let me know more details and I'll try to get to the bottom of this.
>
>  > >  > >  > > >  > Thanks,
>  > >  > >  > > >  > Per
>
>
>
>  > >  > >  > > >  > > > >  Warning: overflow encountered in exp
>  > >  > >  > > >  > >  > >  Warning: invalid value encountered in divide
>
>
>
>
>  > >  > >  > > >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:
>
>  > >  > >  > > >  > >  > >  > I didn't see any changes to test_smlr; did you push?
>
>  > >  > >  > > >  > >  > >  > Another thing to keep in mind as we go forward with SMLR, is that it
>  > >  > >  > > >  > >  > >  > is a multi-class classifier (though I have not tested it.)  When you
>  > >  > >  > > >  > >  > >  > have multiple classes, the weight matrix will be M-1 by the number of
>  > >  > >  > > >  > >  > >  > features where M is the number of classes.
>
>  > >  > >  > > >  > >  > >  > I think we'll want to do something like sum over the M-1 classes to
>  > >  > >  > > >  > >  > >  > get the sensitivity for each feature.
>
>  > >  > >  > > >  > >  > >  > Best,
>  > >  > >  > > >  > >  > >  > Per
>
>
>  > >  > >  > > >  > >  > >  > On Sun, Mar 2, 2008 at 6:50 PM, Yaroslav Halchenko
>  > >  > >  > > >  > >  > >  > <yoh at psychology.rutgers.edu> wrote:
>  > >  > >  > > >  > >  > >  > > hey -- I wrote some notes in the test_smlr ... you might find them
>  > >  > >  > > >  > >  > >  > >  useful ;-)
>
>
>  > >  > >  > > >  > >  > >  > >  On Sun, 02 Mar 2008, Per B. Sederberg wrote:
>
>
>
>  > >  > >  > > >  > >  > >  > > > I have contacted the first author of the SMLR manuscript, and he sent
>  > >  > >  > > >  > >  > >  > >  > me some "rough" matlab code that he said has no copyrights and that I
>  > >  > >  > > >  > >  > >  > >  > could use any way that I wanted with no reference to him.  He was
>  > >  > >  > > >  > >  > >  > >  > overjoyed that we were getting use out of it and that we could port it
>  > >  > >  > > >  > >  > >  > >  > to include in any closed or open-source project we wanted.
>
>  > >  > >  > > >  > >  > >  > >  > So, I used his code as inspiration for the python code, which I then
>  > >  > >  > > >  > >  > >  > >  > ported to C.
>
>  > >  > >  > > >  > >  > >  > >  > He was so awesome about it that we should be sure to cite him whenever
>  > >  > >  > > >  > >  > >  > >  > we use it.
>
>  > >  > >  > > >  > >  > >  > >  > So, we are in the clear :)
>
>  > >  > >  > > >  > >  > >  > >  > P
>
>  > >  > >  > > >  > >  > >  > >  > On Sun, Mar 2, 2008 at 5:45 PM, Yaroslav Halchenko
>  > >  > >  > > >  > >  > >  > >  > <debian at onerussian.com> wrote:
>  > >  > >  > > >  > >  > >  > >  > > btw -- just to be sure
>
>  > >  > >  > > >  > >  > >  > >  > >  authors of SMLR original software release it under non-commercial
>  > >  > >  > > >  > >  > >  > >  > >  license
>  > >  > >  > > >  > >  > >  > >  > >  http://www.cs.duke.edu/~amink/software/smlr/
>  > >  > >  > > >  > >  > >  > >  > >  Licensing Overview
>
>  > >  > >  > > >  > >  > >  > >  > >  You may license SMLR either under a non-commercial use license or under
>  > >  > >  > > >  > >  > >  > >  > >  ....
>
>  > >  > >  > > >  > >  > >  > >  > >  but since Per coded it himself and there is no patent assigned to it
>  > >  > >  > > >  > >  > >  > >  > >  (is there?) we are ok to release it within pymvpa, right?
>
>  > >  > >  > > >  > >  > >  > >  > >  --
>
>  > >  > >  > > >  > >  > >  > >  > > Yaroslav Halchenko
>  > >  > >  > > >  > >  > >  > >  > >  Research Assistant, Psychology Department, Rutgers-Newark
>  > >  > >  > > >  > >  > >  > >  > >  Student  Ph.D. @ CS Dept. NJIT
>  > >  > >  > > >  > >  > >  > >  > >  Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
>  > >  > >  > > >  > >  > >  > >  > >         101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
>  > >  > >  > > >  > >  > >  > >  > >  WWW:     http://www.linkedin.com/in/yarik
>
>
>
>  > >  > >  > > >  > >  > >  > >  > > _______________________________________________
>  > >  > >  > > >  > >  > >  > >  > >  Pkg-exppsy-maintainers mailing list
>  > >  > >  > > >  > >  > >  > >  > >  Pkg-exppsy-maintainers at lists.alioth.debian.org
>  > >  > >  > > >  > >  > >  > >  > >  http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers
>
>
>