[Pkg-exppsy-maintainers] First round of SMLR working!
Per B. Sederberg
persed at princeton.edu
Mon Mar 3 21:08:18 UTC 2008
Here's my code for running the test.
P
On Mon, Mar 3, 2008 at 3:45 PM, Per B. Sederberg <persed at princeton.edu> wrote:
> I reverse engineered it based on the labels :)
>
> I've tested the overflow one and it works correctly on my machine.
> The strange thing is that, although the weights at the end are
> essentially the same, the python version runs for 5 more cycles before
> converging. Interesting...
>
> So, we also need to find out what is different between our two
> machines because it is definitely not overflowing on mine:
>
> cycle=354 ; incr=0.00100808 ; non_zero=133 ; sum2_w_old=15.6454
> cycle=355 ; incr=0.00100411 ; non_zero=133 ; sum2_w_old=15.6565
> cycle=356 ; incr=0.00100256 ; non_zero=133 ; sum2_w_old=15.6675
> cycle=357 ; incr=0.00100107 ; non_zero=133 ; sum2_w_old=15.6786
> cycle=358 ; incr=0.000999623 ; non_zero=133 ; sum2_w_old=15.6896
> [SMLR_] DBG: train finished in 358 cycles on data.shape=(90, 329) min:max(data)=-4.433080:5.054645, got min:max(w)=-1.163411:0.734330
> [SMLR_] DBG: predict on data.shape=(18, 329) min:max(data)=-3.641074:4.151642 min:max(w)=-1.163411:0.734330 min:max(dot_prod)=-6.964815:10.797821 min:max(E)=0.000945:48914.125555
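
A minimal sketch of why the predict step can overflow and one common way to guard against it. It assumes that E in the log above is exp(dot_prod) (which is consistent with the numbers: exp(10.797821) is about 48914) and that class probabilities come from normalizing E row-wise. The name stable_softmax and the normalization are assumptions for illustration, not code taken from the SMLR implementation.

```python
import numpy as np


def stable_softmax(dot_prod):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    dot_prod = np.asarray(dot_prod, dtype=float)
    # After the shift the largest entry in each row is 0, so exp() is at most 1.
    shifted = dot_prod - dot_prod.max(axis=1, keepdims=True)
    E = np.exp(shifted)
    return E / E.sum(axis=1, keepdims=True)


# A naive np.exp(1000.0) is inf in float64; the shifted version stays finite.
scores = np.array([[1000.0, 0.0], [-6.964815, 10.797821]])
probs = stable_softmax(scores)
```

Subtracting the row maximum leaves the normalized probabilities unchanged, because the common factor cancels between numerator and denominator. It does not address weights that diverge during training; it only keeps the prediction step finite.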
>
>
> Find me online when you get the chance.
>
> Later,
> P
>
> On Mon, Mar 3, 2008 at 3:34 PM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> > sorry -- I forgot that it loses dimensionality. But it is in the log
> > lines:
> >
> > > > > > [SMLR_] DBG{ 'Mon Mar 3 14:44:41 2008' '0.053 sec' 'VmSize:\t 145244 kB'}: train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000
> >
> > so it is 90x8 (90 samples, 8 features) for the spillover case
> >
> > and (90, 329) for overflow
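
A minimal sketch of restoring the shape after numpy.fromfile, which returns a flat 1-D array because ndarray.tofile stores no shape or dtype information. The float64 dtype, the helper name load_flat_array, and the file name tr.dat are assumptions for illustration; the thread does not state the dtype or the file names of the posted datasets.

```python
import os
import tempfile

import numpy as np


def load_flat_array(path, shape, dtype=np.float64):
    """Read an array written with ndarray.tofile and restore its shape."""
    flat = np.fromfile(path, dtype=dtype)  # always 1-D
    return flat.reshape(shape)


# Round trip through a temporary file, using the 90x8 shape from the log line.
original = np.arange(90 * 8, dtype=np.float64).reshape(90, 8)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tr.dat")  # hypothetical file name
    original.tofile(path)
    flat = np.fromfile(path, dtype=np.float64)
    restored = load_flat_array(path, (90, 8))
```

If the dtype passed to fromfile does not match the one used when writing, the values come back silently wrong, so both the shape and the dtype have to be communicated separately.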
> >
> >
> >
> > On Mon, 03 Mar 2008, Per B. Sederberg wrote:
> >
> > > fromfile loses dimensionality. What are the dimensions?
> >
> > > Are you not checking IRC?
> >
> > > P
> >
> > > On Mon, Mar 3, 2008 at 3:01 PM, Yaroslav Halchenko
> > > <debian at onerussian.com> wrote:
> > > > oops... sorry -- stored numpy's arrays .tofile, so you need
> > > > numpy.fromfile
> >
> >
> >
> > > > On Mon, 03 Mar 2008, Per B. Sederberg wrote:
> >
> > > > > What data format are those in (how do I read them in)? And you say
> > > > > they don't fail with the python version?
> >
> > > > > Thanks,
> > > > > P
> >
> > > > > On Mon, Mar 3, 2008 at 2:54 PM, Yaroslav Halchenko
> > > > > <debian at onerussian.com> wrote:
> > > > > > ok -- 2 datasets... in 1 we get a 'spillover' effect:
> > > > > > cycle=624 ; incr=0.00157436 ; non_zero=8 ; sum2_w_old=77745.9
> > > > > > cycle=625 ; incr=0.00157188 ; non_zero=8 ; sum2_w_old=77990.9
> > > > > > cycle=626 ; incr=0.0015694 ; non_zero=8 ; sum2_w_old=78236.2
> > > > > > cycle=627 ; incr=0.00156693 ; non_zero=8 ; sum2_w_old=78482
> > > > > > cycle=628 ; incr=0.00156447 ; non_zero=8 ; sum2_w_old=78728.1
> > > > > > cycle=629 ; incr=inf ; non_zero=7 ; sum2_w_old=78974.7
> > > > > > cycle=630 ; incr=nan ; non_zero=0 ; sum2_w_old=inf
> > > > > > cycle=631 ; incr=0 ; non_zero=0 ; sum2_w_old=0
> > > > > > [SMLR_] DBG{ 'Mon Mar 3 14:44:41 2008' '0.053 sec' 'VmSize:\t 145244 kB'}: train finished in 631 cycles on data.shape=(90, 8) min:max(data)=-3.279930:3.305094, got min:max(w)=0.000000:0.000000
> >
> > > > > > and in the other, overflow of the exponent:
> > > > > > cycle=994 ; incr=0.00100595 ; non_zero=328 ; sum2_w_old=3.93982e+07
> > > > > > cycle=995 ; incr=0.00100494 ; non_zero=328 ; sum2_w_old=3.94775e+07
> > > > > > cycle=996 ; incr=0.00100393 ; non_zero=328 ; sum2_w_old=3.95569e+07
> > > > > > cycle=997 ; incr=0.00100292 ; non_zero=328 ; sum2_w_old=3.96364e+07
> > > > > > cycle=998 ; incr=0.00100192 ; non_zero=328 ; sum2_w_old=3.97159e+07
> > > > > > cycle=999 ; incr=0.00100092 ; non_zero=328 ; sum2_w_old=3.97956e+07
> > > > > > cycle=1000 ; incr=0.000999915 ; non_zero=328 ; sum2_w_old=3.98753e+07
> > > > > > [SMLR_] DBG{ 'Mon Mar 3 14:50:09 2008' '2.176 sec' 'VmSize:\t 145540 kB'}: train finished in 1000 cycles on data.shape=(90, 329) min:max(data)=-4.433080:5.054645, got min:max(w)=-962.589603:941.041508
> >
> >
> > > > > > Warning: overflow encountered in exp
> > > > > > both sets are available from
> > > > > > http://www.onerussian.com/Linux/bugs/smlr/ds/
> >
> > > > > > tr for training, te for testing ;)
> >
> >
> >
> >
> > > > > > On Mon, 03 Mar 2008, Yaroslav Halchenko wrote:
> >
> > > > > > > ok - I will spit out the split/selection from the dataset that causes such
> > > > > > > behavior
> >
> > > > > > > this data is an artificial dataset with a few univariate 'relevant' features
> > > > > > > and the rest is noise... It has 512 features total...
> >
> > > > > > > On Mon, 03 Mar 2008, Per B. Sederberg wrote:
> >
> > > > > > > > Hi Yarik:
> >
> > > > > > > > Can you send me a dataset that causes it to fail (i.e., causes the
> > > > > > > > values to grow really big in the C version)? And why do you only have
> > > > > > > > 512 features to start? Have you already performed feature selection?
> >
> > > > > > > > Thanks,
> > > > > > > > Per
> >
> >
> > > > > > > > On Mon, Mar 3, 2008 at 11:56 AM, Yaroslav Halchenko
> > > > > > > > <debian at onerussian.com> wrote:
> > > > > > > > > I am about to push some minor changes into SMLR code (mostly producing
> > > > > > > > > debug output I think ;-))
> >
> > > > > > > > > as for overflow -- just check out the log file
> > > > > > > > > http://www.onerussian.com/Linux/bugs/smlr/overflow3.gz
> >
> > > > > > > > > you will see that some runs simply diverge, some do stop at some huge
> > > > > > > > > values of weights (which leads to overflow in predict). I am yet to spit
> > > > > > > > > out a problematic train/test case of interest.
> >
> > > > > > > > > BTW - it seems I didn't notice such a problem while running the Python
> > > > > > > > > implementation of SMLR...
> >
> > > > > > > > > so, one of the unit tests to go into test_smlr should validate that the
> > > > > > > > > results of the C and Python implementations are close to each
> > > > > > > > > other...
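
A minimal sketch of such a closeness check, using numpy.testing.assert_allclose. The helper name assert_implementations_agree, the tolerances, and the stand-in weight arrays are assumptions for illustration; a real test in test_smlr would train both implementations on the same data and compare the weights they produce.

```python
import numpy as np


def assert_implementations_agree(w_python, w_c, rtol=1e-3, atol=1e-6):
    """Raise AssertionError if two weight arrays differ beyond the tolerances."""
    np.testing.assert_allclose(w_c, w_python, rtol=rtol, atol=atol)


# Stand-in weights; these are not results from either implementation.
w_python = np.array([[0.734330, -1.163411, 0.0]])
w_c = w_python * (1.0 + 1e-5)
assert_implementations_agree(w_python, w_c)
```

The tolerances would need tuning: the two versions may stop after a different number of cycles, so near-identical rather than bitwise-equal weights are the realistic target.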
> >
> > > > > > > > > Please let me know if anyone is working on the SMLR code right now so
> > > > > > > > > we don't cross spears while merging ;-)
> >
> >
> >
> > > > > > > > > On Mon, 03 Mar 2008, Per B. Sederberg wrote:
> >
> > > > > > > > > > On Mon, Mar 3, 2008 at 12:55 AM, Yaroslav Halchenko
> > > > > > > > > > <yoh at psychology.rutgers.edu> wrote:
> > > > > > > > > > > > I'm guessing it may be because you have no features left. Can you
> > > > > > > > > > > > narrow it down to a testcase that we can reproduce?
> > > > > > > > > > > will try although
> >
> >
> > > > > > > > > > Even if you just save out the weights from a run that gives you a
> > > > > > > > > > warning, that would help. You could also save the training and
> > > > > > > > > > testing data that went into that classifier.
> >
> >
> > > > > > > > > > > > > [CLF] DBG{ 'Sun Mar 2 23:47:45 2008' '2.659 sec' 'VmSize:\t 147072 kB'}: Predicting classifier SMLR(lm=0.100000, convergence_tol=0.001, maxiter=10000, implementation='C', enabled_states=['predictions', 'trained_labels']) on data (18, 442)
> > > > > > > > > > > says that there are 442 features. My guess is that the problem is not that I am
> > > > > > > > > > > left without features but that their inner product with the weights w is
> > > > > > > > > > > too large, so it overflows in exp...
> >
> > > > > > > > > > The fact that it says 442 features means that was what was going into
> > > > > > > > > > training, right? Not that after training there were 442 non-zero
> > > > > > > > > > weights.
> >
> > > > > > > > > > Are you using smlr for feature selection and then using it for
> > > > > > > > > > classification as a separate step? If so, it may be better to use
> > > > > > > > > > ridge or standard logistic regression with a gaussian prior for the
> > > > > > > > > > classification because it has a better-suited regularization for
> > > > > > > > > > classifying when you know that the features are good.
> >
> > > > > > > > > > Let me know more details and I'll try to get to the bottom of this.
> >
> > > > > > > > > > Thanks,
> > > > > > > > > > Per
> >
> >
> >
> > > > > > > > > > > > > Warning: overflow encountered in exp
> > > > > > > > > > > > > Warning: invalid value encountered in divide
> >
> >
> >
> >
> > > > > > > > > > > > > On Sun, 02 Mar 2008, Per B. Sederberg wrote:
> >
> > > > > > > > > > > > > > I didn't see any changes to test_smlr, did you push?
> >
> > > > > > > > > > > > > > Another thing to keep in mind as we go forward with SMLR, is that it
> > > > > > > > > > > > > > is a multi-class classifier (though I have not tested it.) When you
> > > > > > > > > > > > > > have multiple classes, the weight matrix will be M-1 by the number of
> > > > > > > > > > > > > > features where M is the number of classes.
> >
> > > > > > > > > > > > > > I think we'll want to do something like sum over the M-1 classes to
> > > > > > > > > > > > > > get the sensitivity for each feature.
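
A minimal sketch of collapsing an (M-1) x n_features weight matrix into one sensitivity value per feature. Summing absolute values (so that weights of opposite sign do not cancel) is an assumption; the message only says "something like sum over the M-1 classes". The name feature_sensitivity and the example weights are hypothetical.

```python
import numpy as np


def feature_sensitivity(weights):
    """Collapse an (M-1, n_features) weight matrix into one value per feature."""
    weights = np.asarray(weights, dtype=float)
    # Assumption: sum absolute values so opposite-signed weights do not cancel.
    return np.abs(weights).sum(axis=0)


# Three classes (M = 3) and four features give a 2 x 4 weight matrix.
w = np.array([[0.5, 0.0, -1.0, 0.0],
              [-0.5, 0.0, 0.25, 0.0]])
sens = feature_sensitivity(w)
```

With a plain signed sum the first feature here would get a sensitivity of zero even though both rows use it, which is why the sketch takes absolute values first.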
> >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Per
> >
> >
> > > > > > > > > > > > > > On Sun, Mar 2, 2008 at 6:50 PM, Yaroslav Halchenko
> > > > > > > > > > > > > > <yoh at psychology.rutgers.edu> wrote:
> > > > > > > > > > > > > > > hey -- I wrote some notes in the test_smlr ... you might find them
> > > > > > > > > > > > > > > useful ;-)
> >
> >
> > > > > > > > > > > > > > > On Sun, 02 Mar 2008, Per B. Sederberg wrote:
> >
> >
> >
> > > > > > > > > > > > > > > > I have contacted the first author of the SMLR manuscript, and he sent
> > > > > > > > > > > > > > > > me some "rough" matlab code that he said has no copyrights and that I
> > > > > > > > > > > > > > > > could use any way that I wanted with no reference to him. He was
> > > > > > > > > > > > > > > > overjoyed that we were getting use out of it and that we could port it
> > > > > > > > > > > > > > > > to include in any closed or open-source project we wanted.
> >
> > > > > > > > > > > > > > > > So, I used his code as inspiration for the python code, which I then
> > > > > > > > > > > > > > > > ported to C.
> >
> > > > > > > > > > > > > > > > He was so awesome about it that we should be sure to cite him whenever
> > > > > > > > > > > > > > > > we use it.
> >
> > > > > > > > > > > > > > > > So, we are in the clear :)
> >
> > > > > > > > > > > > > > > > P
> >
> > > > > > > > > > > > > > > > On Sun, Mar 2, 2008 at 5:45 PM, Yaroslav Halchenko
> > > > > > > > > > > > > > > > <debian at onerussian.com> wrote:
> > > > > > > > > > > > > > > > > btw -- just to be sure
> >
> > > > > > > > > > > > > > > > > the authors of the original SMLR software release it under a non-commercial
> > > > > > > > > > > > > > > > > license
> > > > > > > > > > > > > > > > > http://www.cs.duke.edu/~amink/software/smlr/
> > > > > > > > > > > > > > > > > Licensing Overview
> >
> > > > > > > > > > > > > > > > > You may license SMLR either under a non-commercial use license or under
> > > > > > > > > > > > > > > > > ....
> >
> > > > > > > > > > > > > > > > > but since Per coded it himself and there is no patent assigned to it
> > > > > > > > > > > > > > > > > (is there?) we are ok to release it within pymvpa, right?
> >
> > > > > > > > > > > > > > > > > --
> >
> > > > > > > > > > > > > > > > > Yaroslav Halchenko
> > > > > > > > > > > > > > > > > Research Assistant, Psychology Department, Rutgers-Newark
> > > > > > > > > > > > > > > > > Student Ph.D. @ CS Dept. NJIT
> > > > > > > > > > > > > > > > > Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> > > > > > > > > > > > > > > > > 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> > > > > > > > > > > > > > > > > WWW: http://www.linkedin.com/in/yarik
> >
> >
> >
> > > > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > > > Pkg-exppsy-maintainers mailing list
> > > > > > > > > > > > > > > > > Pkg-exppsy-maintainers at lists.alioth.debian.org
> > > > > > > > > > > > > > > > > http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers
> >
> >
> >
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: runtest.py
Type: text/x-python
Size: 752 bytes
Desc: not available
Url : http://lists.alioth.debian.org/pipermail/pkg-exppsy-maintainers/attachments/20080303/e9b3d5b6/attachment-0001.py