[Pkg-exppsy-maintainers] SVM+RFE vs. SMLR
Per B. Sederberg
persed at princeton.edu
Thu Mar 6 17:40:13 UTC 2008
The current version may actually be wrong, but it will be a parameter
(with a good default value), so that you can control the amount of
decay in the probability of resampling zero weights.
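
To make that concrete, here is a minimal sketch of the idea
(resamp_decay and min_resamp are made-up names for the parameter I
mean, not the actual SMLR code): each zero weight keeps its own
resampling probability, which decays every time that weight stays at
zero.

    import numpy as np

    def sweep(w, p_resamp, resamp_decay=0.5, min_resamp=0.05):
        # w: current weight vector; p_resamp: per-weight probability
        # of revisiting a zero weight on this sweep
        for j in xrange(len(w)):
            if w[j] == 0.0 and np.random.rand() > p_resamp[j]:
                continue                  # skip this zero weight for now
            # ... the usual SMLR coordinate update of w[j] goes here ...
            if w[j] == 0.0:
                # stayed at zero: decay its resampling probability,
                # but keep a floor so it can always come back
                p_resamp[j] = max(p_resamp[j] * resamp_decay, min_resamp)
            else:
                p_resamp[j] = 1.0         # active weights are always visited
        return w, p_resamp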
P
On Thu, Mar 6, 2008 at 12:22 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> cool!
>
> since it leads to slowness, could that be made an option? or is it too
> intrusive?
>
>
>
> On Thu, 06 Mar 2008, Per B. Sederberg wrote:
>
> > I'm in the process of modifying the SMLR code to resample the zero
> > weights a bit more, because the current method can give rise to too
> > much variation due to sparse random resampling. This will slightly
> > slow down the code and take up more RAM because I'm calculating it on
> > a per-weight basis, but it should be far more accurate.
>
> > I'll email when I push the new code.
>
> > Best,
> > Per
>
> > On Thu, Mar 6, 2008 at 11:27 AM, Yaroslav Halchenko
> > <debian at onerussian.com> wrote:
> > > Plenty, but I just now found that I screwed up a bit: the min error in
> > > the labels is the global min, not the min of the means across runs,
> > > thus it is not what we see in the plots
>
> > > Also, the damn legend covered some parts, but those are not the interesting ones
>
> > > Find the plots for all subjects for those SMLRs at
> > > http://www.onerussian.com/Sci/analysis/pymvpa/smlrs1/
>
> > > as I see, lm=0.01 doesn't do well
>
> > > going to try it with lm=1.5 and all 1e-5 (on all lms)
>
> > > Another tiny new bit:
> > > As you can see in git's yoh/master, there is now
> > > doc/examples/clfs_examples.py (could be renamed), the main purpose of
> > > which is to serve as an extended version of the smlr_example benchmark.
>
> > > For now I added only very dummy datasets (so the avg train/test times
> > > are kinda bogus ;-)) and basic classifiers; here is the current output:
>
> > > $> doc/examples/clfs_examples.py
> > > Dummy 2-class univariate with 2 useful features: <Dataset / float64 20 x 1000 uniq: 2 labels 5 chunks>
> > > Linear C-SVM (default) : correct=65.0% train:0.0sec predict:0.0sec
> > > Linear nu-SVM (default) : correct=65.0% train:0.0sec predict:0.0sec
> > > SMLR(default) : correct=90.0% train:0.1sec predict:0.0sec
> > > SMLR(Python) : correct=90.0% train:7.8sec predict:0.0sec
> > > RidgeReg(default) : correct=50.0% train:6.8sec predict:0.0sec
> > > Rbf C-SVM (default) : correct=60.0% train:0.0sec predict:0.0sec
> > > Rbf nu-SVM (default) : correct=65.0% train:0.1sec predict:0.0sec
> > > kNN(default) : correct=55.0% train:0.0sec predict:0.0sec
> > > Dummy XOR-pattern: <Dataset / float64 80 x 2 uniq: 2 labels 80 chunks>
> > > Linear C-SVM (default) : correct=0.0% train:0.0sec predict:0.0sec
> > > Linear nu-SVM (default) : correct=71.2% train:0.0sec predict:0.0sec
> > > SMLR(default) : correct=0.0% train:0.0sec predict:0.0sec
> > > SMLR(Python) : correct=0.0% train:0.0sec predict:0.0sec
> > > RidgeReg(default) : correct=50.0% train:0.0sec predict:0.0sec
> > > Rbf C-SVM (default) : correct=0.0% train:0.0sec predict:0.0sec
> > > Rbf nu-SVM (default) : correct=97.5% train:0.0sec predict:0.0sec
> > > kNN(default) : correct=98.8% train:0.0sec predict:0.0sec
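>
> > > (For the record, the script is nothing fancy -- roughly the loop
> > > below; clfs, dataset and testdata are placeholders, not the actual
> > > variable names:)
>
> > >     import time
> > >     import numpy as np
>
> > >     for name, clf in clfs.iteritems():   # clfs: dict of classifiers
> > >         t0 = time.time()
> > >         clf.train(dataset)               # train on the training chunks
> > >         t_train = time.time() - t0
> > >         t0 = time.time()
> > >         predictions = clf.predict(testdata.samples)
> > >         t_predict = time.time() - t0
> > >         correct = np.mean(np.array(predictions) == testdata.labels)
> > >         print "%-25s: correct=%.1f%% train:%.1fsec predict:%.1fsec" \
> > >             % (name, correct * 100, t_train, t_predict)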
>
>
> > > The goal is to extend it with interesting data and evolved SVMs (e.g. SVM + RFE,
> > > or SVM + feature selection based on ANOVA / SMLR's weights / SVM weights
> > > but without RFE -- just plain non-0 weights or the 1% of highest weights).
> > > That should provide an illustrative example of the built-in ML techniques
> > > we have at hand, and an easy assessment of efficiency in terms of
> > > computation time. (See the sketch below.)
>
>
>
> > > On Thu, 06 Mar 2008, Per B. Sederberg wrote:
>
> > > > So, do you have the new results?
> > > > P
>
> > > > On Wed, Mar 5, 2008 at 3:59 PM, Yaroslav Halchenko
> > > > <debian at onerussian.com> wrote:
> > > > > > pulling these "relevant" weights away. So, unless you are redoing the
> > > > > > regression at each step, removing those features completely, you are
> > > > > > actually punishing the SMLR each time you pull out weights that it
> > > > > > thinks are valuable.
> > > > > but we do retrain after each such feature removal,
> > > > > i.e. we don't simply prune the weight (set it to 0); we remove that
> > > > > feature from the training data for the classifier, then retrain the classifier.
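>
> > > > > schematically (made-up helper names, getSensitivity in particular):
>
> > > > >     features = range(dataset.nfeatures)   # start from all features
> > > > >     while len(features) > nkeep:
> > > > >         clf.train(dataset.selectFeatures(features))
> > > > >         sens = abs(clf.getSensitivity())   # per-feature weight magnitudes
> > > > >         # remove the weakest feature from the data itself --
> > > > >         # not just zeroing its weight -- and retrain next round
> > > > >         features.pop(int(sens.argmin()))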
>
> > > > > so we do a fair job imho ;-) or have I misread your message?
>
> --
>
>
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student Ph.D. @ CS Dept. NJIT
> Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW: http://www.linkedin.com/in/yarik
>