[Pkg-exppsy-maintainers] SVM+RFE vs. SMLR
Per B. Sederberg
persed at princeton.edu
Thu Mar 6 17:40:13 UTC 2008
The current version may actually be wrong, but it will be a parameter
(with a good default value), so that you can control the amount of
decay in the probability of resampling zero weights.
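
To make that concrete, here is a minimal sketch of the idea
(resamp_decay and min_resamp are made-up names for the parameter I
mean, not the actual SMLR code): each zero weight keeps its own
resampling probability, which decays every time that weight stays at
zero.

    import numpy as np

    def sweep(w, p_resamp, resamp_decay=0.5, min_resamp=0.05):
        # w: current weight vector; p_resamp: per-weight probability
        # of revisiting a zero weight on this sweep
        for j in xrange(len(w)):
            if w[j] == 0.0 and np.random.rand() > p_resamp[j]:
                continue                  # skip this zero weight for now
            # ... the usual SMLR coordinate update of w[j] goes here ...
            if w[j] == 0.0:
                # stayed at zero: decay its resampling probability,
                # but keep a floor so it can always come back
                p_resamp[j] = max(p_resamp[j] * resamp_decay, min_resamp)
            else:
                p_resamp[j] = 1.0         # active weights are always visited
        return w, p_resamp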
P
On Thu, Mar 6, 2008 at 12:22 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> cool!
>
> since it leads to slowness, could that be made an option? or is it too
> intrusive?
>
>
>
> On Thu, 06 Mar 2008, Per B. Sederberg wrote:
>
> > I'm in the process of modifying the SMLR code to resample the zero
> > weights a bit more, because the current method can give rise to too
> > much variation due to sparse random resampling. This will slightly
> > slow down the code and take up more RAM because I'm calculating it on
> > a per-weight basis, but it should be far more accurate.
>
> > I'll email when I push the new code.
>
> > Best,
> > Per
>
> > On Thu, Mar 6, 2008 at 11:27 AM, Yaroslav Halchenko
> > <debian at onerussian.com> wrote:
> > > Plenty, but I just now found that I screwed up a bit: the min error in
> > > the labels is the global min, not the min of the means across runs,
> > > thus it is not what we see in the plots
>
> > > Also, the damn legend covered some parts, but those are not the interesting ones
>
> > > Find the plots for all subjects for those SMLRs at
> > > http://www.onerussian.com/Sci/analysis/pymvpa/smlrs1/
>
> > > as I see, lm=0.01 doesn't do well
>
> > > going to try it with lm=1.5 and all 1e-5 (on all lms)
>
> > > Another tiny new bit:
> > > As you can see in git's yoh/master, there is now
> > > doc/examples/clfs_examples.py (could be renamed), the main purpose of
> > > which is to serve as an extended version of the smlr_example benchmark.
>
> > > For now I added only very dummy datasets (so the avg train/test times
> > > are kinda bogus ;-)) and basic classifiers; here is the current output:
>
> > > $> doc/examples/clfs_examples.py
> > > Dummy 2-class univariate with 2 useful features: <Dataset / float64 20 x 1000 uniq: 2 labels 5 chunks>
> > > Linear C-SVM (default) : correct=65.0% train:0.0sec predict:0.0sec
> > > Linear nu-SVM (default) : correct=65.0% train:0.0sec predict:0.0sec
> > > SMLR(default) : correct=90.0% train:0.1sec predict:0.0sec
> > > SMLR(Python) : correct=90.0% train:7.8sec predict:0.0sec
> > > RidgeReg(default) : correct=50.0% train:6.8sec predict:0.0sec
> > > Rbf C-SVM (default) : correct=60.0% train:0.0sec predict:0.0sec
> > > Rbf nu-SVM (default) : correct=65.0% train:0.1sec predict:0.0sec
> > > kNN(default) : correct=55.0% train:0.0sec predict:0.0sec
> > > Dummy XOR-pattern: <Dataset / float64 80 x 2 uniq: 2 labels 80 chunks>
> > > Linear C-SVM (default) : correct=0.0% train:0.0sec predict:0.0sec
> > > Linear nu-SVM (default) : correct=71.2% train:0.0sec predict:0.0sec
> > > SMLR(default) : correct=0.0% train:0.0sec predict:0.0sec
> > > SMLR(Python) : correct=0.0% train:0.0sec predict:0.0sec
> > > RidgeReg(default) : correct=50.0% train:0.0sec predict:0.0sec
> > > Rbf C-SVM (default) : correct=0.0% train:0.0sec predict:0.0sec
> > > Rbf nu-SVM (default) : correct=97.5% train:0.0sec predict:0.0sec
> > > kNN(default) : correct=98.8% train:0.0sec predict:0.0sec
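>
> > > (For the record, the script is nothing fancy -- roughly the loop
> > > below; clfs, dataset and testdata are placeholders, not the actual
> > > variable names:)
>
> > >     import time
> > >     import numpy as np
>
> > >     for name, clf in clfs.iteritems():   # clfs: dict of classifiers
> > >         t0 = time.time()
> > >         clf.train(dataset)               # train on the training chunks
> > >         t_train = time.time() - t0
> > >         t0 = time.time()
> > >         predictions = clf.predict(testdata.samples)
> > >         t_predict = time.time() - t0
> > >         correct = np.mean(np.array(predictions) == testdata.labels)
> > >         print "%-25s: correct=%.1f%% train:%.1fsec predict:%.1fsec" \
> > >             % (name, correct * 100, t_train, t_predict)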
>
>
> > > The goal is to extend it with interesting data and evolved SVMs (e.g. SVM + RFE,
> > > or SVM + feature selection based on ANOVA / SMLR's weights / SVM weights
> > > but without RFE -- just plain non-0 weights or the 1% of highest weights).
> > > That should provide an illustrative example of the built-in ML techniques
> > > we have at hand, and an easy assessment of efficiency in terms of
> > > computation time. (See the sketch below.)
>
>
>
> > > On Thu, 06 Mar 2008, Per B. Sederberg wrote:
>
> > > > So, do you have the new results?
> > > > P
>
> > > > On Wed, Mar 5, 2008 at 3:59 PM, Yaroslav Halchenko
> > > > <debian at onerussian.com> wrote:
> > > > > > pulling these "relevant" weights away. So, unless you are redoing the
> > > > > > regression at each step, removing those features completely, you are
> > > > > > actually punishing the SMLR each time you pull out weights that it
> > > > > > thinks are valuable.
> > > > > but we do retrain after each such feature removal,
> > > > > i.e. we don't simply prune the weight (set it to 0); we remove that
> > > > > feature from the training data for the classifier, then retrain the classifier.
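>
> > > > > schematically (made-up helper names, getSensitivity in particular):
>
> > > > >     features = range(dataset.nfeatures)   # start from all features
> > > > >     while len(features) > nkeep:
> > > > >         clf.train(dataset.selectFeatures(features))
> > > > >         sens = abs(clf.getSensitivity())   # per-feature weight magnitudes
> > > > >         # remove the weakest feature from the data itself --
> > > > >         # not just zeroing its weight -- and retrain next round
> > > > >         features.pop(int(sens.argmin()))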
>
> > > > > so we do a fair job imho ;-) or have I misread your message?
>
> --
>
>
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student Ph.D. @ CS Dept. NJIT
> Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
> 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW: http://www.linkedin.com/in/yarik
>