[Pkg-exppsy-maintainers] Q: Crossvalidation feature selection

Yaroslav Halchenko debian at onerussian.com
Thu Dec 20 15:50:48 UTC 2007


> It's just an example, but one of the things I'm doing first is trying
> to replicate some basic classification results that I performed with
> the matlab mvpa.  For those analyses I found that approx 1000 voxels
> gave good classification, so I'd like to try to classify with
> something similar.  I totally agree that the ideal mechanism is to use
> some form of RFE.
yeah -- we should add a FixNFeatureSelection(FeatureSelection) I guess ;-)
It is trivial, but it still has to be done ;-)
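
In the meantime something crude along these lines already gives you a fixed
number of voxels -- plain numpy, nothing mvpa-specific, and the function name
and interface below are just made up for the sketch:

    import numpy as np

    def select_fixed_n(samples, sensitivities, n=1000):
        """Keep the n features with the largest absolute sensitivity.

        samples       : nsamples x nfeatures array
        sensitivities : per-feature scores, e.g. ANOVA F-values per voxel
        n             : number of features (voxels) to retain
        """
        order = np.argsort(np.abs(sensitivities))[::-1]  # best feature first
        selected = np.sort(order[:n])                    # keep original voxel order
        return samples[:, selected], selected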

> OK, I can certainly do that loop and keep track of the results myself.
>  I just thought some version may already be there (and I see it may be
> soon :))
yeah - should be ;-)

> I'm working on an analysis right now that could easily be generalized
> into an example once this classifier is in there.  Let me know when
> you want me to try it out.
I am swamped with other work, but maybe I will craft it today... I will
drop a note...

Michael -- we should set up sending announcements on git commits ;-)

> If I do this for a CV, will it actually return N-folds predictions and
> values in the results?
You can get a lot of stuff even now ;-)
ClfCrossValidation has states such as confusion and confusions (so you
can get the total confusion matrix as well as one per split). I just added
a .sets property to ConfusionMatrix, so you can get the raw ('target',
'prediction') pairs from there.

You can also instruct ClfCrossValidation to store the transerrors state,
which in turn keeps the whole classifiers, which can have 'values' and
'predictions' stored in them as well...

So multiple ways to get there ;-)
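
Roughly, treating the exact wiring below as a sketch from memory rather than
the actual signatures (check the docstrings in git for what is really there):

    # hypothetical setup -- clf/splitter are whatever you already use
    cv = ClfCrossValidation(clf, splitter,
                            enable_states=['confusion', 'confusions',
                                           'transerrors'])
    error = cv(dataset)

    print cv.confusion           # summary confusion matrix over all splits
    for cm in cv.confusions:     # one ConfusionMatrix per split
        print cm.sets            # raw ('target', 'prediction') pairs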

> So if I set the MVPA_VERBOSE value it will print more to the screen?
> I'll try it out :)
not much, since as I said -- MVPA_VERBOSE is primarily meant for use in your
own scripts ;-)
mvpa itself uses MVPA_DEBUG -- try that one
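
For instance, from within a script (the value below is just a placeholder --
look at the source for the actual debug target ids):

    import os
    os.environ['MVPA_DEBUG'] = 'ALL'   # placeholder: pick the targets you need
    import mvpa                        # set it before importing mvpa so it is picked up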


> > > 4) I'm training on subsets of a recall period, but it would be great
> > > to test on every sample from the left-out chunk, returning the
> > > predictions and values for each sample.
> > I am not clear what you mean... can you provide an examples listing
> > which samples you train on, which you transfer to?

> Let's say I have 6 runs of data.  I'm actually only training and
> testing (via N-Fold cross validation) a subset of the TRs for each
> run.  However, for a single cross validation fold, I sometimes like to
> take the classifier trained on the selected TRs from the 5 training
> runs and then test the output of the classifier on every TR in the
> testing run.  This is not to test accuracy, but to make a cool
> visualization of the classifier over time and to see how it
> generalized to other parts of the run.

> A specific thing that I have done in the past is to train a classifier
> to distinguish between semantic versus episodic memory retrieval and
> then I tested it on TRs where someone was performing a math task.
> This was a great control because the classifier was at chance for
> predicting math TRs, but was able to distinguish when people were
> actually performing retrievals.

> I know how to do this if I'm performing the cross validation myself,
> but it might be cool to eventually be able to test a classifier on a
> different subset of TRs than those used for training during cross
> validation and then return the prediction values.

let me digest it some time later ;-)
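
Meanwhile, doing it by hand is not that bad -- something along these lines,
where the clf.train/clf.predict interface is just the assumed sketch (adapt it
to whatever classifier object you actually use):

    import numpy as np

    def train_subset_test_all(clf, samples, labels, chunks, train_mask,
                              test_chunk):
        """Train on the selected TRs of the training runs and predict every
        TR of the left-out run (e.g. to watch the classifier over time)."""
        train = (chunks != test_chunk) & train_mask  # selected TRs, other runs
        test = (chunks == test_chunk)                # every TR of left-out run
        clf.train(samples[train], labels[train])     # assumed interface
        return clf.predict(samples[test])            # predictions per test TR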

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


