[Pkg-exppsy-maintainers] Q: Crossvalidation feature selection
Yaroslav Halchenko
debian at onerussian.com
Thu Dec 20 15:00:54 UTC 2007
Let me try to give brief answers ;-)
> Wow! I'm totally amazed with the amazing work you have done on
> pymvpa. I just started using it for real tonight and I'm quite happy.
That is great ;-)
> 1) I'd like to run feature selection (I'm happy to use anything in
> there, such as the ANOVA or something fancier) on each training set of
> a N-Fold cross validation run. I'd also like to save the mask of
> those features for later analysis. Ideally, I'd like to specify a
> constant number of features (say 1000) to keep for each fold.
How do you know that you need 1000?
In any case (Michael will correct me if I am wrong), we do not yet have
a Classifier which does feature selection itself, i.e. you have to loop
through the splits manually and run RFE FeatureSelection (using some
SensitivityAnalyzer, such as OnewayAnova, or LinearSVMWeights if you use
an SVM) on each split.
I am going to hack up a little wrapper called
FeatureSelectionClassifier, which would probably make use of the already
existing MappedClassifier parametrized with a MaskMapper that uses the
mask of features given by the FeatureSelection algorithm. We just need a
good example afterwards for easier digestion.
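To make the manual loop concrete, here is a rough sketch in plain Python
(stdlib only). It is NOT PyMVPA code: anova_f_scores, top_k_mask and
select_per_fold are made-up helper names, the data is synthetic, and
"chunks" is simply assumed to be a per-sample fold label. It scores
features with a one-way ANOVA on the training part of each fold, keeps a
constant number of them, and stores one mask per fold for later
analysis.

```python
import random


def anova_f_scores(samples, labels):
    """One-way ANOVA F-score per feature (column); higher = more informative."""
    n = len(samples)
    classes = sorted(set(labels))
    scores = []
    for column in zip(*samples):
        grand_mean = sum(column) / n
        ss_between = ss_within = 0.0
        for c in classes:
            vals = [v for v, lab in zip(column, labels) if lab == c]
            mean = sum(vals) / len(vals)
            ss_between += len(vals) * (mean - grand_mean) ** 2
            ss_within += sum((v - mean) ** 2 for v in vals)
        ms_between = ss_between / (len(classes) - 1)
        ms_within = ss_within / (n - len(classes))
        scores.append(ms_between / ms_within if ms_within > 0 else float("inf"))
    return scores


def top_k_mask(scores, k):
    """Boolean mask that is True for the k highest-scoring features."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = set(order[:k])
    return [j in keep for j in range(len(scores))]


def select_per_fold(samples, labels, chunks, k):
    """Leave-one-chunk-out: select features on the training samples only."""
    masks = {}
    for held_out in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != held_out]
        scores = anova_f_scores([samples[i] for i in train],
                                [labels[i] for i in train])
        masks[held_out] = top_k_mask(scores, k)
    return masks


# Synthetic data (an assumption for the demo): 40 samples, 10 features,
# 4 chunks; only feature 0 carries information about the label.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
chunks = [i // 10 for i in range(40)]
samples = [[lab * 5.0 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(9)]
           for lab in labels]

masks = select_per_fold(samples, labels, chunks, k=3)
for fold in sorted(masks):
    print(fold, [j for j, kept in enumerate(masks[fold]) if kept])
```

The important bit is that the scores are computed from the training
samples of each fold only, so the held-out chunk never influences which
features are kept.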
> 2) I'd like to keep the classifier predictions and values for each
> test sample of each fold. This, too, for later inspection.
Enable the states "predictions" and "values" for the classifier at hand
and access them later on, after it has been trained.
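As an illustration of the idea only (this is not the PyMVPA API), here
is a toy sketch in plain Python. The class NearestMeanClassifier, its
enabled_states argument and the crossvalidate helper are all
hypothetical; only the state names "predictions" and "values" are taken
from above. The classifier keeps its last predictions and per-class
distances if those states are enabled, and the loop copies them out
after each fold.

```python
class NearestMeanClassifier:
    """Toy classifier (hypothetical, not part of PyMVPA)."""

    def __init__(self, enabled_states=()):
        self.enabled_states = set(enabled_states)
        self.states = {}  # filled in by predict() for enabled states only

    def train(self, samples, labels):
        self.means = {}
        for c in sorted(set(labels)):
            rows = [s for s, lab in zip(samples, labels) if lab == c]
            self.means[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, samples):
        predictions, values = [], []
        for s in samples:
            # squared distance of the sample to each class mean
            dist = {c: sum((a - b) ** 2 for a, b in zip(s, m))
                    for c, m in self.means.items()}
            predictions.append(min(dist, key=dist.get))
            values.append(dist)
        if "predictions" in self.enabled_states:
            self.states["predictions"] = predictions
        if "values" in self.enabled_states:
            self.states["values"] = values
        return predictions


def crossvalidate(samples, labels, chunks, clf):
    """Leave-one-chunk-out; keep the enabled states of every fold."""
    per_fold = {}
    for held_out in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != held_out]
        test = [i for i, c in enumerate(chunks) if c == held_out]
        clf.train([samples[i] for i in train], [labels[i] for i in train])
        clf.predict([samples[i] for i in test])
        per_fold[held_out] = {"samples": test, **clf.states}
    return per_fold


# Deterministic toy data (an assumption for the demo): one feature,
# two well-separated classes, 4 chunks of 5 samples each.
labels = [i % 2 for i in range(20)]
chunks = [i // 5 for i in range(20)]
samples = [[labels[i] * 4.0 + (i % 5) * 0.1] for i in range(20)]

clf = NearestMeanClassifier(enabled_states=["predictions", "values"])
per_fold = crossvalidate(samples, labels, chunks, clf)
print(per_fold[0]["samples"], per_fold[0]["predictions"])
```

Keeping the test sample indices next to the stored states makes it easy
to line the predictions up with the original samples afterwards.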
> 3) I'd like to know what is going on a little better. How do I turn
> up a higher level of verbosity so that, for example, it tells me which
> fold it's currently on in the crossvalidation or which sphere it's on
> in the searchlight?
We now have 3 types of messages (I guess I should have put this
description in the manual as well, at least as a starting point...
Michael - I will do that):
verbose : generic messages which are for the user to use; mvpa itself
          spits out just a few verbose messages, purely informative
          ones. A verbose message is printed if its (hardcoded) level
          is less than the level given on the command line (there is a
          precrafted optVerbose that lets you change it from your
          script, see examples) or in the environment variable
          MVPA_VERBOSE.
warning : messages which mvpa reports if something goes a little
          unexpectedly but not critically. They are printed just once
          per occasion, i.e. if that piece of code is hit again, the
          message is not printed again.
debug   : that is what is used to track the progress of any computation
          inside mvpa. These messages are selected not by level but by
          some ID. Have a look at mvpa/misc/__init__.py or just print
          debug.registered to see which IDs can be used and what they
          stand for. Then, if you want to print all debugging messages,
          just do one of the following

            MVPA_DEBUG=all            # in shell environment
            debug.active += [ 'all' ]

          or add a list of the desired known (i.e. registered) IDs.
          There are also a few metrics registered for debug (vmem,
          asctime) which can be enabled to be printed with each debug
          statement. The output of debug is indented by the level of
          the call (according to the backtrace).
NOTE: debug messages are printed only in non-optimized code, i.e. if
you run mvpa code with python -O, none should ever be called/printed.
That was done to eliminate any slowdown introduced by such 'debugging'
output, which might appear at some computational bottlenecks in the
code. But after using it for a while, I think that maybe we should
enable some debug output at levels where it is known that it is not a
really intensive computation.
TODO: Unify the loggers behind verbose and debug. imho debug should
also have a way to specify the level of a message, so we could provide
more debugging information if desired.
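To show how the three kinds of messages differ, here is a minimal
sketch in plain Python. It only imitates the behaviour described above
and is not the mvpa implementation: the classes Verbose, WarnOnce and
Debug, the emit argument and the IDs "FOLD" and "SPHERE" are all made up
for the example. The "if __debug__:" guard is standard Python; that
block is dropped entirely when running with python -O.

```python
class Verbose:
    """Level-based messages: shown if the message level is below the limit."""

    def __init__(self, level, emit=print):
        self.level = level
        self.emit = emit

    def __call__(self, msg_level, msg):
        # strict "less than", following the description above
        if msg_level < self.level:
            self.emit(msg)


class WarnOnce:
    """Warnings that are reported only the first time they occur."""

    def __init__(self, emit=print):
        self.seen = set()
        self.emit = emit

    def __call__(self, msg):
        if msg not in self.seen:
            self.seen.add(msg)
            self.emit("WARNING: " + msg)


class Debug:
    """ID-based messages: shown if the ID (or 'all') is in the active list."""

    def __init__(self, registered, emit=print):
        self.registered = dict(registered)  # ID -> what it stands for
        self.active = []
        self.emit = emit

    def __call__(self, msg_id, msg):
        if msg_id not in self.registered:
            raise KeyError("unregistered debug ID: %r" % msg_id)
        if "all" in self.active or msg_id in self.active:
            self.emit("[%s] %s" % (msg_id, msg))


log = []  # collect output in a list instead of printing, for inspection

verbose = Verbose(level=2, emit=log.append)
verbose(1, "loading dataset")       # 1 < 2: shown
verbose(3, "very detailed note")    # 3 >= 2: suppressed

warning = WarnOnce(emit=log.append)
for _ in range(3):
    warning("labels are not balanced")  # reported once only

# Hypothetical IDs; the real ones are listed in debug.registered.
debug = Debug({"FOLD": "cross-validation folds",
               "SPHERE": "searchlight spheres"}, emit=log.append)
debug.active += ["FOLD"]
if __debug__:  # False under python -O, so the calls below are skipped
    debug("FOLD", "fold 1 of 4")
    debug("SPHERE", "sphere 10")    # ID not active: suppressed

print("\n".join(log))
```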
Hope this helps ;-)
> 4) I'm training on subsets of a recall period, but it would be great
> to test on every sample from the left-out chunk, returning the
> predictions and values for each sample.
I am not clear what you mean... can you provide an example listing
which samples you train on and which you transfer to?
--
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW: http://www.linkedin.com/in/yarik