[Pkg-exppsy-maintainers] Q: Crossvalidation feature selection

Thu Dec 20 15:00:54 UTC 2007

Let me try to give brief answers ;-)

> Wow!  I'm totally amazed with the amazing work you have done on
> pymvpa.  I just started using it for real tonight and I'm quite happy.
That is great ;-)

> 1) I'd like to run feature selection (I'm happy to use anything in
> there, such as the ANOVA or something fancier) on each training set of
> a N-Fold cross validation run.  I'd also like to save the mask of
> those features for later analysis.  Ideally, I'd like to specify a
> constant number of features (say 1000) to keep for each fold.
How do you know that you need 1000?

In any case (Michael will correct me if I am wrong), we do not yet have
a Classifier that performs feature selection itself, i.e. you have to
loop through the splits manually and run an RFE FeatureSelection (using
some SensitivityAnalyzer such as OnewayAnova, or LinearSVMWeights if you
use an SVM) on each split.
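
To make the manual loop concrete, here is a minimal sketch in plain NumPy
rather than with the mvpa classes. It assumes samples in X, labels in y and
one chunk (fold) index per sample; anova_f_scores and select_per_fold are
hypothetical helper names, and it does a one-shot "keep the n_keep best
features" selection instead of RFE. The only point is that the mask is
computed from the training part of each split and stored per fold.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-score for every feature (column) of X."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        ss_between += len(Xc) * (class_mean - grand_mean) ** 2
        ss_within += ((Xc - class_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def select_per_fold(X, y, chunks, n_keep):
    """Return {left-out chunk: boolean feature mask}, using training data only."""
    masks = {}
    for test_chunk in np.unique(chunks):
        train = chunks != test_chunk
        scores = anova_f_scores(X[train], y[train])
        mask = np.zeros(X.shape[1], dtype=bool)
        mask[np.argsort(scores)[-n_keep:]] = True  # indices of the best n_keep
        masks[int(test_chunk)] = mask
    return masks

# Synthetic data, purely for illustration: 40 samples, 50 features, 4 chunks.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))
y = np.tile([0, 1], 20)
chunks = np.repeat(np.arange(4), 10)
X[y == 1, :5] += 3.0  # make the first five features informative

masks = select_per_fold(X, y, chunks, n_keep=5)
```

The masks dictionary can then be saved (e.g. with numpy.save on each mask)
and compared across folds later on.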

I am going to hack together a little wrapper called
FeatureSelectionClassifier, which would probably make use of the already
existing MappedClassifier parametrized with a MaskMapper (using the mask
of features given by the FeatureSelection algorithm). We just need a good
example afterwards for easier digestion.

> 2) I'd like to keep the classifier predictions and values for each
> test sample of each fold.  This, too, for later inspection.
Enable the states "predictions" and "values" for the classifier at hand
and access them later on, after it has been trained.
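
The idea of keeping per-fold predictions and values can be sketched
without mvpa at all. Below, NearestMeanClassifier is a hypothetical toy
classifier that simply remembers its last predictions and values as
attributes (it is not the mvpa states mechanism), and crossvalidate is a
hypothetical helper that copies them out after every fold.

```python
import numpy as np

class NearestMeanClassifier:
    """Toy classifier that remembers its last predictions and values."""

    def train(self, X, y):
        self.labels = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.labels])

    def predict(self, X):
        # Distance of every sample to every class mean: (n_samples, n_classes)
        dists = np.linalg.norm(X[:, None, :] - self.means[None, :, :], axis=2)
        self.values = -dists  # larger value = closer to that class
        self.predictions = self.labels[np.argmin(dists, axis=1)]
        return self.predictions

def crossvalidate(X, y, chunks):
    """Leave-one-chunk-out run that stores the per-sample output of each fold."""
    results = {}
    for test_chunk in np.unique(chunks):
        test = chunks == test_chunk
        clf = NearestMeanClassifier()
        clf.train(X[~test], y[~test])
        clf.predict(X[test])
        results[int(test_chunk)] = {
            "targets": y[test].copy(),
            "predictions": clf.predictions.copy(),
            "values": clf.values.copy(),
        }
    return results

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = np.tile([0, 1], 20)
chunks = np.repeat(np.arange(4), 10)

results = crossvalidate(X, y, chunks)
```

Copying the arrays matters: if the same classifier object is reused for the
next fold, its stored predictions and values get overwritten.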

> 3) I'd like to know what is going on a little better.  How do I turn
> up a higher level of verbosity so that, for example, it tells me which
> fold it's currently on in the crossvalidation or which sphere it's on
> in the searchlight?
We now have 3 types of messages (I guess I should have put such a
description in the manual as well, at least as a starting point.
Michael, I will do that):

verbose : generic messages that are meant for the user's own use; mvpa
          itself emits only a few verbose messages, all of them purely
          informative.

          A verbose message is printed if its (hardcoded) level is less
          than the level given on the command line (there is a precrafted
          optVerbose option that lets you change it from your script, see
          the examples) or in the environment variable MVPA_VERBOSE.

warning : messages that mvpa reports if something goes a little
          unexpectedly but is not critical. They are printed just once per
          occasion, i.e. if that piece of code is hit again, the message
          is not printed again.

debug : this is what is used to track the progress of any computation
        inside mvpa. Debug messages are selected not by level but by
        some ID. Have a look at mvpa/misc/__init__.py, or just print
        debug.registered, to see which IDs can be used and what they
        stand for. Then, if you want to print all debugging messages,
        just do one of the following

        MVPA_DEBUG=all # in shell environment
        debug.active += [ 'all' ]

        or give a list of the desired known (i.e. registered) IDs instead.

        There are also a few metrics registered for debug (vmem, asctime),
        which can be enabled so that they are printed with each debug
        statement.

        The output of debug is indented by the level of the call
        (according to the backtrace).
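
The difference between the three message types is easiest to see side by
side. What follows is only a toy model of the behaviour described above,
not the mvpa implementation: the class names and the IDs "FOLD" and
"SPHERE" are made up, "less than" is taken literally for the verbose
level, and the once-only warning is keyed on the message text rather than
on the piece of code that issued it.

```python
import io
import sys

class VerboseLogger:
    """Print a message only if its level is below the configured level."""

    def __init__(self, level=0, stream=sys.stdout):
        self.level = level
        self.stream = stream

    def __call__(self, msg_level, msg):
        if msg_level < self.level:
            self.stream.write(msg + "\n")

class OnceWarning:
    """Print each distinct warning only the first time it is issued."""

    def __init__(self, stream=sys.stderr):
        self.stream = stream
        self._seen = set()

    def __call__(self, msg):
        if msg not in self._seen:
            self._seen.add(msg)
            self.stream.write("WARNING: " + msg + "\n")

class DebugLogger:
    """Print a message only if its ID (or 'all') is in the active list."""

    def __init__(self, registered, stream=sys.stdout):
        self.registered = dict(registered)  # ID -> description
        self.active = []
        self.stream = stream

    def __call__(self, id_, msg):
        if id_ not in self.registered:
            raise ValueError("unregistered debug ID: %r" % (id_,))
        if "all" in self.active or id_ in self.active:
            self.stream.write("[%s] %s\n" % (id_, msg))

out = io.StringIO()  # collect everything in one place for the demo
verbose = VerboseLogger(level=2, stream=out)
warning = OnceWarning(stream=out)
debug = DebugLogger({"FOLD": "cross-validation folds",
                     "SPHERE": "searchlight spheres"}, stream=out)

for fold in range(2):
    verbose(1, "starting fold %d" % fold)   # printed: 1 < 2
    verbose(3, "too detailed")              # dropped: 3 is not < 2
    warning("dataset has a constant feature")  # printed only on first pass
    debug("FOLD", "fold %d" % fold)         # dropped: no ID is active yet

debug.active += ["all"]
debug("SPHERE", "sphere 0")                 # printed: 'all' is active

print(out.getvalue(), end="")
```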

        NOTE: debug messages are printed only in non-optimized code, i.e.
        if you run mvpa code with python -O, none should ever be
        called/printed. That was done to eliminate any slowdown
        introduced by such 'debugging' output, which might sit at some
        computational bottlenecks in the code. But after using it for a
        while, I think that maybe we should enable some debug output at
        some places where it is known that the computation is not
        really intensive.
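
For reference, the plain Python mechanism behind "nothing is called under
-O" looks like the sketch below. The built-in constant __debug__ is True in
a normal run and False under python -O, and a block guarded by
"if __debug__:" is not even compiled in the optimized case. The function
names are made up for illustration; how exactly mvpa guards its own debug
calls is not shown here.

```python
def expensive_summary(data):
    """Stand-in for a costly debug message (hypothetical helper)."""
    return "n=%d sum=%d" % (len(data), sum(data))

def process(data, log):
    total = 0
    for x in data:
        total += x
    if __debug__:
        # Under `python -O` this whole block is removed at compile time,
        # so expensive_summary() is never called.
        log.append(expensive_summary(data))
    return total

log = []
result = process([1, 2, 3], log)
```

Run normally, log ends up with one entry; run with python -O, it stays
empty while result is the same.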

        TODO: Unify the loggers behind verbose and debug. IMHO debug
        should also have a way to specify the level of a message, so
        that we could provide more debugging information if desired.

   Hope this helps ;-)

> 4) I'm training on subsets of a recall period, but it would be great
> to test on every sample from the left-out chunk, returning the
> predictions and values for each sample.
I am not clear on what you mean... can you provide an example listing
which samples you train on and which you transfer to?

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik