[Pkg-exppsy-pynifti] [Nipy-devel] Image, design, usecases
Souheil Inati
souheil.inati at nyu.edu
Thu May 14 00:14:09 UTC 2009
Hi again,
On May 13, 2009, at 6:43 PM, Matthew Brett wrote:
>
> Sorry - by the current reader - I meant the version that's sometimes
> called 'brifti' or 'volumeimages' in the pynifti repository...
>
Where is that? I don't see anything with that name here: http://git.debian.org/?p=pkg-exppsy/pynifti.git;a=tree;f=nifti;h=41f14be3faf382124ea616b8868d93af8e6f70fc;hb=HEAD
>> I create a header, fill it, and
>> write it to disk, and then I have a for loop where I grab raw data,
>> reconstruct a volume and then write that volume to disk. This way
>> of doing
>> it:
>>
>>> img = Nifti1Image(data, affine)
>>> img.to_filespec('somefile.nii')
>>
>>
>> requires that I compute all of the data first. Perhaps I am being
>> clueless.
>
> No, that's an excellent point. I don't know how the C libraries do
> this.
This is why I had mentioned low level file operations like seeking in
my previous message.
> The thing that makes this hard is that, if you have an 'int'
> data type, and data scaling - as is typical in NIFTI - then you may
> well find yourself writing data to disk that requires different
> scaling from the first few volumes, and then you have to go back over
> the whole dataset to redo it. If you have float or complex images,
> this isn't a problem.
>
> As a matter of interest - are you writing to Int type images?
>
This is one of the reasons that I never use int. In fact, I think
that both int and compressed are totally stupid ways to store data.
For the record, I tend to have very mild opinions about things, just
ask Dav :-)
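To make the rescaling problem quoted above concrete: for int16 storage,
the slope and intercept depend on the *global* data range, which a
streaming writer can't know until the last volume arrives. Here is a
hedged sketch of the arithmetic (my own helper, not pynifti's API):

```python
import numpy as np

def int16_scaling(data):
    # Compute NIfTI-style slope/intercept so that
    # raw = stored_int16 * slope + inter round-trips the float data.
    dmin, dmax = float(data.min()), float(data.max())
    imin, imax = -32768, 32767
    if dmax == dmin:
        return 1.0, dmin
    slope = (dmax - dmin) / (imax - imin)
    inter = dmin - imin * slope
    return slope, inter

data = np.linspace(-1.0, 1.0, 100)
slope, inter = int16_scaling(data)
stored = np.round((data - inter) / slope).astype(np.int16)
recovered = stored * slope + inter
# round-trip error is bounded by half a quantization step
assert np.abs(recovered - data).max() <= slope / 2 + 1e-12
```

If a later volume has a larger range than anything seen so far, the
slope changes and every previously written volume has to be rescaled -
exactly the rewrite problem Matthew describes. Float storage avoids
this entirely.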
BTW - Here's my thinking on numerical precision. Scanners store data
in single precision complex. This is sufficient to represent analog
signals which have been digitized by a 16-bit digitizer (which gives
you steps of about 3E-05). The intrinsic MR signal-to-noise ratio is
almost always less than 1000, and definitely less than
10000, so I can safely say that the raw data from the device is good
to 3 or maybe 4 digits. If I save my intermediate results in single
precision floating point (6 digits) and do all my calculations in
double precision with algorithms that are accurate to 10 digits, I'm
guaranteed to have done the best that I could have. For a much nicer
description of this point of view, see Lloyd Trefethen's manifesto: http://www.comlab.ox.ac.uk/nick.trefethen/ten_digit_algs.htm
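The arithmetic behind this argument is easy to check (assumptions: a
16-bit ADC quantizing a full-scale signal, and IEEE single/double
precision for storage and computation):

```python
import numpy as np

adc_step = 1.0 / 2**15            # ~3.05e-05, the ~3E-05 step quoted above
eps32 = np.finfo(np.float32).eps  # ~1.19e-07, roughly 7 decimal digits
eps64 = np.finfo(np.float64).eps  # ~2.22e-16, roughly 16 decimal digits

# single precision resolves ~250x finer than the ADC step, so storing
# intermediates as float32 loses nothing relative to the acquired data
print(adc_step, eps32, eps64)
```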
> But, thanks, yes, this is a use-case we should think about.
>
The above is really only the most trivial of use cases. One of the
main reasons that I am excited about moving all of my matlab code to
python is to take advantage of numpy's very good handling of very
large data sets and cluster/multi-core machines. Here's the current
architecture of my code:
1) go through a giant (>10GB) raw data file and cut it up into little
pieces corresponding to 1-slice (or 1 volume) worth of raw data
2) send each piece to a cluster node, where it is turned into an image
by launching matlab and running a series of calculations on the data.
3) stitch all the images back together
This is horrendous. The math part of the code is quite compact and
readable. The rest of it is glue to submit things to PBS, blah, blah
blah. Oh yeah, and I have to have as many copies of matlab running as
I do compute nodes (96 in my case). Ugh. Anyway, I'm hoping to use
ipcluster to make things suck less.
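In Python the same split / reconstruct / stitch shape collapses to a
few lines. A hedged sketch, with a thread pool standing in for PBS +
Matlab (ipcluster would slot in similarly) and a placeholder recon()
since the actual per-slice reconstruction isn't shown here:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def recon(raw_slice):
    # placeholder reconstruction: inverse 2-D FFT of Cartesian k-space
    return np.abs(np.fft.ifft2(raw_slice))

def pipeline(raw_volume):
    # 1) cut the raw data into per-slice pieces
    pieces = [raw_volume[i] for i in range(raw_volume.shape[0])]
    # 2) hand each piece to a worker
    with ThreadPoolExecutor() as pool:
        images = list(pool.map(recon, pieces))
    # 3) stitch the images back together
    return np.stack(images)

rng = np.random.default_rng(0)
raw = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))
vol = pipeline(raw)
print(vol.shape)  # (8, 64, 64)
```

For genuinely CPU-bound reconstruction a process pool or ipcluster
replaces the thread pool, but the three-step structure is unchanged.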
I'm not sure how this turns into a use case, but basically it would be
good to be able to do asynchronous writes to parts of the file. One
way is to initialize a file that is big enough to store the result of
the calculation and have all of the threads use shared memory to
access the piece of it that they need. Or something nicer.
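The "initialize a big enough file, then let workers fill in their
pieces" idea maps directly onto numpy's memmap. A hedged sketch (the
shape and filename are illustrative, not from any real pipeline):

```python
import os
import tempfile
import numpy as np

shape = (64, 64, 32, 10)  # x, y, z, time
path = os.path.join(tempfile.mkdtemp(), 'result.dat')

# 'w+' mode allocates the full output file up front
out = np.memmap(path, dtype=np.float32, mode='w+', shape=shape)

def write_volume(t, vol):
    # each writer touches a disjoint region of the file, so workers
    # (threads here; separate processes would reopen the memmap in
    # 'r+' mode) never step on each other
    out[..., t] = vol
    out.flush()

for t in range(shape[-1]):
    write_volume(t, np.full(shape[:3], t, dtype=np.float32))

print(os.path.getsize(path) == int(np.prod(shape)) * 4)  # True
```

The missing piece for imaging data is writing the NIFTI header first
and offsetting the memmap past it, which is where the low-level
seek-style operations mentioned earlier come back in.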
-Souheil