[Pkg-exppsy-pynifti] [Nipy-devel] Image, design, usecases
Souheil Inati
souheil.inati at nyu.edu
Thu May 14 00:14:09 UTC 2009
Hi again,
On May 13, 2009, at 6:43 PM, Matthew Brett wrote:
>
> Sorry - by the current reader - I meant the version that's sometimes
> called 'brifti' or 'volumeimages' in the pynifti repository...
>
Where is that? I don't see anything with that name here: http://git.debian.org/?p=pkg-exppsy/pynifti.git;a=tree;f=nifti;h=41f14be3faf382124ea616b8868d93af8e6f70fc;hb=HEAD
>> I create a header, fill it, and
>> write it to disk, and then I have a for loop where I grab raw data,
>> reconstruct a volume and then write that volume to disk. This way
>> of doing
>> it:
>>
>>> img = Nifti1Image(data, affine)
>>> img.to_filespec('somefile.nii')
>>
>>
>> requires that I compute all of the data first. Perhaps I am being
>> clueless.
>
> No, that's an excellent point. I don't know how the C libraries do
> this.
This is why I had mentioned low level file operations like seeking in
my previous message.
> The thing that makes this hard is that, if you have an 'int'
> data type, and data scaling - as is typical in NIFTI - then you may
> well find yourself writing data to disk that requires different
> scaling from the first few volumes, and then you have to go back over
> the whole dataset to redo it. If you have float or complex images,
> this isn't a problem.
>
> As a matter of interest - are you writing to Int type images?
>
This is one of the reasons that I never use int. In fact, I think
that both int and compressed are totally stupid ways to store data.
For the record, I tend to have very mild opinions about things, just
ask Dav :-)
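To make the rescaling problem quoted above concrete: for int16 storage,
the slope and intercept depend on the *global* data range, which a
streaming writer can't know until the last volume arrives. Here is a
hedged sketch of the arithmetic (my own helper, not pynifti's API):

```python
import numpy as np

def int16_scaling(data):
    # Compute NIfTI-style slope/intercept so that
    # raw = stored_int16 * slope + inter round-trips the float data.
    dmin, dmax = float(data.min()), float(data.max())
    imin, imax = -32768, 32767
    if dmax == dmin:
        return 1.0, dmin
    slope = (dmax - dmin) / (imax - imin)
    inter = dmin - imin * slope
    return slope, inter

data = np.linspace(-1.0, 1.0, 100)
slope, inter = int16_scaling(data)
stored = np.round((data - inter) / slope).astype(np.int16)
recovered = stored * slope + inter
# round-trip error is bounded by half a quantization step
assert np.abs(recovered - data).max() <= slope / 2 + 1e-12
```

If a later volume has a larger range than anything seen so far, the
slope changes and every previously written volume has to be rescaled -
exactly the rewrite problem Matthew describes. Float storage avoids
this entirely.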
BTW - Here's my thinking on numerical precision. Scanners store data
in single precision complex. This is sufficient to represent analog
signals which have been digitized by a 16-bit digitizer (which gives
you steps of about 3E-05). The intrinsic MR signal-to-noise ratio is
almost always less than 1000, and definitely less than
10000, so I can safely say that the raw data from the device is good
to 3 or maybe 4 digits. If I save my intermediate results in single
precision floating point (6 digits) and do all my calculations in
double precision with algorithms that are accurate to 10 digits, I'm
guaranteed to have done the best that I could have. For a much nicer
description of this point of view, see Lloyd Trefethen's manifesto: http://www.comlab.ox.ac.uk/nick.trefethen/ten_digit_algs.htm
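The arithmetic behind this argument is easy to check (assumptions: a
16-bit ADC quantizing a full-scale signal, and IEEE single/double
precision for storage and computation):

```python
import numpy as np

adc_step = 1.0 / 2**15            # ~3.05e-05, the ~3E-05 step quoted above
eps32 = np.finfo(np.float32).eps  # ~1.19e-07, roughly 7 decimal digits
eps64 = np.finfo(np.float64).eps  # ~2.22e-16, roughly 16 decimal digits

# single precision resolves ~250x finer than the ADC step, so storing
# intermediates as float32 loses nothing relative to the acquired data
print(adc_step, eps32, eps64)
```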
> But, thanks, yes, this is a use-case we should think about.
>
The above is really only the most trivial of use cases. One of the
main reasons that I am excited about moving all of my matlab code to
python is to take advantage of numpy's very good handling of very
large data sets and cluster/multi-core machines. Here's the current
architecture of my code:
1) go through a giant (>10GB) raw data file and cut it up into little
pieces corresponding to 1-slice (or 1 volume) worth of raw data
2) send each piece to a cluster node, where it is turned into an image
by launching matlab and running a series of calculations on the data.
3) stitch all the images back together
This is horrendous. The math part of the code is quite compact and
readable. The rest of it is glue to submit things to PBS, blah, blah
blah. Oh yeah, and I have to have as many copies of matlab running as
I do compute nodes (96 in my case). Ugh. Anyway, I'm hoping to use
ipcluster to make things suck less.
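In Python the same split / reconstruct / stitch shape collapses to a
few lines. A hedged sketch, with a thread pool standing in for PBS +
Matlab (ipcluster would slot in similarly) and a placeholder recon()
since the actual per-slice reconstruction isn't shown here:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def recon(raw_slice):
    # placeholder reconstruction: inverse 2-D FFT of Cartesian k-space
    return np.abs(np.fft.ifft2(raw_slice))

def pipeline(raw_volume):
    # 1) cut the raw data into per-slice pieces
    pieces = [raw_volume[i] for i in range(raw_volume.shape[0])]
    # 2) hand each piece to a worker
    with ThreadPoolExecutor() as pool:
        images = list(pool.map(recon, pieces))
    # 3) stitch the images back together
    return np.stack(images)

rng = np.random.default_rng(0)
raw = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))
vol = pipeline(raw)
print(vol.shape)  # (8, 64, 64)
```

For genuinely CPU-bound reconstruction a process pool or ipcluster
replaces the thread pool, but the three-step structure is unchanged.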
I'm not sure how this turns into a use case, but basically it would be
good to be able to do asynchronous writes to parts of the file. One
way is to initialize a file that is big enough to store the result of
the calculation and have all of the threads use shared memory to
access the piece of it that they need. Or something nicer.
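The "initialize a big enough file, then let workers fill in their
pieces" idea maps directly onto numpy's memmap. A hedged sketch (the
shape and filename are illustrative, not from any real pipeline):

```python
import os
import tempfile
import numpy as np

shape = (64, 64, 32, 10)  # x, y, z, time
path = os.path.join(tempfile.mkdtemp(), 'result.dat')

# 'w+' mode allocates the full output file up front
out = np.memmap(path, dtype=np.float32, mode='w+', shape=shape)

def write_volume(t, vol):
    # each writer touches a disjoint region of the file, so workers
    # (threads here; separate processes would reopen the memmap in
    # 'r+' mode) never step on each other
    out[..., t] = vol
    out.flush()

for t in range(shape[-1]):
    write_volume(t, np.full(shape[:3], t, dtype=np.float32))

print(os.path.getsize(path) == int(np.prod(shape)) * 4)  # True
```

The missing piece for imaging data is writing the NIFTI header first
and offsetting the memmap past it, which is where the low-level
seek-style operations mentioned earlier come back in.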
-Souheil