[Pkg-exppsy-pynifti] [pynifti] NiftiDataset bug?

Scott gorlins at MIT.EDU
Tue Jan 13 18:14:38 UTC 2009



Michael Hanke wrote:
> Hi Scott,
>
> I hope I got the problem right (now). Let me reply here since I think
> all stuff below is heavily inter-dependent.
>
> Let me try to summarize: I think the real issue you are talking about is
> the one about nifti files which should have the same properties (e.g.
> array shape), but haven't. Now, you'd like pynifti to be able to deal
> with such cases.
>
>   
yes, also when a single image simply doesn't have the shape we expect
> Now my laziness comes into play. I still think this is a case that
> should not happen. 
ahh, don't say that!!
> In case of the beta image example you typically
> either have 3d volumes (one set of betas per file) or a single file for
> all beta sets in a 4d array. In the former case you can simply do:
>
>   betas1 = numpy.asarray([NiftiImage(filename).data for filename in file_list])
>
> ('load=True' is not required since the data will be automatically loaded
>  when accessed)
>
> which will result in a 4d image. In the latter case you just load the 4d
> image directly and have a second set, e.g. betas2. If you now want to
> merge them, you can do:
>
>   merged = numpy.concatenate((betas1, betas2))
>
> which will yield a new 4d image with the volumes from both sources.
>
>   
yes, but again, this forces us to be careful about the way niftiimage is 
handled - it requires two different syntaxes for two different 
assumptions about the dimensionality of the data. For instance, if you 
only calculated 1 beta in the betas2 file, for whatever reason, it could 
be loaded as 3d (depending on how the author of the code that wrote that 
nii handles single timepoints) and this code will fail unless another 
workaround is added.
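to make that concrete, here's roughly the guard i end up writing before 
the concatenate (filenames are just illustrative):

  import numpy as np
  from nifti import NiftiImage

  # betas1 built as in the list comprehension above (file_list as before)
  betas1 = np.asarray([NiftiImage(f).data for f in file_list])

  betas2 = NiftiImage('betas2.nii').data      # may come back 3d or 4d
  if betas2.ndim == 3:                        # a single beta saved as a plain volume
      betas2 = betas2[np.newaxis, ...]        # promote to (1, z, y, x)
  merged = np.concatenate((betas1, betas2))   # now both operands are 4d
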
> The other use-case you are proposing is more complicated. Basically, you
> say that you get _any_ array. You have no clue, not even basic
> assumptions, about what it contains, but you want to come up with some generic
> code to deal with any image, right?
>
>   
yes - i believe niftiimage should be able to handle all cases elegantly. 
(we still assume constant dim order, just not the total number of dims)
> In that case, you can always extend the data array to the maximum number
> of dimensions to be able to treat them all in a uniform fashion. But
> again, this is done with numpy itself, not pynifti:
>
>   >>> nim=NiftiImage('example4d.nii.gz')
>   >>> nim.data.shape
>   (2, 24, 96, 128)
>
>   >>> N.array(nim.data, ndmin=7).shape
>   (1, 1, 1, 2, 24, 96, 128)
>
>
> This is BTW also an example of why pynifti reverses the order of the
> axes.
>
> Anyway, in both examples NumPy provides simple ways to deal with
> heterogeneous array shapes. Moreover, I believe that pynifti should
> simply offer whatever a file contains -- and nothing more. If it is a 3d
> volume, it should expose a 3d array. If it is a 4d image with just one
> volume -- it should make a mental note about this strange image, but
> then return a 4d array.
>
>   
ah, i actually hadn't come across ndmin before, this is useful. of 
course it's done in numpy - but why not let the niftiimage call numpy on 
itself, when necessary?
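something as small as this would already do it -- a hypothetical helper, 
just to sketch the idea, not anything in pynifti today:

  import numpy as np

  def data_as_ndim(nim, ndim):
      # pad nim.data with leading singleton axes up to `ndim` dimensions;
      # this is just the N.array(..., ndmin=...) trick above, wrapped once
      return np.array(nim.data, ndmin=ndim)
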
> Even in the case where you have to deal with really strange images that
> change the meaning of the axes. NumPy offers the `swapaxes` function to
> fix it.
>   
no, with the exception of glaring read/write errors, i think it's safe 
to assume the axes are never swapped
> With respect to a prospective NiftiImage method to modify the array
> shape, I'd be glad to add it if it provides something that cannot be
> done in two lines with plain numpy -- but ATM I don't see such use-case.
>
>
> Is this somewhat helpful or even touches to core of your problem?
>
>   
yes, now we're talking! but let me stress that all the examples i've 
given *are* use-cases that i've come across in my own programs, where 
niftiimage doesn't handle ambiguity in the array shape gracefully. Sure, 
at any given line, two more lines of numpy code will solve the problem - 
but this forces every function to solve the problem! Every time I load 
nifti data, I must ensure it's the shape I expect for that function and 
usage. So two lines of code becomes ... well ... proportional to your 
use of niftiimage. The bigger issue is that everyone will occasionally 
forget to write those two lines of code, and newbies won't know they 
need to. To me this screams for wrapping those two lines into the class 
def itself.

i do agree that NiftiImage should be treated as a data object and expose 
just what's necessary for working with the data. But for me, necessary 
tools include those for concatenating/indexing data in a consistent 
manner, especially since nifti is designed for a large variety of 
dimensionality, including multiple samples (not to mention mapping 
between dimensions, like 3d ROI's across time). So, for example, let me 
explicitly (re)propose the following set of tools which, IMHO, would 
enable me to treat a NiftiImage as a proper data object without making 
assumptions about its shape:

* concatenation which automagically upcasts to the minimum 
dimensionality required, and no more. So, e.g., I could call 
nim = nim.extend((NiftiImage(f) for f in file_list), dim='t') and never 
have to worry about what the data arrays look like (sketched after this 
list)

* indexing which upcasts or downcasts to the dimensionality 
required, perhaps dependent on a kwarg, and guarantees you index into 
the dimension you intend. For instance, nim.data[samples] only works 
correctly if nim is 4d (otherwise it samples Z), and it will return 3d 
if samples is a number and 4d if samples is an array. We've already seen 
this bug in pymvpa... Ideally there should be something like 
nim.get_timeseries(times=':')->[t x Z x Y x X] or one of the other 
syntaxes i've proposed.

* perhaps a volume iterator across all non-space dims, or a single given 
non-space dim (default time) which always yields a 3d vol regardless of 
the original dimensionality. like, for vol in nim.timepoints():

* on save, an option to force removal of singleton dimensions and/or 
explicitly set dim[0]. I believe some of the freesurfer volume tools 
actually throw an error if dim[0] = 4 even if there's only one 
timepoint, and this may be true with other packages. again, this can be 
done manually, but it would be simpler in the class def. This would also 
add a note about this type of compatibility issue to the docstring for 
the save method, which could be *really* helpful. I seem to recall this 
particular nasty requiring an hour of debugging once... Alternatively, 
just a general method to reshape the niftiimage would suffice.

* (unrelated but fun) perhaps overload basic arithmetic operations that 
just expose the data array. for instance, often i need an example 
functional image from a timeseries. there are many ways to do this, but 
a really nice one would be NiftiImage(myfile).mean().save('exf.nii'). 
Also a nice way to mask an image would be (NiftiImage(myfile) * 
NiftiImage(mymask)).save(output). These are just conveniences though...
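
to give a feel for the interface i'm imagining, here's a rough sketch of 
how the proposals above would read in code (every name here is part of 
the proposal, nothing that exists in pynifti today; the filenames, 
file_list and do_something are placeholders):

  nim = NiftiImage('run1.nii')
  nim = nim.extend((NiftiImage(f) for f in file_list), dim='t')  # upcast as needed
  ts = nim.get_timeseries(times=':')         # always [t x Z x Y x X]
  for vol in nim.timepoints():               # always yields a 3d volume
      do_something(vol)
  nim.mean().save('exf.nii')                 # arithmetic just exposes the data array
  (NiftiImage('bold.nii') * NiftiImage('mask.nii')).save('masked.nii')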

plus, if these types of things are implemented in the class, the header 
can be updated as necessary when the data shape changes. i believe (not 
double checking ;)) this is currently possible only if the data array is 
extracted, manipulated, then passed into a new niftiimage. so, again, 
not a big deal, but doing it this way feels more elegant to me, and is a 
bit more convenient and object-oriented.
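
(for reference, the manual route i mean looks something like this -- 
assuming the array-plus-header form of the constructor, with an 
illustrative filename:)

  from nifti import NiftiImage

  nim = NiftiImage('betas.nii')
  data = nim.data.squeeze()                  # e.g. drop singleton dimensions
  out = NiftiImage(data, nim.header)         # header dims regenerated from the array
  out.save('betas_squeezed.nii')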

i'll agree/concede that there is no need to force the data array 
to a given shape in general, and 'volumes' can just be stored as (z,y,x).

I guess that at the heart of it, because the data array is so 
malleable, I feel like it should be privately handled to the fullest 
extent possible... nifti really feels more like a data *container* to me 
anyway, given that it can take so many dims. i know nothing i'm 
proposing will require more than a few lines per method implementation, 
but I strongly feel that it makes for a more elegant, robust, class 
design, and yet the full data array will always be exposed for those who 
wish to use it manually.

oh btw, i'm not sure how someone leading two open source projects (that 
I know of) gets away with calling himself lazy ;)

given how vocal we both are about what amounts to a few lines of code, 
i'm surprised no one else has chimed in... are there other thoughts 
floating around? anyone else vehemently (dis)agree with me?

-s


