[Shootout-list] New matrix-norm not matrixy?

Dave davejf@frontiernet.net
Sun, 27 Mar 2005 21:14:32 -0600


Good points, especially

> One of the advantages of the proposed matrix norm calculation is that
> it is resistant to LAPACK acceleration. So you'll have to get your
> hands dirty if you want to get some performance.

given the recent discussion on testing compilers and runtimes, not 3rd party 
libs.

Another great point is using readily available solutions that are optimized 
to avoid memory latency, especially when the goal is to test compilers and 
runtimes - not hardware.

- Dave

----- Original Message ----- 
From: "Sebastien Loisel" <sloisel@gmail.com>
To: <sleepingsquirrel@member.fsf.org>
Cc: "Dave" <davejf@frontiernet.net>; <shootout-list@lists.alioth.debian.org>
Sent: Sunday, March 27, 2005 2:57 PM
Subject: Re: [Shootout-list] New matrix-norm not matrixy?


>>     I've taken a quick look at the matrix-norm test and it seems like in
>> the process of converting from the mathematical description (item #3 at
>> http://mathworld.wolfram.com/Hundred-DollarHundred-DigitChallengeProblems.html
>> ) to an imperative algorithm, the test has lost some of its matrixyness.
>> Note that the original 2-D matrix has been replaced by the "eval_A"
>
> Calculating the matrix norm of the infinite dimensional operator by
> instantiating the actual matrix is not practical when you want to get
> 10 digits of precision, which is how this algorithm came to be. You
> can, in fact, calculate the norm of the infinite dimensional matrix
> from the norm of smaller restrictions of it by using a sort of Aitken
> acceleration, but that is an entirely different algorithm. That is how
> our team confirmed that we had ten digits (by using two completely
> different methods of calculation.)
>
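[Editor's note: the "sort of Aitken acceleration" mentioned above is the classic delta-squared extrapolation. This is only an illustrative sketch, not the team's actual code; it applies the formula to a generic slowly convergent sequence (partial sums of the Leibniz series for pi/4) to show why extrapolating a sequence of finite approximations can buy extra digits.]

```python
from math import pi

def aitken(seq):
    """Aitken delta-squared extrapolation: from three consecutive terms
    x0, x1, x2 produce x0 - (x1 - x0)^2 / (x2 - 2*x1 + x0)."""
    out = []
    for x0, x1, x2 in zip(seq, seq[1:], seq[2:]):
        denom = x2 - 2.0 * x1 + x0
        out.append(x0 - (x1 - x0) ** 2 / denom if denom else x2)
    return out

# Partial sums of 1 - 1/3 + 1/5 - ... converge to pi/4 very slowly.
s, partials = 0.0, []
for k in range(12):
    s += (-1) ** k / (2 * k + 1)
    partials.append(s)

accelerated = aitken(partials)
# The extrapolated tail is much closer to pi/4 than the raw partial sums.
```

The same idea applies to the sequence of norms of growing finite restrictions of the infinite matrix: extrapolate the sequence rather than brute-force a huge truncation.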
> As you point out, there is no two-dimensional array in the program.
>
>> function.  Also, the indexing on the 1-D vectors (u & v, which are
>> used to accumulate the intermediate values) is very simple.  It marches
>> right on through, from one element to the very next.  We might want to
>
> and everything fits in cache, so even if you did access randomly it
> wouldn't change much.
>
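[Editor's note: for readers without the benchmark source at hand, the "eval_A plus marching u and v" structure being discussed is the power method applied to A^T A, with the matrix entries A(i,j) = 1/((i+j)(i+j+1)/2 + i + 1) from the challenge problem. A minimal Python sketch of that kernel:]

```python
from math import sqrt

def eval_A(i, j):
    # Entry (i, j) of the (truncated) infinite matrix, 0-based indices.
    # (i+j)(i+j+1) is always even, so integer division is exact.
    return 1.0 / ((i + j) * (i + j + 1) // 2 + i + 1)

def mult_Av(v):
    return [sum(eval_A(i, j) * v[j] for j in range(len(v)))
            for i in range(len(v))]

def mult_Atv(v):
    return [sum(eval_A(j, i) * v[j] for j in range(len(v)))
            for i in range(len(v))]

def mult_AtAv(v):
    return mult_Atv(mult_Av(v))

def matrix_norm(n, iterations=10):
    """Power iteration on A^T A; the norm of A is the square root of
    the dominant eigenvalue of A^T A."""
    u = [1.0] * n
    for _ in range(iterations):
        v = mult_AtAv(u)   # u and v march straight through memory,
        u = mult_AtAv(v)   # one element to the next, as noted above.
    uv = sum(ui * vi for ui, vi in zip(u, v))
    vv = sum(vi * vi for vi in v)
    return sqrt(uv / vv)
```

Note that the 2-D matrix never exists in memory: eval_A regenerates each entry on the fly, which is exactly why the test resists a drop-in LAPACK call.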
>> consider a matrix test which has higher dimensions (2D, 3D, ...) and a
>
> which is a different problem -- you can't use the "power method" to
> compute the "operator norm" of a 3rd order tensor. These words aren't
> even well defined for third order tensors.
>
>> more complex access pattern.  For instance, maybe having the next element
>
> The first step in optimizing numerical code is to take code with a
> "complex access pattern" and make it less complex, which is the
> BLAS/LAPACK success story.
>
>> of matrix depend on its nearest neighbors in both columns and rows (e.g.
>> B[i,j] = A[i-1,j] + A[i,j-1] - A[i+1,j+1] ...).  As an example, see the
>> fluid dynamics simulation shown on...
>
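[Editor's note: a literal sketch of the neighbor-dependent update quoted above, with arbitrary boundary handling (interior points only). Written naively like this, each sweep touches three rows at once, which is the cache-thrashing pattern the reply below addresses.]

```python
def stencil_step(A):
    """One naive sweep of B[i][j] = A[i-1][j] + A[i][j-1] - A[i+1][j+1],
    applied to interior points; the border is left unchanged."""
    n = len(A)
    B = [row[:] for row in A]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            B[i][j] = A[i - 1][j] + A[i][j - 1] - A[i + 1][j + 1]
    return B

grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
result = stencil_step(grid)
```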
> One of my next benchmarks will be either a very high dimensional ODE
> solver (which will involve at least a dense LU or gaussian
> elimination) or a PDE solver (which will probably have a sparse matrix
> solver.)
>
> If you write a test that uses A[i-1][j] and A[i][j+1] and so on to
> thrash the cache, you'll be testing memory latency. Since this is an
> extremely well studied problem, we know what the solutions are. You
> have to rewrite the matrix A in block forms (such that A[i][j] is not
> a double, but rather a small matrix) so that all operations are now
> performed on blocks, and the three blocks you will need at any one
> time will fit in the cache of the CPU. A compiler stands absolutely
> zero chance of figuring this out since this isn't actually
> semantically the same as the straightforward algorithm once you take
> into account the details of floating point calculations and since the
> layout in memory is completely different.
>
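[Editor's note: the block rewriting described above can be made concrete. This sketch uses a matrix product rather than a stencil, since it is the textbook case, and the tile size b is a hypothetical tuning parameter; the point is that A[i][j] becomes a small b-by-b tile, and the inner loops only ever touch three tiles at a time.]

```python
def to_blocks(A, b):
    """Store an n*n matrix (n divisible by b) as a grid of b*b tiles."""
    n = len(A)
    return [[[row[jb:jb + b] for row in A[ib:ib + b]]
             for jb in range(0, n, b)]
            for ib in range(0, n, b)]

def blocked_matmul(Ab, Bb, b):
    """C = A*B computed tile by tile: at any moment only the tiles
    Ab[I][K], Bb[K][J], Cb[I][J] are in play, so all three can sit
    in cache regardless of the full matrix size."""
    nb = len(Ab)
    Cb = [[[[0.0] * b for _ in range(b)] for _ in range(nb)]
          for _ in range(nb)]
    for I in range(nb):
        for J in range(nb):
            for K in range(nb):
                a, bt, c = Ab[I][K], Bb[K][J], Cb[I][J]
                for i in range(b):
                    for k in range(b):
                        aik = a[i][k]
                        for j in range(b):
                            c[i][j] += aik * bt[k][j]
    return Cb

A = [[float(i * 4 + j + 1) for j in range(4)] for i in range(4)]
B = [[float((i + j) % 3) for j in range(4)] for i in range(4)]
C = blocked_matmul(to_blocks(A, 2), to_blocks(B, 2), 2)
```

As the email says, the memory layout is genuinely different from the row-major original, which is why a compiler cannot be expected to derive this transformation on its own.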
> Furthermore, if you do write such a test-case in C, the guy who writes
> the numerical python one will beat you. Does that mean that numerical
> python is faster than C? No, numerical python is built on top of
> LAPACK, a fortran package. If you also write your C program to use
> LAPACK, you'll perform the same as numerical python. But we don't care
> about all this -- there are many LAPACK benchmarks and we already know
> that LAPACK is fast.
>
> One of the advantages of the proposed matrix norm calculation is that
> it is resistant to LAPACK acceleration. So you'll have to get your
> hands dirty if you want to get some performance.
>
> I am currently working on several more interesting benchmarks. I've
> circulated them on the mailing list
> (http://lists.alioth.debian.org/pipermail/shootout-list/2005-March/001272.html,
> http://www.math.mcgill.ca/loisel/numerical-integration.html, and
> http://lists.alioth.debian.org/pipermail/shootout-list/2005-March/001275.html.)
> The third one is very complex, uses several nontrivial algorithms,
> and looks scary to translate into many languages; however, it will
> still run entirely in the cache. Once I have written that one in several
> languages, I will propose it and will begin work on another benchmark
> with actual matrices in it.
>
> Cheers,
>
> Sebastien Loisel
>