From: Jim Weigang <jimw@chilton.com>
To: Leonard Howell <leonard.howell@msfc.nasa.gov>
Newsgroups: aus.mathematics,comp.lang.apl,comp.lang.fortran,
comp.programming.contests,microsoft.public.vb.visual_modeler,sci.stat.math
Date: Sun, 24 May 1998 10:10:07 -0400
Subject: Re: Interesting Programming Problem, Need Some Help
Leonard Howell wrote on May 21:
> Here is a interesting problem that perhaps someone has solved before
> Hor has an idea of how to program the solution. Given a square array
> H(NxN) filled with zeros and ones (in practice, there are many more
> Hzeros than ones), what is the distribution of patterns of size 1, 2,
> H3, ...., where a pattern is defined any string of adjacent 1's,
> Heither horizontally or /and vertically distributed, and the number
> Hof "singlets" is also of interest To simply the problem, I'm
> Hignoring diagonal connections. For example, in the following array,
> Hthere are: 5 ones, 1 two, 1 three, 2 fours, and 1 six.
This is a variation/subset of the "blob finder" problem. Once you know
the frequency of different size blobs, it seems likely that a followup
request will be to identify the individual blobs in the matrix.
Here's my algorithm for finding blobs:
* Visit each nonzero cell (top to bottom, left to right)
* If the cell is 1, assign it a new blob index (>= 2)
* Examine the following neighbors of the cell:
. . .
. X n
n n n
* If a neighbor is 1, replace it with X's blob index
* If a neighbor is not 1 or 0, remember that blob n is
equivalent to blob X
* After visiting all the cells, make another pass through the
matrix, recoding the initial blob indices to merge equivalent
blobs and use sequential indices
The blob population counts can be accumulated by bumping a counter
whenever you change a 1 to a blob index.
This algorithm has enough inherent iterativeness that I didn't spend
much time trying to "parallelize" it for fast APL execution. A
straightforward iterative coding (for the APL+ dialect) is given below.
It would make an interesting test for Bob Bernecky's APEX APL compiler,
or it could be translated to a conventional compiled language. If it's
really going to be run using an interpreter, further improvements could
be made. (In particular, the :For K loop could be parallelized easily
if Dyalog APL's P[i]{<-}.+1 feature were available in APL+.)
As written, the program follows links along diagonal lines in addition
to vertical and horizontal lines. Line [16] can be modified if you
want to ignore diagonal neighbors. If you don't want the blob indices
and can't afford the space taken by the integer version of the matrix,
the code could be modified to work with just two rows of the Boolean
matrix at a time (as Bill Huber suggested).
Here are some examples:
B{<-}1=?10 15{rho}4
B
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0
0 0 0 0 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 1 0 0 0 0 0 1 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1 0 0 1 0 1
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0
0 1 0 0 1 1 0 0 0 0 1 1 1 0 1
0 0 1 0 0 0 1 1 1 1 0 0 0 0 0
0 1 1 0 0 0 1 0 0 0 0 0 0 0 0
DISP BLOBS B
. . . . . . B . . . . A . . . 3 1 <- three singletons
. . . . B B B . . . . . . . . 2 2 <- two doubles
. . . . . . . . . . . . . . . 2 4
C C . . . D D . . . . . E . E 1 5
. . . . . . . . . . . . . E . 1 13 <- one 13-item blob
. . . . . . . . . F . . E . E
. . . . F . . G . F . . . . .
. H . . F F . . . . F F F . I
. . H . . . F F F F . . . . .
. H H . . . F . . . . . . . .
B8
0 0 0 0 0 0 1 0 Leonard's example
1 0 0 1 1 0 1 1
0 1 1 0 1 1 0 0
1 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1
0 1 0 0 0 0 0 1
0 0 1 1 1 0 0 0
1 0 1 0 0 0 0 1
DISP BLOBS1 B8 BLOBS1 ignores diagonal neighbors
. . . . . . A . 5 1
B . . C C . A A 1 2
. D D . C C . . 1 3
D D D . . . E . 2 4
. . D . . . . F 1 6
. G . . . . . F
. . H H H . . .
I . H . . . . J
And here's the code:
Jim
|
|
|
Leonard wrote back and described what the program would be used for:
|
|
For our application, the elements of the array can be
thought of as pixels of an imaging system looking down on the Earth, and the
connected "1"s are flashes caused by cosmic rays interacting with the
atmosphere. We will be interested in making inferences regarding the longer
tracks and bigger blobs as they are associated with the more energetic
nuclei that have greater scientific interest. We will also want to
investigate the inherent noise of the detector and the statistics of signal
and noise, etc. BTW, to simplify the problem, I had originally stated that
one could look at only the adjacent 1's that are beside, above, and below,
but as it turns out the diagonal adjacents will need to be included too.
[The program] will
most likely be an important part of a large Monte Carlo simulation of the
performance characteristics of the Orbiting array of Wide-angle Light
collectors (OWL): a Pair of Earth Orbiting "Eyes" to Study Air Showers
Initiated by 10^20 Electron Volt Quanta. The matrix cells can be thought of
as pixels of the detector and the blobs will be flashes caused by incident
cosmic rays and photon noise in the atmosphere (e.g., star light
reflections). We will probably want to add other features for
discriminating the blobs, e.g., look for straight line patterns, at a later
date. We hope to first fly a proof of concept instrument on a high altitude
balloon with something like a 50 by 50 pixel detector (that should run very
fast!), and then later a spaceborne detector on a satellite with a 1000 by
1000 pixel imager.
We will likely have a
homepage for this project before too long and I'll post things of interest
there too. (You can take a look at one of our other project homepages at
http://science.msfc.nasa.gov/cosmicray/ica/default.htm).
|
He also reported a couple of bugs that have been fixed in the listing above.
In the process of fixing the bugs I discovered something I hadn't known
about the APL+ interpreter, as described in the following message:
|
From: Jim Weigang <jimw@chilton.com>
To: Leonard Howell <Leonard.Howell@msfc.nasa.gov>
Date: Wed, 27 May 998 00:45:14 -0400
Subject: Re: Interesting Programming Problem, Need Some Help
Thanks for the bug reports and further information. (I'm intrigued to
learn what you're going to use the algorithm for.) The error you got in
DISP was because I'm using the old Evlevel=1 setting, and you're
probably using Evlevel=2 (APL2 compatibility). Don't change your
Evlevel setting; your fix to DISP was the correct solution. The bug in
BLOBS was in the line that marked two blobs as equivalent. I should
have written R[U] instead of U on line [45].
I tried some timings, first on 200x200 matrices, which went pretty
fast (a couple of seconds), and then on a 1000x1000. As I waited, I
figured it ought to take about a minute. Many minutes passed without it
finishing, and when I hit break it wasn't 1/3 of the way done! At
first I thought line [45] was the culprit. R at this point was about
40,000 elements long, and this line is using a crummy N-squared
algorithm to do something that could be done in linear time (with
greater thought and coding effort). But when I repeated the run with
#MF monitoring on, I found to my surprise that almost all of the time
was being spent on line [42], which is just Z[T]{<-}U. What in the
world???
After some investigation, I found that APL+ has a very poor
implementation of Z[{enclose}I J]{<-}U: it runs about ~150x slower than
Z[I;J]{<-}U for a 1000x1000 Z. Judging by the execution time, it seems
to be implemented as (({enclose}I J){pick}Z){<-}U, and selective
assignment is very wasteful when storing items in a large array because
it creates an array of indices (the same size as the target array) and
applies the selection expression to the indices. [For more information,
see the article "Beware Selective Assignment".]
After eliminating the nested indices, the program ran MUCH faster.
Examining the LTIMES #MF report, I found that a lot of time was being
spent in the :Continue case of the For Each Neighbor loop, so I revised
the loop to skip these cases. This resulted in about a 1.4x speedup. I
ran into some WS FULLs near the end, so I added a few reassignments to Z
to prevent these. With these changes, the program will do a 1000x1000
matrix with 10% ones in under 23 seconds with a 12M workspace size.
Here are some execution times (in seconds) on a 300 MHz Pentium II,
using APL+Win version 2:
Matrix Size
1s |
Density | 200 1000
---------+-----------------
|
10% | .57 22.8 40x
|
5% | .28 8.0 29x
|
2% | .13 3.4 26x
The ratios on the right are the 1000 execution time divided by the 200
execution time. The 1000 case is intrinsically 25x more work than the
200 case, and that's about what's observed for low-density cases. I'd
guess that the 40x ratio on higher densities is from the crummy
algorithm on line [45].
Even with a 10% 1000x1000 matrix, I think you could do a thousand
Monte-Carol runs overnight (~6.4 hrs). Because this program is
scalar-iterative, I believe it would run about ~100-200x faster in a
compiled language. (And it wouldn't be that much work to write an
assembler version you could use in APL+Win.)
The corrected and sped-up version of the program is appended below.
[In this web page, the changes are incorporated in the listing above.]
Neat problem!
Jim
Note: The APL code on this page can be extracted without you
having to retype it.
See "Downloading Code" for instructions.
JimW's Home Page
|