The Swiss Cheese Algorithm
- Author:
- Serge Monkewitz
Contents | |
Definitions and Terms
- D1 : Apparition
Let
be the complete set of observed apparitions
. The total number of apparitions
is equal to
.
- D2 : Position
Each apparition
has a position on the sky
, specified as a unit vector in
(that is,
).
- D3 : Distance
Define the distance between two positions
to be
- D4 : Match set
Given
, define the radius-R match set for
as follows :
- D5 : Apparition (group) centroid
Each apparition
has a centroid
which is defined as follows :
where
will be referred to as the density radius.
- D6 : Apparition density
Each apparition
has an integer density
:
- D7 : Group
A group is a set of nearby apparitions likely to be obvservations of the same astronomical source. Define the group
to be the set of apparitions within the group radius
of the apparition centroid
:
Note that the group radius
should be set so that if
for two apparitions
and
, then they are very unlikely to be observations of the same astronomical source.
- D8 : Confused apparition
A confused apparition is an apparition that belongs to more than one group.
- D9 : Confused group
A confused group is a group that contains at least one confused apparition.
- D10 : Group count
Each apparition
has a group count
which is equal to the number of groups
has been assigned to. Note that:
-
is confused
-
is unconfused
-
does not belong to any groups
.
-
- D11 : Group type
Each group
has a group type
which takes the following values :
-
: signifies that
is unconfused, and was generated by a swiss cheese pass.
-
: signifies that
is unconfused, and was generated by a cheese crumbling pass.
-
: signifies that
is confused, and was generated by a swiss cheese pass.
-
: signifies that
is confused, and was generated by a cheese crumbling pass.
-
- D12 : Seed set
The seed set is the set of apparitions which are to be considered as the starting points for generating groups.
- D13 : Seed flag
Each apparition
has a seed flag
which takes the following values :
-
: signifies that the apparition has been removed from the set of seeds by a cheese crumbling pass.
-
: signifies that the apparition is not in the set of seeds.
-
: signifies that the apparition is to be considered as the starting point for a group by swiss cheese passes.
-
: signifies that the apparition is to be considered as the starting point for a group by cheese crumbling passes.
-
- D14 : Group counter
Each generated group has an integer group counter that uniquely identifies it within the set of all generated groups.
- D15 : Apparition counter
Each apparition
has an integer apparition counter
which uniquely identifies
in
.
- D16 : Apparition comparison
The operators
,
,
for apparitions are defined as follows :
-
-
-
either
or
-
- D17 : Apparition scan key
Each apparition optionally has a scan key which identifies the scan it was extracted from.
- D18 : Scan count
Each group has a scan count, equal to the number of scans containing at least one apparition belonging to the group.
- D19 : Apparition count
Each group has an apparition count, equal to the number of apparitions belonging to the group.
Algorithm Input and Output
The WAX application currently retrieves apparitions from a single database table, and stores results in up to 3 database tables. The DBMS hosting the input and output tables is required to support ESQL/C (WAX is currently capable of communicating only with the Informix RDBMS).Input Apparition Table
WAX assumes that the apparition table contains columns corresponding to a 32-bit integer unique identifier (see D16), as well as J2000 right ascension and declination (handled internally as double precision floating point values). If coverage computation is desired, WAX can optionally retrieve scan keys which correspond to 32-bit integer unique identifiers for the scans apparitions were extracted from. If left unspecified, these columns are assumed to be namedcntr
, ra
, dec
, and scan_key
. Additional retrieval columns can be specified at compile time - for more information concerning this feature, please see the Writing WAX Plug-Ins documentation page. Group Information Table
This output table contains group attributes. The following columns will be present for each generated group :-
gcntr
: the group counter, a unique identifier for the group -
napp
: the group apparition count -
gtype
: the group type -
sdet
(optional) : the group scan count
Group-apparition Link Table
This output table contains links between groups and apparitions. The following columns will be present for each distinct group-apparition pair :-
gcntr
: the group counter, a unique identifier for the group -
cntr
: the apparition counter, a unique identifier for the apparition -
ngrp
: the group count, the number of groups the apparition belongs to
Singleton Table
Normally, single apparition groups (singletons) are stored in the group information table. However, if the end-user of the WAX application so desires, WAX can store singletons in a seperate table.
This output table will normally contain a single column named gcntr
, a unique identifier for the single apparition group (see D14). This is the only column necessary since all singletons have an apparition count and scan count equal to 1, and a group type of 0. Note that when a singleton table is in use, singletons are not recorded in the link table since singletons always have group counters equal to their apparition counters.
Finally, there is a compile time option to include all columns retrieved from the input table in the singleton table (just as for the group-apparition link table). For more details, see the Usage Examples documentation section of the srcgen (Source Code Generator) helper application.
Grouped Apparition Table
This optional table contains information pertaining to grouped apparitions, calculated during the output pass of the swiss cheese algorithm, and allows (via the plug-in interface) a "best" group to be picked for a given apparition.
This output table will normally contain a single column named cntr
, a unique identifier for the grouped apparition (see D15).
Basic Algorithm
The swiss cheese algorithm is conceptually simple, and consists of the following steps :-
Set
-
Compute
-
Using the comparison operators defined for apparitions in D16, renumber (sort) the apparitions
such that given any two distinct
where
, then the condition
holds if and only if
.
-
Now, generate groups as follows :
-
Find the apparition
with
such that
where
and
.
-
Find
and
, increment
and set
.
-
If
with
, continue at step 1. Otherwise, the algorithm has completed.
-
Find the apparition
In practice, the algorithm as described would be very hard to implement since the size of can be much greater than the physical memory available to a computer. It is therefore necessary to limit processing to small subsets of
. Because the computation of groups and apparition centroids involves finding apparitions in the immediate vicinity of positions on the sky, it is natural to split
into subsets corresponding to apparitions from distinct regions of the sky. A simple way of doing this is to partition the sky into a sequence of annuli (or bands) which each cover a range of declinations.
The Parallel Swiss Cheese Algorithm
Sky Subdivision: Dec Bands
Consider the set of subsets of











A particular band is fully processed when every group stemming from a centroid with declination falling into the range
has been generated (and those with centroids in
have been stored). For each apparition
belonging to such a group, all groups containing
must have been generated so that the group count for
is valid prior to storage.
Consider the group generated around the centroid
. The seed apparition for the group,
, is within distance
of
. Therefore, if every apparition in the declination range
has been removed from the set of seeds, then all groups with centroids inside
will have been generated.
Now consider an apparition . All groups containing
will have been generated when
and there exists no apparition
with
and
. Note that for such an apparition to exist,
.
Therefore, if with
in the declination range
, then
has been fully processed. Groups generated from apparitions in that declination range will include apparitions in the range
, and, in order to compute densities and centroids for them, the declination range
must be considered. For simplicity we consider only the maximum density radius,
, arriving at the conclusion that in order to fully process the band
, it is necessary to consider apparitions inside the declination range
. This condition, looser than strictly necessary, is not sufficient. To see why, reconsider the Basic Algorithm. It is clearly possible for an apparition
not in the declination range
to be denser than all apparitions in
. Furthermore, generating
could potentially remove apparitions inside the dec band from the set of seeds. Without considering
, it would therefore be impossible to finish processing
. In the general case,
must be considered in its entirety to fully process
.
If this situation is encountered, the parallel swiss cheese algorithm changes the optimal order (as defined by the Basic Algorithm) in which apparitions are considered for grouping.
Primary and Secondary Bands
- Primary Band
A primary band
is a band where
is even.
- Secondary Band
A secondary band
is a band where
is odd.
For band , the apparitions in the declination range
where
are considered (currently,
is set to
by default). If the condition
is enforced, then the regions on the sky being considered by any two non-adjacent bands are disjoint. In particular, no two primary or secondary bands will process regions that overlap. Therefore all primary bands can be processed independently and in parallel. Secondary bands can also be processed independently of eachother and in parallel, but do depend on the results of adjacent primary bands.
By forcing primary bands to save any information which effects adjacent secondary bands, it is possible to guarantee full processing of secondary bands with a significantly tighter declination range.
A primary band will store a list of groups generated around centroids with declination greater than or equal to
(the top group list for
), and a list of groups generated around centroids with declination less than
(the bottom group list for
). The groups in these lists will be stored according to the order in which they were generated.
Furthermore, group counts for apparitions in the declination range (the top apparition set for
) as well as group counts for apparitions in
(the bottom apparition set for
) will be stored.
Before creating any new groups, a secondary band will read the top group list and apparition set generated by
as well as the bottom group list and apparition set generated by
and generate the indicated groups in the given order. Essentially, the secondary band will blindly copy the work of its neighbours. This behaviour gives each primary band the latitude to deviate from the group generation order imposed by the Basic Algorithm.
What follows is a detailed description of the various steps of the swiss cheese algorithm.
Step 1: Apparition Retrieval
Consider the band to be processed,



Define the removal flag for apparition
as follows :
-
doesn't have to be removed from the set of seeds to finish processing
.
-
must be removed from the set of seeds to finish processing
.
Finally, define the total seed count to equal the sum of the removal flags of all retrieved apparitions.
Now for each retrieved apparition, set . Also set
.
Step 2: Density Pass
If


-
find
-
compute
from
-
compute
from
-
if
is in
, set
, otherwise set
.
-
if
is in
, set
and increment
.
Otherwise, for each apparition in
:
-
find
-
compute
from
-
compute
from
-
if
is in
, set
, otherwise set
.
-
if
is in
, set
and increment
.
Step 3: Stitching Pass (Secondary Bands Only)
This pass simply reads the top apparition set and group list generated by band

Step 4: Swiss Cheese Pass
Step 4.1
Examine the next as yet unconsidered apparition


If there is no such , then the pass has completed. At this point, if the pass did not generate any new groups and
, the algorithm has reached an impasse and continues with a Cheese Crumbling Pass.
If , then
has been completely processed and the algorithm continues with the Output Pass.
Otherwise, another swiss cheese pass is performed.
Step 4.2
At this point the region around as must be examined, keeping in mind that groups should be generated based on a sequence of apparitions ordered by decreasing density.
Now, a group must be generated from if there is no other apparition
with
(a seed) that is "denser" than
(
) which might generate a group containing
.
Any group containing must have a centroid within distance
of
, that is to say, must be generated from an apparition within distance
of
.
So, if with
and
, then continue at step 4.3.
Otherwise continue at step 4.1.
Step 4.3
Scan the set of apparitions











Step 4.4
If
-
if
has declination greater than or equal to
, append
to the top group list.
-
if
has declination less than
, append
to the bottom group list.
-
Increment the group count for every apparition in
Otherwise :
-
Increment the group count for every apparition in
that falls inside the declination range
.
Finally, if is inside
:
-
Set the group counter for
to
-
If requested, compute the group scan count by sorting the apparitions in
on scan key and then counting the number of unique scan key values
- Store a record of the generated group in memory
Step 5: Cheese Crumbling Pass (Primary Bands Only)
This pass is identical to the swiss cheese pass, except for the following: only apparitions



Step 6: Output Pass
If
- Save the top group list, bottom group list, top apparition set, and bottom apparition set to disk.
For each group stored by step 4.4 :
-
Scan the apparitions in
. If any of them has a group count greater than
, add
to
(marking the group confused). Also find the z-component
of the seed apparition's centroid (this is the apparition with counter equal to the group counter for
).
-
Pass
to a pluggable group parameter computation stage which computes group attributes (such as an average position, average magnitudes, etc...).
-
Lastly, if
, insert
and its associated apparitions and attributes into the various database output tables.
For each apparition in
, pass the list of groups containing
to a pluggable apparition parameter computation stage which computes attributes for
. Finally, insert
and its associated attributes into the grouped apparition table.
Generated on Thu Oct 21 13:19:39 2004 for WAX Version 2.1 by