How the GridChecker works
Step 1
Collect a list of Unavoidable Sets (UA).It is possible to work with an external list (loaded from file), or with generated by the program list.
How the program generates UA sets.
1.1. Get UA sets containing up to 5 digits.
1.1.1. Create a puzzle with all occurences of digits {1,2,3,4,5} removed.
1.1.2. Solve the puzzle, storing all possible solutions.
The number of solutions is between 0.3M for more symmetric grids up to 8M for less symmetric.
Grids with more than 10M solutions may exist, but currently 10M is the hardcoded limit.
1.1.3. Compare each solution to the original grid, obtaining one non-minimal UA set per solution.
Actually only the removed digits are compared.
1.1.3.1. Compare the UA found to all previously found UA from the same puzzle.
1.1.3.1.1. If the new UA is a superset of an existing UA, ignore it and continue with next solution.
1.1.3.1.2. If the new UA is a subset of some of the existing ones, remove the respective supresets.
1.1.3.2. Add the new UA to the list of UA for this puzzle.
1.1.4. Add the local list of UA sets found from this puzzle to the global list of UA for the grid.
1.1.5. Repeat step 1.1.1 for all combinations of digits ({1,2,3,4,6}...{5,6,7,8,9}).
Now we have a list of all UA of size up to 11. (UA of size 12 can consist of 6 digits).
1.2. Get UA sets containing cells from up to 5 boxes.
1.2.1. Create a puzzle with all cells from boxes {1,2,3,4,5} removed.
1.2.2. Do the same as with digits removed in step 1.1.
In this way a list of 60K - 700K UA is formed. More symmetrical grids trend to less number of UA.
EDIT: From version 1.4 on the UA are generated by removing 4 (instead of 5) digits at a time, then several thousands times 54 random cells are emptied. This is faster and gives less amount of UA, but more UA of size 12 to 17.Step 2
Prepare the grid for puzzle enumeration.Since the enumeration is done by checking all the possible permutations of the given number of clues in the grid, we need to represent the grid in a form which allows as larger as possible areas to be skipped during the enumeration.
2.1. Obtain a list of all possible combinations of mutally disjoint UA sets having maximal number of sets. Inverting the property "disjoint" with "joint" it becomes the Maximal Cliques List, which term is used in the old Checker program and has gained popularity.
2.2. From the Maximal Cliques, choose one having minimal product of the sizes of UA sets.
Note: It is not proven whether this is the optimal choice.
2.3. Rate the UA within the chosen clique, and rate cells within each of the UA in the clique.
2.3.1. Rate UA by its size. Smaller UA have larger rate.
2.3.2. Rate UA having the same size by counting the number of mutally disjoined sets (DJ).
A list of level one DJ is created, which contain all UA having no cell in common to the examined UA.
Then each UA from level one list is compared for common cells to all others in the same list, resulting in level two DJ list.
Then each UA from level one is compared to level two list, resulting in level 3 list.
Level one list is compared to last level list until no DJ UA are found.
A list with the number of distinct DJ sets for the levels is formed and is stored in reverse order - the higher levels are at the top.
The UA with highest level non-empty list is rated first.
For UA with equal non-empty highest level, the one with most entries in the list (larger list size) is rated first.
For UA with still equal rating, the one with most entries in the next list is rated first, and so on.
For UA with still equal rating, the one with DJ sets from level one list, additionally rated by their size is chosen.
2.4. Rate the cells within each UA.
Create a list of UA sets disjoint to all higher rated UA within the clique and all the cells within the current UA except the current cell. Rate each UA within the list by its size (1/size**2), sum the ratings.
2.5. Rate the grid cells cells not included in the slected clique. Use same technique as for cells within UA.
2.6. Remap the grid.
2.6.1. Place the UA sets from the clique by their rating, the cells within UA and these outside the clique by their cell rating. Smaller UA are occuping the first cells, non-clique members occupy the last cells. Keep the mapping table, and create an inverse mapping table.
2.6.2. Remap all the UA sets in the same way.
2.7. Order the UA sets descending by their first cell (after remapping), then ascending by size.
2.8. Filter out largest UA. Choose the size treshold so that at most 1500 UA remain. Leave all the UA of size below the treshold. From the rest UA, erase the rightmost ones so that the final list contain up to 2000 UA sets.
2.9. Calculate tresholds for the number of clues so that at least N clues must be placed in the lower X cells in order at least one clue to hit an UA from the clique.
2.10. Prepare arrays of UA - one with their bitmap, and one with their lower position.
Step 3
Enumerate all puzzles of size N.Assume a puzzle is a 81-bit binary number with 1 on clue positions and 0 elsewhere.
Assume the less significant bit is at right.
The goal is to check any permutation of N-clues puzzle for uniqueness of its solution.
The enumeration is done by initially placing all clues at right (minimal positions), then moving clues at left until all become in leftmost position.
Basic (naive) approach example for 5 givens:
- Code: Select all
Possible positions for the leftmost clue
.............................................................................0000
The rightmost 5 - 1 = 4 cells are reserved for the rest (less significant) clues.
Initial position of the leftmost clue
00000000000000000000000000000000000000000000000000000000000000000000000000001xxxx
Final position of the leftmost clue
1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The positions marked with "x" are available for the rest of the clues.
Start of enumeration
000000000000000000000000000000000000000000000000000000000000000000000000000011111
000000000000000000000000000000000000000000000000000000000000000000000000000101111
000000000000000000000000000000000000000000000000000000000000000000000000000110111
...
111101000000000000000000000000000000000000000000000000000000000000000000000000000
111110000000000000000000000000000000000000000000000000000000000000000000000000000
End of enumeration
Since we know that in addition any permutation which leaves at least one unhit UA is invalid, we can skip some areas of the permutation space.
We can divide the process into steps. The implemented approach is:
3.1. Determine a valid position range for the first unplaced clue.
3.1.1. Determine maximal (leftmost) position.
3.1.1.1. Left treshold by previuos clue is the position next to the previous clue, 81 for the first clue.
3.1.1.2. Left treshold by clique is the position guaranted there are sufficient clues to hit the UA to the right.
The smaller of the tresholds is used to fix the upper limit of the range.
3.1.2. Determine minimal (rightmost) position.
3.1.2.1. Right treshold by next clues. There should be a room for the less significant clues.
3.1.2.2. Right treshold by first unhit UA set. Since UA are ordered by their less significant cell, there is no chance the first UA to be hit from the rest of the clues.
The largest of the tresholds is used to determine the lower limit of the range.
3.1.3. If the range is empty (i.e. lower limit exceeds upper limit), then backtrack.
Here is an example for determining the range for the third clue of total 5.
- Code: Select all
UA sets in the chosen Maximal Clique
0000000000000000000000000000000000000000000000000000000000000000000< U6 ><U4><U4>
Ordered list of the UA not hit
000001000100010001000001000001000000000000000000000000000000000000000000001000000
000100100010001000000000100010001000000000000000000000000000000000000000000100000
...
Positions of the first two clues
000000000000000000100000000100000000000000000000000000000000000000000000000000000
Steps in determining the valid range
000000000000000000100000000100000000000000000000000000000000000000000000000000000 #more significant clues
0000000000000000000000000000xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx #range 1
0000000000000000000000000000000000000000000000000000000000000000000< U6 ><U4><U4> #clique UA
0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxxxxxxxx #range 2
000000000000000000000000000000000000000000000000000000000000000000000000000000011 #room for the rest 2 clues
0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxxxxxx00 #range 3
000001000100010001000001000001000000000000000000000000000000000000000000001000000 #topmost UA
0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxx000000 #range 4
3.2. For each position in the range:
3.2.1. Place the clue and remove the UA sets hit.
3.2.2. If all UA are hit, permute the rest clues (if any) in all possible ways and check the puzzles for uniqueness.
3.2.3. If it is last clue, backtrack.
3.2.4. Store the context for backtracking.
3.2.5. Call recursively for the first of the rest clues.
3.3. If it is the first clue, then done, else backtrack.
Some tricksWhen determining the Maximal Clique, only UA of size up to 11 are processed. The rest cells in each Maximal Clique are checked for UA and if found, it is added to the UA and the process is repeated.
When determining the rating of UA in the clique, UA of size up to 8 are used. This simplification sometimes results in better (faster for enumeration) arrangement.
UA sets are represented as 128-bit bitmaps with 81 bits used. They are processed with SIMD instructions.
After the list of the unhit UA sets diminished to 768 sets, a swithing to bitmap mode is done. For each clue position a list of 6 128-bit values is marking the UA sets which are hit by this clue (81 hitting masks). Another 768-bit list is updated during the positioning of each clue (set mask). First zero bit in this list corresponds to the first unhit UA.
The solver knows the original grid, and when guessing, it does it in the "right" way, tracing extremely fast the path to the first solution, then finding the closer second solution if any.
MD