The New Sudoku Players' Forum

by **dobrichev** » Wed Jul 07, 2010 5:59 pm

If we have a valid Sudoku solution grid and want to see ALL the puzzles of size up to the given limit, GridChecker is the tool we can use.
It is a successor of Checker, which seems to be frozen since November 2006.
This tool introduces several major enhancements. The initial tests indicate speed improvement in several times against the original Checker.
The source code in C++ along with a 32-bit Windows executable can be downloaded from here.

Here are some timings (grid, MCN, clues, seconds), Pentium D 2.8GHz, single thread.

Code: Select all: 594231786786945132123768954965173248378492561241856397432619875619587423857324619#20x17 12 17 229 123456789457189326689327154231645897745891632896732541318264975574918263962573418#Pt 15 20 518 21 9650 123456789456789123789123456231564897564897231897231564312645978645978312978312645#MC 12 20 9475 123456789456789123798231564234675918815943276967812435379164852582397641641528397#SF 9 17 52290 123456789457189263689237451275613948348975612961842537534798126712364895896521374#14x17 11 17 10392 123456789456789132789213645275941368638527914941638257394175826517862493862394571 8 17 313045

The last grid is included as one of the hardest possible grids - it has no U4, 3xU6, but 2 of them share a clue. No 17's there.

MD

Edit: Adding a post from the former Sudoku Programmers forum.

Thu Jul 08, 2010 4:36 pm dobrichev in Sudoku Programmers Forum wrote:How the GridChecker works

Step 1
Collect a list of Unavoidable Sets (UA).

It is possible to work with an external list (loaded from file), or with generated by the program list.

How the program generates UA sets.
1.1. Get UA sets containing up to 5 digits.
1.1.1. Create a puzzle with all occurences of digits {1,2,3,4,5} removed.
1.1.2. Solve the puzzle, storing all possible solutions.
The number of solutions is between 0.3M for more symmetric grids up to 8M for less symmetric.
Grids with more than 10M solutions may exist, but currently 10M is the hardcoded limit.
1.1.3. Compare each solution to the original grid, obtaining one non-minimal UA set per solution.
Actually only the removed digits are compared.
1.1.3.1. Compare the UA found to all previously found UA from the same puzzle.
1.1.3.1.1. If the new UA is a superset of an existing UA, ignore it and continue with next solution.
1.1.3.1.2. If the new UA is a subset of some of the existing ones, remove the respective supresets.
1.1.3.2. Add the new UA to the list of UA for this puzzle.
1.1.4. Add the local list of UA sets found from this puzzle to the global list of UA for the grid.
1.1.5. Repeat step 1.1.1 for all combinations of digits ({1,2,3,4,6}...{5,6,7,8,9}).
Now we have a list of all UA of size up to 11. (UA of size 12 can consist of 6 digits).

1.2. Get UA sets containing cells from up to 5 boxes.
1.2.1. Create a puzzle with all cells from boxes {1,2,3,4,5} removed.
1.2.2. Do the same as with digits removed in step 1.1.

In this way a list of 60K - 700K UA is formed. More symmetrical grids trend to less number of UA.
EDIT: From version 1.4 on the UA are generated by removing 4 (instead of 5) digits at a time, then several thousands times 54 random cells are emptied. This is faster and gives less amount of UA, but more UA of size 12 to 17.

Step 2
Prepare the grid for puzzle enumeration.

Since the enumeration is done by checking all the possible permutations of the given number of clues in the grid, we need to represent the grid in a form which allows as larger as possible areas to be skipped during the enumeration.

2.1. Obtain a list of all possible combinations of mutally disjoint UA sets having maximal number of sets. Inverting the property "disjoint" with "joint" it becomes the Maximal Cliques List, which term is used in the old Checker program and has gained popularity.

2.2. From the Maximal Cliques, choose one having minimal product of the sizes of UA sets.
Note: It is not proven whether this is the optimal choice.

2.3. Rate the UA within the chosen clique, and rate cells within each of the UA in the clique.
2.3.1. Rate UA by its size. Smaller UA have larger rate.
2.3.2. Rate UA having the same size by counting the number of mutally disjoined sets (DJ).
A list of level one DJ is created, which contain all UA having no cell in common to the examined UA.
Then each UA from level one list is compared for common cells to all others in the same list, resulting in level two DJ list.
Then each UA from level one is compared to level two list, resulting in level 3 list.
Level one list is compared to last level list until no DJ UA are found.
A list with the number of distinct DJ sets for the levels is formed and is stored in reverse order - the higher levels are at the top.
The UA with highest level non-empty list is rated first.
For UA with equal non-empty highest level, the one with most entries in the list (larger list size) is rated first.
For UA with still equal rating, the one with most entries in the next list is rated first, and so on.
For UA with still equal rating, the one with DJ sets from level one list, additionally rated by their size is chosen.

2.4. Rate the cells within each UA.
Create a list of UA sets disjoint to all higher rated UA within the clique and all the cells within the current UA except the current cell. Rate each UA within the list by its size (1/size**2), sum the ratings.

2.5. Rate the grid cells cells not included in the slected clique. Use same technique as for cells within UA.

2.6. Remap the grid.
2.6.1. Place the UA sets from the clique by their rating, the cells within UA and these outside the clique by their cell rating. Smaller UA are occuping the first cells, non-clique members occupy the last cells. Keep the mapping table, and create an inverse mapping table.
2.6.2. Remap all the UA sets in the same way.

2.7. Order the UA sets descending by their first cell (after remapping), then ascending by size.

2.8. Filter out largest UA. Choose the size treshold so that at most 1500 UA remain. Leave all the UA of size below the treshold. From the rest UA, erase the rightmost ones so that the final list contain up to 2000 UA sets.

2.9. Calculate tresholds for the number of clues so that at least N clues must be placed in the lower X cells in order at least one clue to hit an UA from the clique.

2.10. Prepare arrays of UA - one with their bitmap, and one with their lower position.

Step 3
Enumerate all puzzles of size N.

Assume a puzzle is a 81-bit binary number with 1 on clue positions and 0 elsewhere.
Assume the less significant bit is at right.
The goal is to check any permutation of N-clues puzzle for uniqueness of its solution.

The enumeration is done by initially placing all clues at right (minimal positions), then moving clues at left until all become in leftmost position.

Basic (naive) approach example for 5 givens:
Code: Select all
Possible positions for the leftmost clue .............................................................................0000 The rightmost 5 - 1 = 4 cells are reserved for the rest (less significant) clues. Initial position of the leftmost clue 00000000000000000000000000000000000000000000000000000000000000000000000000001xxxx Final position of the leftmost clue 1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx The positions marked with "x" are available for the rest of the clues. Start of enumeration 000000000000000000000000000000000000000000000000000000000000000000000000000011111 000000000000000000000000000000000000000000000000000000000000000000000000000101111 000000000000000000000000000000000000000000000000000000000000000000000000000110111 ... 111101000000000000000000000000000000000000000000000000000000000000000000000000000 111110000000000000000000000000000000000000000000000000000000000000000000000000000 End of enumeration

Since we know that in addition any permutation which leaves at least one unhit UA is invalid, we can skip some areas of the permutation space.

We can divide the process into steps. The implemented approach is:
3.1. Determine a valid position range for the first unplaced clue.
3.1.1. Determine maximal (leftmost) position.
3.1.1.1. Left treshold by previuos clue is the position next to the previous clue, 81 for the first clue.
3.1.1.2. Left treshold by clique is the position guaranted there are sufficient clues to hit the UA to the right.
The smaller of the tresholds is used to fix the upper limit of the range.
3.1.2. Determine minimal (rightmost) position.
3.1.2.1. Right treshold by next clues. There should be a room for the less significant clues.
3.1.2.2. Right treshold by first unhit UA set. Since UA are ordered by their less significant cell, there is no chance the first UA to be hit from the rest of the clues.
The largest of the tresholds is used to determine the lower limit of the range.
3.1.3. If the range is empty (i.e. lower limit exceeds upper limit), then backtrack.

Here is an example for determining the range for the third clue of total 5.
Code: Select all
UA sets in the chosen Maximal Clique 0000000000000000000000000000000000000000000000000000000000000000000< U6 ><U4><U4> Ordered list of the UA not hit 000001000100010001000001000001000000000000000000000000000000000000000000001000000 000100100010001000000000100010001000000000000000000000000000000000000000000100000 ... Positions of the first two clues 000000000000000000100000000100000000000000000000000000000000000000000000000000000 Steps in determining the valid range 000000000000000000100000000100000000000000000000000000000000000000000000000000000 #more significant clues 0000000000000000000000000000xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx #range 1 0000000000000000000000000000000000000000000000000000000000000000000< U6 ><U4><U4> #clique UA 0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxxxxxxxx #range 2 000000000000000000000000000000000000000000000000000000000000000000000000000000011 #room for the rest 2 clues 0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxxxxxx00 #range 3 000001000100010001000001000001000000000000000000000000000000000000000000001000000 #topmost UA 0000000000000000000000000000000000000000000000000000000000000000000xxxxxxxx000000 #range 4

3.2. For each position in the range:
3.2.1. Place the clue and remove the UA sets hit.
3.2.2. If all UA are hit, permute the rest clues (if any) in all possible ways and check the puzzles for uniqueness.
3.2.3. If it is last clue, backtrack.
3.2.4. Store the context for backtracking.
3.2.5. Call recursively for the first of the rest clues.

3.3. If it is the first clue, then done, else backtrack.

Some tricks

When determining the Maximal Clique, only UA of size up to 11 are processed. The rest cells in each Maximal Clique are checked for UA and if found, it is added to the UA and the process is repeated.

When determining the rating of UA in the clique, UA of size up to 8 are used. This simplification sometimes results in better (faster for enumeration) arrangement.

UA sets are represented as 128-bit bitmaps with 81 bits used. They are processed with SIMD instructions.

After the list of the unhit UA sets diminished to 768 sets, a swithing to bitmap mode is done. For each clue position a list of 6 128-bit values is marking the UA sets which are hit by this clue (81 hitting masks). Another 768-bit list is updated during the positioning of each clue (set mask). First zero bit in this list corresponds to the first unhit UA.

The solver knows the original grid, and when guessing, it does it in the "right" way, tracing extremely fast the path to the first solution, then finding the closer second solution if any.

MD

Last edited by dobrichev on Fri Jul 30, 2010 12:26 am; edited 1 time in total

by **dukuso** » Thu Jul 08, 2010 7:37 am

summary:

dukuso wrote:
so it backtracks through all n-subsets of the grid which are sudokus and contain a clue in each UA ?
the size of the maximum subset of mutually disjoint unavoidable sets,
is it useful, should it be determined

Yes, yes, and yes

by **dobrichev** » Thu Jul 08, 2010 6:00 pm

I opened a parallel topic in Sudoku Programmers Forum.

by **dobrichev** » Fri Jul 23, 2010 10:00 am

Hi,

The latest version of the tool can be downloaded from http://sites.google.com/site/dobrichev/gridchecker. Currently it is 1.3, with only Win32 executable available. The zip file for v.1.0 includes source code.

There is a thread in Programmers forum with more info on how the tool works.

GridChecker by design processes only one grid per run. The input file is expected to have only one line with 81-character grid content. If it somehow processes next line this is a bug and don't expect worth results.

ETTF is estimated time to finish, in hours. It trends to underestimate the time for the first few steps. After 10% done, the estimation is usually correct within +/- 30% limits.

The processing time varies from grid to grid, depending on the number and size of the disjoint UA sets.
For some grids it works more than 100 times faster than the old Checker, for other 5 times.

Processing starts with generation of some UA sets which takes between 3 and 15 minutes. The list with UA found is stored in a file. It is possible to use an external list of UA - just create a file x.unav.txt where x is the name of the grid (resp. x.txt is the single line file with the grid content). At least 2000 UA are used so shorter list will degrade the performance.

Later some messages are displayed - think of them that if something is moving then it is not dead.

During the actual enumeration the progress is displayed on every 1e8 iterations. First column is the actual progress and it will never go backward. The valid puzzels found are displayed, and are stored in the .puzzles.txt file.

Finally a summary is displayed.

MD

by **Red Ed** » Fri Jul 23, 2010 6:51 pm

dobrichev wrote:ETTF is estimated time to finish, in hours. It trends to underestimate the time for the first few steps. After 10% done, the estimation is usually correct within +/- 30% limits.

Is there anything to be gained by restructuring the search so that the positions three (say) steps down the search tree are visited in random order rather than natural search order? It might cut the variance of the ETTF.

I don't know how your search works, but here's what I mean in the case of a general search tree. Pick a depth at which there are several thousand positions. Put those positions into a list { P[1], ..., P[n] }. Let Ïƒ be a random permutation on {1,...,n}. Now search the subtrees P[Ïƒ(1)], P[Ïƒ(2)], ... in that order. The point is that doing the search this way might (only might) help to disperse any clumping of 'easy' and 'hard' positions in the original list; and the less 'clumpy' the search space, the tighter your ETTF will be.

by **dobrichev** » Sat Jul 24, 2010 7:12 am

Red Ed wrote:
dobrichev wrote:ETTF is estimated time to finish, in hours. It trends to underestimate the time for the first few steps. After 10% done, the estimation is usually correct within +/- 30% limits.
Is there anything to be gained by restructuring the search so that the positions three (say) steps down the search tree are visited in random order rather than natural search order? It might cut the variance of the ETTF.

I don't know how your search works, but here's what I mean in the case of a general search tree. Pick a depth at which there are several thousand positions. Put those positions into a list { P[1], ..., P[n] }. Let Ïƒ be a random permutation on {1,...,n}. Now search the subtrees P[Ïƒ(1)], P[Ïƒ(2)], ... in that order. The point is that doing the search this way might (only might) help to disperse any clumping of 'easy' and 'hard' positions in the original list; and the less 'clumpy' the search space, the tighter your ETTF will be.

Good point!

Something similar should be done not only for time/complexity estimation, but for dividing the task for distribution over several threads/machines.
As mentioned here, the search space is well ordered. It starts from binary number 0000...0111 and ends to 1110...0000, where the number of bits is 81, and number of ones is the number of clues. Since the number of ones is fixed, the binary number doesn't grow in arithmetrical progression, but it is easily correctable.

Here is a diagram for deviation of the progress for 3 grids.

: ettf.PNG (10.68 KiB) Viewed 3950 times

But yet the most fragile point is reordering. Minor changes in remapping logic lead to difference in execution time more than two times.

by **Red Ed** » Sat Jul 24, 2010 4:07 pm

dobrichev wrote:But yet the most fragile point is reordering. Minor changes in remapping logic lead to difference in execution time more than two times.

I don't think it's necessary to change the remapping logic. Explicitly (for some small R):

Run your algorithm until the first R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[1]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.

Add a clue, bringing you to the second position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[2]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.

Add a clue, bringing you to the third position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[3]
and so on

Now that you've got all the P[.] saved, run through them in random order.

by **ronk** » Sat Jul 24, 2010 9:01 pm

For 17 clues with v1.3, I get inconsistent completion times for this solution grid:

123456789456789132789231546261594378394178625875623491547862913638917254912345867

[edit: Hold on, I have this program on a new OS on a new computer. It may be the computer that's going to sleep.]

During this run, it seems to have taken a late morning nap at the 36% completion point:
35.3312% at 0.318h,Checked=1791,Found=0,ETTF=0.581h
35.4373% at 0.318h,Checked=1792,Found=0,ETTF=0.580h
35.5550% at 0.319h,Checked=1794,Found=0,ETTF=0.578h
35.7273% at 0.319h,Checked=1798,Found=0,ETTF=0.574h
35.8012% at 0.320h,Checked=1799,Found=0,ETTF=0.573h
35.9529% at 0.320h,Checked=1799,Found=0,ETTF=0.570h
36.1353% at 1.641h,Checked=1803,Found=0,ETTF=2.901h
36.2041% at 1.642h,Checked=1803,Found=0,ETTF=2.893h
36.4752% at 1.642h,Checked=1807,Found=0,ETTF=2.860h
36.6166% at 1.643h,Checked=1807,Found=0,ETTF=2.844h
36.6505% at 1.643h,Checked=1808,Found=0,ETTF=2.841h
36.7379% at 1.644h,Checked=1808,Found=0,ETTF=2.831h

Which looks normal during the next run:
35.3312% at 0.318h,Checked=1791,Found=0,ETTF=0.582h
35.4373% at 0.319h,Checked=1792,Found=0,ETTF=0.580h
35.5550% at 0.319h,Checked=1794,Found=0,ETTF=0.578h
35.7273% at 0.320h,Checked=1798,Found=0,ETTF=0.575h
35.8012% at 0.320h,Checked=1799,Found=0,ETTF=0.574h
35.9529% at 0.321h,Checked=1799,Found=0,ETTF=0.571h
36.1353% at 0.321h,Checked=1803,Found=0,ETTF=0.568h
36.2041% at 0.322h,Checked=1803,Found=0,ETTF=0.567h
36.4752% at 0.322h,Checked=1807,Found=0,ETTF=0.561h
36.6166% at 0.323h,Checked=1807,Found=0,ETTF=0.559h
36.6505% at 0.323h,Checked=1808,Found=0,ETTF=0.559h
36.7379% at 0.324h,Checked=1808,Found=0,ETTF=0.557h

But then another nap during the mid-afternoon, this time near 75%:
74.9362% at 0.600h,Checked=3000,Found=5,ETTF=0.201h
74.9735% at 0.601h,Checked=3000,Found=5,ETTF=0.201h
75.0105% at 0.601h,Checked=3001,Found=5,ETTF=0.200h
75.0397% at 0.602h,Checked=3001,Found=5,ETTF=0.200h
75.0764% at 0.602h,Checked=3001,Found=5,ETTF=0.200h
75.1081% at 0.603h,Checked=3002,Found=5,ETTF=0.200h
75.1656% at 1.392h,Checked=3004,Found=5,ETTF=0.460h
75.2053% at 1.392h,Checked=3004,Found=5,ETTF=0.459h
75.2789% at 1.393h,Checked=3004,Found=5,ETTF=0.457h
75.3256% at 1.393h,Checked=3015,Found=5,ETTF=0.456h
75.4042% at 1.394h,Checked=3020,Found=5,ETTF=0.455h
75.4682% at 1.394h,Checked=3020,Found=5,ETTF=0.453h

by **dobrichev** » Sun Jul 25, 2010 1:25 pm

Red Ed wrote:
dobrichev wrote:But yet the most fragile point is reordering. Minor changes in remapping logic lead to difference in execution time more than two times.

I don't think it's necessary to change the remapping logic. Explicitly (for some small R):
Run your algorithm until the first R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[1]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.
Add a clue, bringing you to the second position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[2]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.
Add a clue, bringing you to the third position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[3]
and so on
Now that you've got all the P[.] saved, run through them in random order.

Yes, it is realizable.
What I meant is that improving the remapping could produce significantly faster results, and therefore has higher priority to be done compared to more precise estimations and linearizing the progress.

by **dobrichev** » Sun Jul 25, 2010 1:42 pm

ronk wrote:For 17 clues with v1.3, I get inconsistent completion times for this solution grid
...
[edit: Hold on, I have this program on a new OS on a new computer. It may be the computer that's going to sleep.]

The program uses the calendar clock for calculations, not the actual CPU time. Sleep mode, pausing, overheating, low battery, executing concurrent tasks, etc. could cause such gaps.

by **daj95376** » Sun Jul 25, 2010 2:43 pm

dobrichev wrote:
ronk wrote:[edit: Hold on, I have this program on a new OS on a new computer. It may be the computer that's going to sleep.]

I have a laptop running Windows Vista that goes to "sleep" and stops running an active program when the screen saver times out. I haven't taken the time to run the problem down because I don't use that laptop much.

by **ronk** » Sun Jul 25, 2010 8:34 pm

::: off-topic :::

Sorry for the distraction. I should have realized much sooner that my computer was factory-configured to sleep after 20 minutes, presumably after the last keyboard or mouse usage. A Microsoft FAQ on the power-saving states for Windows 7 is here.

by **dobrichev** » Tue Jul 27, 2010 7:43 am

New version 1.4 with source code is published here.

by **dobrichev** » Sat Aug 07, 2010 11:15 am

Red Ed wrote:
dobrichev wrote:But yet the most fragile point is reordering. Minor changes in remapping logic lead to difference in execution time more than two times.

I don't think it's necessary to change the remapping logic. Explicitly (for some small R):
Run your algorithm until the first R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[1]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.
Add a clue, bringing you to the second position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[2]
Instead of continuing to clue R+1 (as you do normally), backtrack to R-1 clues.
Add a clue, bringing you to the third position in which R clues are placed.
Save the state, i.e. the clue positions + sorted list of unavoidables + whatever other information you need, as P[3]
and so on
Now that you've got all the P[.] saved, run through them in random order.

Done in almost the same way, and included in version 1.6 available here.

by **ronk** » Sat Aug 07, 2010 4:38 pm

dobrichev, you might be interested in this solution grid. Hunting for a 17; at 2.4 hrs, ETTF=72+ hrs @ 2.8 GHz. Run aborted.

Code: Select all: 123456789457189632896327451234861975578934126619275348345612897781593264962748513

Is there an easy way to determine the version via the command line, like a 'GridChecker -V' maybe?

The New Sudoku Players' Forum

GridChecker, an exhaustive puzzle enumerator

GridChecker, an exhaustive puzzle enumerator

Re: GridChecker, an exhausive puzzle enumerator

Re: GridChecker, an exhaustive puzzle enumerator

The tool

Re: The tool

Re: The tool

Re: The tool

Re: GridChecker, an exhaustive puzzle enumerator

ETTF

Gaps in progress log

Re: Gaps in progress log

Re: Gaps in progress log

Next version

Re: The tool

Re: GridChecker, an exhaustive puzzle enumerator