## How to find the Optimal Solving Path

Advanced methods and approaches for solving Sudoku puzzles

### How to find the Optimal Solving Path

Hi,

I've implemented about a dozen different solving techniques in my Sudo Cue solver, but with the addition of tabling, I got the idea that my program does not always find the shortest way to solve a Sudoku.

The T&E-based techniques like tabling do a lot of searching and building implication chains and then make very little progress. A single elimination is made per check. I know that the Susser does 7 eliminations before handing control to lower level techniques, but that is just a simple limit.

What I would like to do is optimize the complete solving path, by selecting the most effective technique available at any time. In case there are more techniques possible at the same time, the program should "test" each of these and check which of those is the most effective. Selection should be made on effectiveness, and not on difficulty. A swordfish that can eliminate 7 candidates should be preferred to a naked pair that eliminates only 2 candidates, possibly destroying the swordfish along the way.

My suggestion of measuring effectiveness would be:

After applying a technique, count the number of candidates that can be eliminated using singles only, upto the point that you need another non-singles technique.

As a working example, this would be a nice test case:

Code: Select all
`. . .|. . .|. 4 .. . 2|. . 4|. . 1. 7 .|. 5 .|. 9 .-----+-----+-----. . 3|. . 7|. . .. 4 .|. 6 .|. . .6 . .|1 . .|8 . .-----+-----+-----. 2 .|. . .|1 . .8 5 .|9 . .|. 6 .. . .|. 8 .|. . 3`

My solver uses the following blend of techniques for this one:

22 Givens
39 Hidden singles (total)
20 Naked singles (total)

8 Line-Box interactions (locked candidates)
2 Naked pairs
1 Hidden pair
1 Hidden Triple
1 X-Wing
1 Coloring check
54 Tabling eliminations

This last figure (the 54 tabling eliminations) can be reduced, I think, erm... I know.

I would like to invite you guys to find & discuss the shortest path to solving this Sudoku, and then continue the discussion on methods to improve the Optimum Solving Path for any Sudoku.

Once this has been agreed upon, we can seriously open the hunt for the "toughest known" Sudoku. And I will be able to improve the Sudo Cue solver, of course.

I'm really looking forward to your reactions.

Ruud.

Edit: URL updated
Last edited by Ruud on Mon Jan 09, 2006 7:16 pm, edited 2 times in total.
Ruud

Posts: 664
Joined: 28 October 2005

My QBasic solver had to nest 5 deep in T&E resulting in guesses:
1 r9c6, 1 r9c6, 2 r9c6, 2 r9c6, 2 r9c7, 2 r9c8, 4 r9c4, 4 r9c7
5 r9c8, 6 r9c4, 6 r9c6, 7 r9c4, 7 r9c4, 7 r9c7, 7 r9c8, 9 r9c7

It ran in near-zero time and found 1 solution.

In all, it found (or re-found)
176 hidden singles
146 naked singles
1 hidden pair
16 locked candidates
3 naked pairs

Not sure what you are looking for and thus whether the above is great, good, bad or horrible.

Mac
QBasicMac

Posts: 441
Joined: 13 July 2005

Mac wrote:Not sure what you are looking for and thus whether the above is great, good, bad or horrible.

Good point, Mac.

So let us define how to represent the optimum solving path:

I had something like this in mind:

Initial # candidates = 238

The 9 in R6C2 and the 3 in R7C1 require singles only, so let us take those out of the equation. When these 2 are placed, the number of remaining candidates = 216. This list must ultimately be reduced to 81, so there are 135 candidates that need to be eliminated.

This is the candidates grid:

Code: Select all
`.---------------------.---------------------.---------------------.| 159    1368   15689 | 23678  12379  123689| 23567  4      25678 || 59     368    2     | 3678   379    4     | 3567   3578   1     || 14     7      1468  | 2368   5      12368 | 236    9      268   |:---------------------+---------------------+---------------------:| 125    18     3     | 2458   249    7     | 24569  125    24569 || 1257   4      1578  | 2358   6      23589 | 23579  12357  2579  || 6      9      57    | 1      234    235   | 8      2357   2457  |:---------------------+---------------------+---------------------:| 3      2      4679  | 4567   47     56    | 1      578    45789 || 8      5      147   | 9      12347  123   | 247    6      247   || 1479   16     14679 | 24567  8      1256  | 24579  257    3     |'---------------------'---------------------'---------------------'`

From this point on, we should do the following:

List the technique used, # candidates eliminated by the technique itself, # candidates eliminated by subsequent singles, remaining # candidates

The best solving path should use the fewest number of steps.

As for my solver, from the candidate list above, it requires 18 tabling steps before it is able to do any additional eliminations. In total it uses 68 steps. Definitely not a qualifier for the title.

I'm not sure how you define "re-found". A single found and placed can only produce a result once, or am I missing something here? As I suggested, maybe we should not bother about singles, just count the number of candidates eliminated and the number of non-singles steps required to achieve reduction to 81.

Ruud.
Ruud

Posts: 664
Joined: 28 October 2005

### Re: How to find the Optimal Solving Path

Ruud wrote:My suggestion of measuring effectiveness would be:

After applying a technique, count the number of candidates that can be eliminated using singles only, upto the point that you need another non-singles technique.

Are you suggesting that a solver that has a greater number of eliminations due to singles (both naked and hidden) necessarily has fewer eliminations due to other techniques? If so, that makes sense to me.
Ruud wrote:My solver uses the following blend of techniques for this one:

22 Givens
39 Hidden singles (total)
20 Naked singles (total)

8 Line-Box interactions (locked candidates)
2 Naked pairs
1 Hidden pair
1 Hidden Triple
1 X-Wing
1 Coloring check
54 Tabling eliminations

By way of comparison, my non-published solver uses the following blend of techniques for your example:
22 Givens
9 Hidden singles (total)
50 Naked singles (total)

7 Line-Box interactions (locked candidates) (9 eliminations)
2 Naked pairs (5 elims)
0 Hidden pair
0 Hidden Triple
0 X-Wing
0 Coloring check
38 forward implication chain eliminations

Hmmm! Relative to your results, I don't understand all the zeroes ... as hiddens, x-wing, and coloring all get a chance at an elimination before forward implication chains. The "priority" of techniques was ...

hidden_singles()
naked_pairs()
lockedcandidates()
naked_triples()
hidden_pairs()
x_wings()
swordfish()
hidden_triples()
xy_wings()
simple_coloring()
jellyfish()
xyz_wing()
implication()
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

In his attempt to find the optimal solving path,
ronk wrote:Relative to your results, I don't understand all the zeroes ... as hiddens, x-wing, and coloring all get a chance at an elimination before forward implication chains.

I think I do understand, because you probably have a different order in which the forward implication chains are handled. At this moment, Sudo Cue is optimized to prioritize these chains on their shortest length. That obviously gives a different result than order by number or, as I would like to achieve, order by optimal solving path.

Since the hidden subsets, X-Wing and coloring steps are found between the forward implication chains, a different order results in a completely different solving path, not containing these steps.

For now, you're on top of the list, Ron, with 47 non-single steps. Could you tell us how you prioritize the forward implication chains?

Ruud.

Edit: URL updated
Last edited by Ruud on Mon Jan 09, 2006 7:15 pm, edited 1 time in total.
Ruud

Posts: 664
Joined: 28 October 2005

Ruud wrote:
ronk wrote:Relative to your results, I don't understand all the zeroes ... as hiddens, x-wing, and coloring all get a chance at an elimination before forward implication chains.

Since the hidden subsets, X-Wing and coloring steps are found between the forward implication chains, a different order results in a completely different solving path, not containing these steps.

That is certainly plausible, and I believe you are correct.

For now, you're on top of the list, Ron, with 47 non-single steps. Could you tell us how you prioritize the forward implication chains

With no smarts applied ...
Code: Select all
`for each row {    for each column {        for each candidate {            if posit that candidate true results in a contradiction {                eliminate the candidate                return TRUE            }        }    }}return FALSE`

The caller uses a TRUE return to loop back to logical techniques. Upon a FALSE return, the puzzle remains unsolved.
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

ronk wrote:With no smarts applied

Not funny to see that a simple loop can find a shorter solving path than a mechanism that compares all the forced implication chains, and selects the shortest one from the lot.

I wonder whether there are others who are using logic that produces an even shorter path. 47 non-single steps is an improvement, but I doubt it will remain the shortest possible path.

Ruud.
Ruud

Posts: 664
Joined: 28 October 2005

Ruud,

I used to return the shortest chain but now I evaluate for each chain found, regardless of length, the following four figures:
- The number of candidate values remaining for the cell where the chain elimination occurred.
- The number of candidate positions remaining for the eliminated value in the row, column and box of the cell where the elimination occurred.

Any chain that leaves just a single candidate is returned immediately. Otherwise, the chain that leaves the smallest number of candidates is returned. Since there are likely to be many such chains, I return the chain for which the greatest number of eliminations has occurred (usually just one elimination occurs but some chains, e.g. X-Wing chains, eliminate more). Finally, when I still have many chains, I return the first such chain found, as it will be the shortest.

My solver runs the 'Many-Values Chains' routine twice - first without tables then with them, which means that sometimes a non-table chain is returned that doesn't leave a single candidate in a situation where a table chain would have left just a single candidate. This is because I don't really like table chains and I would rather see a slightly longer solution that avoids them.

With these rules in place (clearly they are in no sense optimal), the solver returns the following results:

Locked Sector Candidates 19 eliminated
Disjoint Subsets (pairs etc.) 11
Single-Valued Chains (fishy cycles) 6
Two-Sector Disjoint Subsets 2
Conditional Disjoint Subsets 2
Many-Valued Chains 28

Analysed in 23.8s

With guesses but no chains (to compare with QBasicMac):
Locked Sector Candidates 37
Disjoint Subsets 13
# Guesses 7 (Not the number eliminated, clearly)

Analysed in 0.016s.

I'm yet to implement one piece of table logic. Of course, I'd hope to see the results improve with it in place. Here's my log for the nightmare third move. I'd be interested to see how my chains compare with yours. Links such as r4c2+{?F1|1??}+r8c3 are Inferred links generated from tables. I will add extra debug later to identify the chains from which these links have been inferred.

Code: Select all
`3. Consider the chains r4c4-8-r4c2+{?F1|1??}+r8c3-4-r6c5+{?F1|7??}+r8c5 and r4c4-8-r4c2-8-r4c4+{?F4|7??}+r8c5~5~r7c4.Regardless of whether the cell r4c4 contains the value 8 or not, the cell r8c5 cannot contain the value 7.- The move r8c5:=7 has been eliminated.Consider the chains r8c5+{??4|1?F}+r8c6+{??5|1F?}+r5c1-4-r6c5 and r8c5-3-r8c6.Since it is certain that the cell r8c6 will not contain at least one of the values 1 and 3, the cell r8c5 cannot contain the value 4.- The move r8c5:=4 has been eliminated.Consider the chains r1c2+{??8|5?F}+r4c8 and r1c2+{??8|5?F}+r7c8.Since it is certain that Column 8 will not contain the value 5 in at least one of the cells r4c8 and r7c8, the cell r1c2 does not contain the value 8.- The move r1c2:=8 has been eliminated.Consider the chains r1c3+{??9|8?F}+r5c4 and r1c3+{??9|8?F}+r2c4~3~r2c8.Since it is certain that Column 4 will not contain the value 8 in at least one of the cells r5c4 and r2c4, the cell r1c3 does not contain the value 9.- The move r1c3:=9 has been eliminated.The value 9 in Box 7 must lie in Column 3.- The move r9c1:=9 has been eliminated.Consider the chains r5c6+{??3|2?F}+r9c6 and r5c6+{??3|2?F}+r9c8.Since it is certain that Row 9 will not contain the value 2 in at least one of the cells r9c6 and r9c8, the cell r5c6 does not contain the value 3.- The move r5c6:=3 has been eliminated.Consider the chains r7c3+{??7|5?F}+r7c9+{??1|5?F}+r9c4+{??9|9?F}+r1c1 and r7c3-9-r7c9.Since it is certain that the cell r7c9 will not contain at least one of the values 5 and 9, the cell r7c3 cannot contain the value 7.- The move r7c3:=7 has been eliminated.Consider the chains r1c7+{?F7|1??}+r8c3 and r2c8+{?F7|1??}+r8c3.Since it is certain that Box 3 will not contain the value 7 in at least one of the cells r1c7 and r2c8, the cell r8c3 does not contain the value 1.- The move r8c3:=1 has been eliminated.The value 1 in Box 8 must lie in Row 8.- The move r9c6:=1 has been eliminated.The values 1 and 3 occupy the cells r8c5 and r8c6 in some order.- The moves r8c5:=2 and r8c6:=2 have been eliminated.The value 2 in Box 9 must lie in Row 8.- The moves r9c7:=2 and r9c8:=2 have been eliminated.The value 2 in Box 6 must lie in Column 8.- The moves r4c7:=2, r4c9:=2, r5c7:=2, r5c9:=2 and r6c9:=2 have been eliminated.Consider the chain r6c8~5~r6c3~7~r5c1-7-r9c1~7~r9c8~5~r6c8.When the cell r6c8 contains the value 5, the chain is self-contradicting.Therefore, the cell r6c8 cannot contain the value 5.- The move r6c8:=5 has been eliminated.Consider the chains r6c6+{??5|4?F}+r9c4 and r6c6+{??5|4?F}+r9c7-2-r5c1-7-r9c1.Since it is certain that Row 9 will not contain the value 4 in at least one of the cells r9c4 and r9c7, the cell r6c6 does not contain the value 5.- The move r6c6:=5 has been eliminated.Consider the chain r6c8-<7|4>-r6c5~4~r7c5~7~r7c8.When the cell r7c8 contains the value 7, so does the cell r6c8 - a contradiction.Therefore, the cell r7c8 cannot contain the value 7.- The move r7c8:=7 has been eliminated.Consider the chain r1c5=1=r8c6-<4|7>-r7c5+{??1|2?T}+r1c5.The cell r1c5 must contain the value 1 if it doesn't contain the value 2.Therefore, these two values are the only candidates for the cell r1c5.- The moves r1c5:=3, r1c5:=7 and r1c5:=9 have been eliminated.Consider the chains r7c9~4~r7c5+{??3|2?F}+r1c5-1-r8c5 and r3c6~3~r8c6-1-r8c5.Regardless of whether the cell r8c5 contains the value 1 or not, the cell r3c6 cannot contain the value 3.- The move r3c6:=3 has been eliminated.Consider the chains r9c4~7~r7c5+{??3|2?F}+r1c5-1-r8c5 and r1c6~3~r8c6-1-r8c5.Regardless of whether the cell r8c5 contains the value 1 or not, the cell r1c6 cannot contain the value 3.- The move r1c6:=3 has been eliminated.Consider the chains r4c5+{??2|5?F}+r9c8~7~r6c3~7~r2c4 and r4c5+{??2|5?F}+r4c8.Since it is certain that Column 8 will not contain the value 5 in at least one of the cells r9c8 and r4c8, the cell r4c5 does not contain the value 2.- The move r4c5:=2 has been eliminated.Consider the chain r6c5-2-r1c5=1=r8c6-<1|2>-r6c6.The cell r6c6 must contain the value 2 if the cell r6c5 doesn't.Therefore, these two cells are the only candidates for the value 2 in Row 6.- The move r6c8:=2 has been eliminated.The value 2 in Box 5 must lie in Row 6.- The moves r4c4:=2, r5c4:=2 and r5c6:=2 have been eliminated.The values 1 and 2 occupy the cells r4c8 and r5c8 in some order.- The moves r4c8:=5, r5c8:=3, r5c8:=5 and r5c8:=7 have been eliminated.Consider the chain r3c7-3-r3c4~3~r5c4-3-r5c7.The cell r5c7 must contain the value 3 if the cell r3c7 doesn't.Therefore, these two cells are the only candidates for the value 3 in Column 7.- The moves r1c7:=3 and r2c7:=3 have been eliminated.Consider the chain r3c4-3-r3c7-3-r5c7-3-r5c4.The cell r5c4 must contain the value 3 if the cell r3c4 doesn't.Therefore, these two cells are the only candidates for the value 3 in Column 4.- The moves r1c4:=3 and r2c4:=3 have been eliminated.The cell r1c2 is the only candidate for the value 3 in Row 1.`
rubylips

Posts: 149
Joined: 01 November 2005

rubylips wrote:I used to return the shortest chain but now I evaluate for each chain found, regardless of length, the following four figures:
- The number of candidate values remaining for the cell where the chain elimination occurred.
- The number of candidate positions remaining for the eliminated value in the row, column and box of the cell where the elimination occurred.

OK, that makes sense, using the size of 4 constraints that the candidate belongs to.

Any chain that leaves just a single candidate is returned immediately.

For normal solving, I agree this is the fastest. For finding the optimal solving path, there may be reasons to evaluate multiple candidates that will leave a single. One single could be more effective than the other.

With these rules in place (clearly they are in no sense optimal), the solver returns the following results:

Locked Sector Candidates 19 eliminated
Disjoint Subsets (pairs etc.) 11
Single-Valued Chains (fishy cycles) 6
Two-Sector Disjoint Subsets 2
Conditional Disjoint Subsets 2
Many-Valued Chains 28

I'm not sure how to count the number of non-singles steps from this list. Looks like the table contains both elimination counts and step counts.

Analysed in 23.8s

Solving time is not really relevant here. If it takes someone a week to find the shortest path, so be it.

Thanks for you input, rubylips. I really like the concept of priority by constraint size. I'll put that in Sudo Cue too, so we can compare the multi-value-chain steps. I don't think I wil implement the 2-phase chain checking in this stage. If a more efficient chain exists, I want to use it, even if it looks ugly.

Ruud.

Edit: URL updated
Last edited by Ruud on Mon Jan 09, 2006 7:17 pm, edited 1 time in total.
Ruud

Posts: 664
Joined: 28 October 2005

An update.

I have changed to code to assign priorities to tabling eliminations, by constraint size.

Here are the updated results:

5 Line-Box interactions (locked sector candidates)
1 Naked pair
2 Naked Triples
1 Hidden pair
1 Hidden Triple
1 Coloring check
1 Template elimination
41 Tabling eliminations

Total non-single steps: 53 (was 68)

This is definitely a significant step forward. Thanks rubylips!

I've been contemplating another way to optimize the solving path:

In my solver, I can detect magic cells. In fact, they should be named "magic candidates". I'm not using them to proceed yet, because they behave too much like "lucky guesses", and lack undenyable proof.

However, once the solver has identified these "magic candidates", it could assign a higher priority to candidate eliminations that speed up their exposure.

Ruud.
Ruud

Posts: 664
Joined: 28 October 2005

Ruud wrote:In my solver, I can detect magic cells. In fact, they should be named "magic candidates". I'm not using them to proceed yet, because they behave too much like "lucky guesses", and lack undenyable proof.

Can you recommend a good thread on "magic cells (candidates)" or, if not a thread, a definition?

TIA, Ron
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

Ron wrote:Can you recommend a good thread on "magic cells (candidates)" or, if not a thread, a definition?

As far as I know, a Magic Candidate is a candidate for which the forward implication tree is able to bind each remaining unbound cell to a single digit. In other words, the assertion of such a candidate completely solves the sudoku through its implication tree. Magic cells were introduced when 1-Constrained Sudokus were defined. A 1-Constrained sudoku could be solved with a single magic cell. When you use the Search function of this forum, you may be able to find a more formal definition.

The disadvantage of a magic candidate is that it does not prove that it has found the only solution.

Because implication trees are built with the most basic of methods (that may change), a magic candidate, when placed, leaves a sudoku with singles only. In the Optimal Solving Path, that would be a nice shortcut.

Ruud.
Ruud

Posts: 664
Joined: 28 October 2005

Ruud wrote:Because implication trees are built with the most basic of methods (that may change), a magic candidate, when placed, leaves a sudoku with singles only.

Unfortunately, I'm not familiar with implication trees either. Can I interpret that to mean ... if the elimination of a candidate creates a naked single, and if the implication of only naked singles is applied recursively, eventually leading to solution of the puzzle, then that candidate is a "magic candidate"?
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

Ron wrote:if the elimination of a candidate creates a naked single, and if the implication of only naked singles is applied recursively, eventually leading to solution of the puzzle, then that candidate is a "magic candidate"?

The description is correct, but both naked and hidden singles can be used to build the tree.

If you want to learn more about implication trees, check this topic on the Sudoku Programmers forum.

Ruud.
Ruud

Posts: 664
Joined: 28 October 2005

Selection should be made on effectiveness, and not on difficulty. A swordfish that can eliminate 7 candidates should be preferred to a naked pair that eliminates only 2 candidates, possibly destroying the swordfish along the way.

I don't fully agree.
Each test you choose involves a cost (computer time).

If the cost to eliminate 7 candidates is big and the relative cost of eliminating 2 candidates is very small, then the 2-candidate test should be preferred.

Also, cost varies over time. For example, who would do anything else except perfrom eliminations based on singletons when faced with a new puzzle.

Another thing, you don't actually know if a test is going to give positive results untill you try it. This is why complicated tests are normally tried only when simple tests (the least costly) fail.

Another problem with testing: how do you prevent yourself from repeating a test you have already completed (and is doomed to fail if tried again)? Keeping track of tests completed tests also involves a cost. (Hash tables may be effective.)

Yes, its complicated.
holdout

Posts: 35
Joined: 30 August 2005
Location: Bowie, Maryland USA

Next