The New Sudoku Players' Forum

by **rubylips** » Tue Dec 06, 2005 1:29 pm

Jeff wrote:In terms of manual solving, which is what I am interested in.

I share this interest. Although I nearly always present solver analysis in my posts, my aim is to produce a solver that would pass a Turing Test, i.e. to produce a solver that would 'reason' in a manner indistinguishable from a human.

Jeff wrote:Subset analyses such as the xz rule, xyz-wings, empty cell contradiction and implied locked set should be executed before 3D medusa or bilocation/bivalue plot, simply because they are much much more user friendly.

Essentially, I agree. My solver implements Conditional Disjoint Subsets - my name for what bennys calls Almost Locked Sets - before Many-Valued Chains. However, there's one complication. I've implemented the techniques discussed in this and other related threads in two distinct places - within the Conditional Disjoint Subsets strategy, where a static analysis independent of chains is performed, and within the Many-Valued Chains strategy, where the Disjoint Subset link has been introduced (e.g. see my first post on this thread). I think that it's impossible to separate chains and Locked Set analysis completely if the full power of the two techniques is to be realized.

by **Bob Hanson** » Tue Dec 06, 2005 2:18 pm

Jeff wrote:In terms of manual solving, which is what I am interested in, subset analyses such as the xz rule, xyz-wings, empty cell contradiction and implied locked set should be executed before 3D medusa or bilocation/bivalue plot, simply because they are much much more user friendly. To this effect, the statement could well be "In general, 3D medusa and bilocation/bivalue plot aren't necessary if subset analyses are to be conducted on every grid in the first place".

Oh, I agree 100%. 3D Medusa is a last resort for my hand solving, for sure. That's why bennys's idea is such a great one.

by **rubylips** » Tue Dec 06, 2005 2:33 pm

Here's the definition of my Conditional Disjoint Subsets strategy, which I believe to be very similar, if not the same, as Almost Locked Subsets.

Consider the intersection between a Line (i.e. a Row or Column) and a Box. Suppose that the Line contains an Almost Disjoint Subset L, i.e. a subset of nL cells that between them contain nL+1 values (call them VL) and that the Box contains a subset B of nB cells (call them VB) that between them contain nB+1 values. Furthermore, suppose that the subsets B and L have some non-trivial intersection I, with candidates VI.

Consider each cell in I in turn. Suppose that such a cell contains at least one value that appears in VL\VB and one that appears in VB\VL. Then it's possible to remove the candidates in (VB x VL)\VI from the cells (Box x Line)\I, where 'x' means intersection.

Proof:

Call the cell in I CI. The candidates for CI divide into three categories:
i. Elements of VL\VB.
When such an element is placed in CI, the subset L 'loses' two values - the placed element of L\B and the candidate for B\L - yet it has 'lost' only one cell, CI. Therefore, the remaining nL-1 cells must contain the remaining nL-1 values, which include the values (VB x VL)\VI. It follows that these values cannot occur in the cells (Box x Line)\I.
ii. Elements of VB\VL.
A similar argument shows that, again, the values (VB x VI)\VI are excluded from (Box x Line)\I.
iii. Elements of VBxVL
The placed element is trivially excluded from (Box x Line)\I. The argument used above shows that any other element of VB x VL must appear in both B\L and L\B and is therefore excluded from (Box x Line)\I.

by **rubylips** » Tue Dec 06, 2005 3:25 pm

I'll run through the three examples from bennys in order to illustrate the definition I've just posted.

I. This puzzle falls to a fishy cycle that most people would, I presume, look for prior to a subset analysis:

Code: Select all: Consider the chain r5c5-8-r8c5-8-r8c8-8-r6c8-8-r6c6. When the cell r5c5 contains the value 8, so does the cell r6c6 - a contradiction. Therefore, the cell r5c5 cannot contain the value 8. When the cell r6c6 contains the value 8, so does the cell r5c5 - a contradiction. Therefore, the cell r6c6 cannot contain the value 8. - The moves r5c5:=8 and r6c6:=8 have been eliminated. The value 4 is the only candidate for the cell r6c6.

Alternatively,

Code: Select all: Consider the cell r5c4. When it contains the value 7, the value 5 in Column 4 must occupy the cell r3c4. When it contains the value 4, the values 5 and 8 in Box 5 must occupy the cells r5c5 and r6c6 in some order. Whichever value it contains, the cells r4c4 and r6c4 cannot contain the value 5. - The move r4c4:=5 has been eliminated. The value 6 is the only candidate for the cell r4c4.

II. This puzzle is trickier. Subset analysis doesn't lead to a critical elimination yet nor does any single forcing chain I've found. After:

Code: Select all: 1. Consider the cell r9c7. When it contains the value 9, the values 3, 4 and 7 in Row 9 must occupy the cells r9c2, r9c3 and r9c9 in some order. When it contains the value 2, the value 4 in Box 9 must occupy the cell r8c7. Whichever value it contains, the cell r9c8 cannot contain the value 4. - The move r9c8:=4 has been eliminated.

some straightforward analysis:

Code: Select all: Consider the chain r2c3~4~r2c8-4-r8c8~4~r8c1-4-r9c3. When the cell r2c3 contains the value 4, so does the cell r9c3 - a contradiction. Therefore, the cell r2c3 cannot contain the value 4. - The move r2c3:=4 has been eliminated. The values 3, 4 and 5 occupy the cells r2c1, r2c8 and r2c9 in some order. - The moves r2c1:=9, r2c8:=9 and r2c9:=9 have been eliminated. Consider the chain r2c6-9-r2c3~9~r6c3-9-r4c1-9-r4c6. When the cell r4c6 contains the value 9, so does the cell r2c6 - a contradiction. Therefore, the cell r4c6 cannot contain the value 9. - The move r4c6:=9 has been eliminated. The value 7 is the only candidate for the cell r4c6. 2. The value 9 is the only candidate for the cell r4c1. 3. The value 7 is the only candidate for the cell r6c3.

leads to the following candidate grid:

Code: Select all: 5|7 5|7|8|9 1|2 | 3 4 1|2 | 5|8|9 5|9 6 4|5 6 2|9 | 7 8 2|9 | 1 3|4|5 3|4|5 3|4 8|9 1|3|4|9 | 1|9 6 5 | 4|8|9 7 2 ----------------------------+------------------+----------------------- 9 4 8 | 5 2 7 | 3 6 1 6 1 5 | 8|9 3 4|8|9 | 7 2 4|9 2 3 7 | 6 1 4|9 | 4|5|9 8 4|5|9 ----------------------------+------------------+----------------------- 1 2 3|9 | 4 7 3|8 | 6 3|5|9 3|5|8|9 3|4|5|7 5|7 6 | 1|2|8 9 1|3|8 | 2|4 1|3|4 3|4|7|8 8 7|9 3|4|9 | 1|2 5 6 | 2|4|9 1|3|9 3|4|7|9

The critical chain here is:

Code: Select all: Consider the chain r3c7+<4|1>+r3c4~1~r9c4-2-r8c4-2-r8c7. When the cell r8c7 contains the value 4, some other value must occupy the cell r3c7, which means that the value 2 must occupy the cell r8c7 - a contradiction. Therefore, the cell r8c7 cannot contain the value 4. - The move r8c7:=4 has been eliminated. The value 2 is the only candidate for the cell r8c7.

The link r3c7+<4|1>+r3c4 relies on the Almost Locked Set in r3c2, r3c4 and r3c6. It's not all plain sailing from here, because after:

Code: Select all: 1. The cell r9c4 is the only candidate for the value 2 in Row 9. 2. The cell r9c8 is the only candidate for the value 1 in Row 9.

we're left with the candidate grid:

Code: Select all: 5|7 5|7|8|9 1|2 | 3 4 1|2 | 5|8|9 5|9 6 4|5 6 2|9 | 7 8 2|9 | 1 3|4|5 3|4|5 3|4 8|9 1|3|4|9 | 1|9 6 5 | 4|8|9 7 2 ----------------------------+----------------+----------------------- 9 4 8 | 5 2 7 | 3 6 1 6 1 5 | 8|9 3 4|8|9 | 7 2 4|9 2 3 7 | 6 1 4|9 | 4|5|9 8 4|5|9 ----------------------------+----------------+----------------------- 1 2 3|9 | 4 7 3|8 | 6 3|5|9 3|5|8|9 3|4|5|7 5|7 6 | 1|8 9 1|3|8 | 2 3|4 3|4|7|8 8 7|9 3|4|9 | 2 5 6 | 4|9 1 3|4|7|9

which requires the observation:

Code: Select all: Consider the cell r9c3. When it contains the value 4, the value 9 in Row 9 must occupy the cell r9c7. When it contains the value 3, the value 9 in Box 7 must occupy the cell r7c3. Whichever value it contains, the cells r9c1 and r9c2 cannot contain the value 9. - The move r9c2:=9 has been eliminated. The value 7 is the only candidate for the cell r9c2.

in order to solve the puzzle.

III. Straightforward

Code: Select all: 1. Consider the cell r6c1. When it contains the value 8, the values 1 and 5 in Row 6 must occupy the cells r6c4 and r6c6 in some order. When it contains the value 7, the value 1 in Box 4 must occupy the cell r5c1. Whichever value it contains, the cells r6c2 and r6c3 cannot contain the value 1. - The move r6c3:=1 has been eliminated. The value 3 is the only candidate for the cell r6c3.

by **Bob Hanson** » Tue Dec 06, 2005 3:53 pm

Couple of ideas:

1) 2-valued cells are the simplest case of an almost complete set. So the fastest scanning would be for two 2-valued cells.

{12} {21}

that, of course, is a naked pair. The exclusion principle holds for either 1 or 2.

2) Since X-cycles (X-wings and such) are the 3D-rotation of subsets, everything said here may have a complement in the plane of the board.

3) The connection is broader than described. The term we need is "weak link":

Code: Select all: Set A is said to be "weakly linked" to set B if: a) any one member candidate k of set A is weakly linked (a "buddy" -- that is, in the same house) as every candidate k of set B (individually, not necessarily the same house for each), or b) any one member candidate k of set A is weakly linked to some other candidate that itself is weakly linked as described above to B or is a member of a weakly linked chain of candidates that ultimately is weakly linked to set B as described above.

The exclusion principle, I think, would state something like this:

Code: Select all: If there exists a set A of n_A cells that together have n_A+1 possible candidates, and there is a set B consisting of n_B cells that together have n_B+1 possible candidates, and set A has a weak link to set B, and set B has a weak link to set A, then any candidate z can be eliminated if that candidate: (a) is not one of those constituting the weak links, and (b) is in a cell that is not a member of A or B, and (c) is weakly linked to both A and B.

Points:

1) there doesn't have to be a direct connection between the sets. There can be other structures (conjugate pairs?, chains?, sets? almost-linked sets?) that can mediate the weak connections.

xx) bennys has suggests m=1, but I suggest it is more general than that. [edit: I take that back. m needs to be 1.]

2) there can be two different candidates involved in the weak links, one going one way, linking A to B, and one going the other way, linking B to A. So, for example, say A is linked to B by 6 and B is linked to A by 2. Then z can't be either 2 or 6. (Because setting it TRUE would simply break the weak link, not add a new one.)

3) with this wider definition, I believe this theory of almost linked sets subsumes ALL 3D Medusa or other forced-chain logic methods, especially if it recognized that conjugate pairs are simply almost linked sets as seen from a different dimension.

[edited to read "+1" not "+m"]

by **ronk** » Tue Dec 06, 2005 5:38 pm

rubylips wrote:Consider each cell in I in turn. Suppose that such a cell contains at least one value that appears in VL/VB and one that appears in VB/VL. Then it's possible to remove the candidates in (VB x VL)/VI from the cells (Box x Line)/I, where 'x' means intersection.

And what does the '/' mean?

by **rubylips** » Tue Dec 06, 2005 5:44 pm

ronk wrote:And what does the '/' mean?

Sorry, that should be '\', the set difference. I blame it on overexposure to HTML.

by **ronk** » Tue Dec 06, 2005 5:49 pm

rubylips wrote:iii. Elements of VBxVL
The placed element is trivially excluded from (Box x Line)/I. The argument used above shows that any other element of VB x VL must appear in both B/L[b] and [b]L/B and is therefore excluded from (Box x Line)/I.

And what does the "B/L[b] and [b]L/B" mean?

by **rubylips** » Tue Dec 06, 2005 6:00 pm

ronk wrote:And what does the "B/L[b] and [b]L/B" mean?

More problems with '/' and '\'. I've edited the original post.

by **ronk** » Wed Dec 07, 2005 1:32 am

bennys wrote:
Code: Select all
A={R5C1,R5C6} B={R4C1} x=7 z=1

You may wish to correct the row numbers.

by **ronk** » Wed Dec 07, 2005 1:36 am

rubylips wrote:I'll run through the three examples from bennys in order to illustrate the definition I've just posted.

bennys wrote:
Code: Select all
+-------------+-------------+-------------+ | 24 7 8 | 24 6 5 | 1 9 3 | | 9 3 24 | 248 1 48 | 56 7 56 | | 5 1 6 | 7 3 9 | 8 4 2 | +-------------+-------------+-------------+ | 28 9 23 | 458 7 6 | 35 1 45 | |*17 6 5 | 3 9 14 | 2 8 47 | |*178 4 13 | 58 2 *18 | 357 6 9 | +-------------+-------------+-------------+ | 6 5 7 | 1 4 2 | 9 3 8 | | 14 2 14 | 9 8 3 | 67 5 67 | | 3 8 9 | 6 5 7 | 4 2 1 | +-------------+-------------+-------------+ A={R6C1,R6C6} [edit: row numbers corrected] B={R5C1} x=7 z=1

rubylips wrote:III. Straightforward

Code: Select all
1. Consider the cell r6c1. When it contains the value 8, the values 1 and 5 in Row 6 must occupy the cells r6c4 and r6c6 in some order. When it contains the value 7, the value 1 in Box 4 must occupy the cell r5c1. Whichever value it contains, the cells r6c2 and r6c3 cannot contain the value 1. - The move r6c3:=1 has been eliminated. The value 3 is the only candidate for the cell r6c3.

What sets did your solver identify to produce the above log? It appears to be ...
A={R6C1,R6C4,R6C6}
B={R5C1}
... which means your A is larger than bennys' illustration.

by **bennys** » Wed Dec 07, 2005 6:30 am

ronk wrote:

You may wish to correct the row numbers.

Thanks I fixed it

by **rubylips** » Wed Dec 07, 2005 11:11 am

ronk wrote:What sets did your solver identify to produce the above log? It appears to be ...
A={R6C1,R6C4,R6C6}
B={R5C1}
... which means your A is larger than bennys' illustration.

The potential Disjoint Subset in Row 4 that the solver has identified takes the values {1,5,8} and occupies the cells {r4c1,r4c4,r4c6}. It so happens that when r6c1 takes the value 8, it is necessary only to look at the cell r6c6, which is what bennys does, in order to lock the 1. However, my solver reports the most general form of the construction.

by **ronk** » Wed Dec 07, 2005 1:52 pm

rubylips wrote:
ronk wrote:What sets did your solver identify to produce the above log? It appears to be ...
A={R6C1,R6C4,R6C6}
B={R5C1}
... which means your A is larger than bennys' illustration.

The potential Disjoint Subset in Row 4 (sic) that the solver has identified takes the values {1,5,8} (sic) and occupies the cells {r4c1,r4c4,r4c6} (sic).

Sorry, I mislead you by modifying only bennys set A and commenting only about your set A, as I wanted you to identify both your sets A and B ... so let me rephrase the prior post.

What sets did your solver identify to produce the above log? It appears to be ...
A={R6C1,R6C4,R6C6}
B={R5C1,R6C1}
... which means both your A and B are larger than bennys' illustration.

It so happens that when r6c1 takes the value 8, it is necessary only to look at the cell r6c6, which is what bennys does, in order to lock the 1. However, my solver reports the most general form of the construction.

"It so happens" ... and "the most general form" ... both seem like politicians phrases. I'm guessing both your sets A and B are different than bennys, because your implementation of Conditional Disjoint Subsets requires a non-trivial intersection of cell sets A and B.

Consider the intersection between a Line (i.e. a Row or Column) and a Box. Suppose that the Line contains an Almost Disjoint Subset L, i.e. a subset of nL cells that between them contain nL+1 values (call them VL) and that the Box contains a subset B of nB cells (call them VB) that between them contain nB+1 values. Furthermore, suppose that the subsets B and L have some non-trivial intersection I, with candidates VI.

But that still doesn't explain why your B is larger than the minimum. With sets tagged differently, here again is bennys' puzzle ...

Code: Select all: +-------------+-------------+-------------+ | 24 7 8 | 24 6 5 | 1 9 3 | | 9 3 24 | 248 1 48 | 56 7 56 | | 5 1 6 | 7 3 9 | 8 4 2 | +-------------+-------------+-------------+ | 28 9 23 | 458 7 6 | 35 1 45 | |*17 6 5 | 3 9 14 | 2 8 47 | |&178 4 13 | 58 2 *18 | 357 6 9 | +-------------+-------------+-------------+ | 6 5 7 | 1 4 2 | 9 3 8 | | 14 2 14 | 9 8 3 | 67 5 67 | | 3 8 9 | 6 5 7 | 4 2 1 | +-------------+-------------+-------------+ A: {R6C1,R6C6} = {178} B: {R5C1,R6C1} = {178} x=7 z=1

... where '&' indicates the intersection cell common to A and B. The "general form" is satisfied AFAICS.

If r6c1=1, then r6c3<>1
else if r6c1=7, then r5c1=1, r6c3<>1
else if r6c1=8, then r6c6=1, r6c3<>1

rubylips wrote:1. Consider the cell r6c1.
...........
When it contains the value 7, the value 1 in Box 4 must occupy the cell r5c1.
.

If the argument is good enough for the 17 in r5c1, why isn't it good enough for the 18 in r6c6? The log would then have a statement like ...

1. Consider the cell r6c1.
When it contains the value 7, the value 1 in Box 4 must occupy the cell r5c1.
When it contains the value 8, the value 1 in Row 6 must occupy the cell r6c6.

Of course this is just conjecture, as I'm guessing at your set B now.

by **rubylips** » Wed Dec 07, 2005 7:08 pm

The statements that appear in the solver log are sufficient to justify the final elimination - but not always necessary. Personally, I'm more interested in the sufficiency condition but one day when I've nothing better to do I might tighten the necessity conditions.

The New Sudoku Players' Forum

Almost locked sets xz rule

Re: Almost locked sets xz rule