## Standardizing the Uniqueness descriptions...

Advanced methods and approaches for solving Sudoku puzzles

12|123
12|124

can be thought of as

12|34
12|1234

Where does the extra 3 come from?

stuartn
stuartn

Posts: 211
Joined: 18 June 2005

stuartn wrote:Where does the extra 3 come from?

As I understand it from Lummox's description, when looking for hidden pairs, you can think of the 2 "floor" squares as having the following values:

* the union of all their values (123 union 124 = 1234)

* the union minus the values in the roof squares (1234 minus 12 = 34)

Note that you cannot assign them these values; you merely pretend they are such in order to find the hidden sets.

If I've got this wrong, then hopefully Lummox will post a correction.

Posts: 81
Joined: 10 June 2005

MadOverlord wrote:If I am thinking along the right lines here, then important insight here is: to avoid the deadly pattern, if a candidate hidden set exists, all the 12's have to be concentrated on a single square. It is the hidden set itself that forces this restriction (otherwise we get n possibilities in n+1 squares).

I'm not sure I follow there, but basically I see it like this: The uniqueness test itself eliminates 12 from either one cell or the other. The remaining question is whether 12 appears in either cell at all. Finding a partial naked set with 345 or partial hidden set with 12 is one way to prove that one of the cells has 12 as candidates (vs. neither cell having 12). If you don't find a set, you can't rule out that neither c nor d has 12. The subset if it exists proves one of those cells must be 12, because otherwise you'd have n candidates that fit only n-1 cells (hidden), or n+1 cells with only n candidates (naked).

And partial hidden sets of course can be bigger than a pair. In those examples pairs came up a lot, presumably because they're a lot more common. However if you have 6 unknown cells in a house and can find a partial naked triple 345, a partial hidden triple 126 would also exist (assuming of course 6 is the last remaining digit). Two cells in the house would have a subset of 345 as candidates; two would have a subset of 12, a 6, and possibly other candidates. Two would be the c and d cells from the unique rectangle, and they would not contain 6.
So, if the hidden set containing 12 exists, then the possibilities are:

12|34
12|1234

or

12|1234
12|34

(not that we care at that point), but if we cannot find a hidden set, then we cannot make this inference.

Actually we can make this inference regardless, and it's the same way we'd find a partial naked set. This set of possibilities rules out that 12 can appear in both cells at the same time, but allows the possibility that it may still appear (or not) in just one of them.
From the standpoint of a solver, the 34:1234 assumption may be easier (to permit code-reuse), but I am wondering if, from the standpoint of a human solver looking for the pattern, just assuming a <12> quantum square might not be a simpler conceit. I shall think on this later; have to schlep the kids to the indoctrination center now.

I suspect as a human solver you'd want to concentrate on either the naked set with 34, or the hidden set with 12+. You can do either by pretending one of the cd cells is 34. In fact, you can still do that in code anyway; it's not necessary for the quantum cell with the 12 to contain all of the other candidates as long as it has some of them. For example, if you have:
Code: Select all
`12|12312|124`

...then these are both perfectly valid temporary assignments to look for the subsets:
Code: Select all
`12|3412|12412|12312|34`

This will not impact the search for naked or hidden subsets with the correct digits. The only necessary rule is that one of those cells must be set (for the purposes of finding subsets only) to the union of the extra digits.
Lummox JR

Posts: 125
Joined: 22 September 2005

Still bemused - although I fully undersatnd how we can use the uniqueness strategy to solve ( see my posts from way way back) - where does the 3 come from? - one minute it's not a candidate and the next it is....?????
stuartn

Posts: 211
Joined: 18 June 2005

The 3 is not a magical addition; it's already there in one of the cells. You're not actually adding to or subtracting from the candidates of a specific cell because you can't narrow down which one it might be. In the quantum analysis you can't assume you have one scenario or the other, so you have to go with a mixture of both. This is why the quantum candidates involve a union between two cells.

Think of the two cells as being on a centrifuge, spinning very fast. That's sort of how the quantum anaylsis works. We do know that at least one of them cannot contain 1 or 2, so we can use that knowledge to find subsets.
Lummox JR

Posts: 125
Joined: 22 September 2005

Here's a puzzle that aptly demonstrates uniqueness 3 with partial triples, and actually relies on it for a quicker solution. If you use it you can crack the puzzle after an XY-wing; if you don't, you have to resort to something more difficult. This puzzle is one generated on a symmetrical run (a new feature) in my solver.
Code: Select all
`. 4 1|2 . .|. . .5 7 6|. . .|4 1 .. . .|. . 4|. . .-----------------. 6 .|5 2 .|1 . .1 . .|. 8 .|. . 5. . 5|. 7 3|. 6 .-----------------. . .|6 . .|. . .. 2 3|. . .|7 4 8. . .|. . 2|5 9 .`

And here's where the uniqueness test comes in. Note the 47's in box 7, and the candidates in box 4.
Code: Select all
`+----------------+----------------+----------------+| 39   4    1    | 2    6    5    | 389  378  379  | | 5    7    6    | 3    9    8    | 4    1    2    | | 2389 389  289  | 7    1    4    | 6    5    39   | +----------------+----------------+----------------+| 3478 6    478  | 5    2    9    | 1    378  347  | | 1    39   279  | 4    8    6    | 239  237  5    | | 2489 89   5    | 1    7    3    | 289  6    49   | +----------------+----------------+----------------+| 89   5    89   | 6    4    7    | 23   23   1    | | 6    2    3    | 9    5    1    | 7    4    8    | | 47   1    47   | 8    3    2    | 5    9    6    | +----------------+----------------+----------------+`

The original candidates are 47; the extras are 38. In box 4, this gives you a naked triple 389 with cells (2,5) and (2,6). You can eliminate 8 and 9 from cells (3,5) and (1,6). The naked set exists because we know that either 38 will be exposed in (1,4) or 8 in (3,4).

The complementary hidden triple is in digits 247, in cells (3,5) and (1,6), allowing you to remove other candidates from those cells. As you can see, 2 can go in only those two cells. 4 and 7 can each go in only one of those cells (though both would do) and only one of the (1,4) or (3,4) cells because of the uniqueness requirement.
Lummox JR

Posts: 125
Joined: 22 September 2005

being very basic here Lummox - could you stick to RnCn nomenclature for cell ID please? - it's rather become the norm on this august organ and using a different notation when describing a new (and very clever) technique can be a bit confusing - eapecially for newbies.

stuartn
stuartn

Posts: 211
Joined: 18 June 2005

The original candidates are 47

And can we do anything with the 89 and 23 floors in R7?

stuartn
stuartn

Posts: 211
Joined: 18 June 2005

stuartn wrote:being very basic here Lummox - could you stick to RnCn nomenclature for cell ID please? - it's rather become the norm on this august organ and using a different notation when describing a new (and very clever) technique can be a bit confusing - eapecially for newbies.

I can't bring myself to do it; I loathe the RC notation because, among other faults, it puts the Y first and the X second. The purist in me has no truck with a coordinate system that does that. It might make some sense if we were actually dealing with matrix algebra, but it's just a simple grid.
And can we do anything with the 89 and 23 floors in R7?

Nope. With 89 the only matching pair of cells is up in box 1, and either in box 1 or row 3 it doesn't give us anything to work with. We have the uniqueness cells 2389 and 289, and the other two are 389 and 39. There is no partial naked pair 23; for the same reason there is no partial hidden pair 89. Indeed, 9 is a candidate of all 4 cells. Since 8 is a candidate of 3 cells, we also don't have the opportunity to apply uniqueness 4.

For 23, we end up with the uniqueness cells 239 and 237 in box 6, row 5. Along row 5 the other cells are 39 and 279, neither of which can form a naked 79 or hidden 23. In box 6 the other cells are 378, 347, 289, and 49. Again there's nothing to give us a partial naked pair or triple including 79, or a partial hidden pair or triple with 23.

What's interesting is that for every naked pair in the puzzle we have a unique rectangle, but for 89 and 23 there's just no test (yet) that will help narrow anything down. I suppose if you wanted to go out on a limb you could split each case into separate grids where only one cell of the floor pair has the roof candidates, and then do coloring or bifurcating chains or something and look for results that match among the two cases. Indeed there's one thing I can determine right away by looking at that grid using the unique rectangle 23. If (7,5) is not 23, it must be 9, which means (2,5)=3, and therefore (8,5) cannot be 3. Either way, (8,5)<>3.
Lummox JR

Posts: 125
Joined: 22 September 2005

Lummox JR wrote:I can't bring myself to do it; I loathe the RC notation because, among other faults, it puts the Y first and the X second. The purist in me has no truck with a coordinate system that does that. It might make some sense if we were actually dealing with matrix algebra, but it's just a simple grid.

Alas, the problem with (8,3) is that it is ambiguous. Is it R8C3, or R3C8? Even if you are always consistent, other people may enter (8,3) and mean (3,8).

RxCy is unambiguous, and 1 less character to boot. From a comprehension standpoint (and I have some years experience with visual perception issues given all the subtitling I've done), it's much easier to understand, because of the embedded cues.

If you want to us CyRx, that would be better. But 99.9% of the rest of the english-speaking world will naturally use RxCy, because that is how we naturally handle a matrix -- the most common example being text; the natural way to define a word on a page is line x, word y.

Finally, your position is internally inconsistent, since if you want to use XY graph notation, you would place the origin in the bottom left corner of the sudoku matrix, not the top left.

Hey, I gave up on "locked sets", even though I think it's a better descriptor than "naked sets". Now it's your turn.

Posts: 81
Joined: 10 June 2005

MadOverlord wrote:Alas, the problem with (8,3) is that it is ambiguous. Is it R8C3, or R3C8? Even if you are always consistent, other people may enter (8,3) and mean (3,8).

I kinda get that, but that mistake is basically one of favoring a matrix layout. It just makes no sense for sudoku.
If you want to us CyRx, that would be better. But 99.9% of the rest of the english-speaking world will naturally use RxCy, because that is how we naturally handle a matrix -- the most common example being text; the natural way to define a word on a page is line x, word y.

Wouldn't that be line y, word x? ;)
Finally, your position is internally inconsistent, since if you want to use XY graph notation, you would place the origin in the bottom left corner of the sudoku matrix, not the top left.

But sudoku isn't a matrix. None of the operations of matrix algebra apply. It's merely a grid.

Plus, you're referring to Cartesian coordinates. I have no idea why anyone would try to apply them to sudoku, though, because in standard terminology we label the columns left to right, rows top to bottom, and boxes in left-right reading order. If the rows are top to bottom, clearly it makes no sense to apply a y-coordinate that conflicts with that. I know in Sudoku Susser you mentioned you implemented Cartesian order because some people can be picky, but I don't get what planet those people are even from. R=1/y=9 is about as logical as Congress.
Hey, I gave up on "locked sets", even though I think it's a better descriptor than "naked sets". Now it's your turn.

Well, I expect if I do capitulate at some point, it won't be without a lot of grumbling. You're talking to a guy who asserts that there's a wrong way and a right way to stripe Neapolitan ice cream (wide stripes, because most people buy it for the selection of 3 flavors), and that toilet paper must have the edge facing forward, not back (except in extenuating circumstances like young children or mischievous pets). So I kinda take a "screw 'em anyway" approach to the whole topic, noting that while RC notation may be popular, it still sucks. I suppose if I flesh out my solver into a fully functional program at some point, I'll need to take the road you did and support multiple notations. (Although I won't support Cartesian, because again I have no clue what those people are on about.)

And for the record I'm not averse to ever changing the terms from "naked" and "hidden" to something else, if they work better. I especially don't care for "hidden" because it's not really descriptive; "constrained" would be better, even if it is unwieldy. I never liked "locked" though because it could describe both. Either you're locking candidates in a set of cells or you're locking a set of cells to certain candidates.
Lummox JR

Posts: 125
Joined: 22 September 2005

Lummox JR wrote:I kinda get that, but that mistake is basically one of favoring a matrix layout. It just makes no sense for sudoku.

The problem is, you have no idea what layout a writer is favoring. RxCy notation, on the other hand, is inherently self-decoding, because it directly verbally decodes "Row x, column y".

Think in (x,y) or (y,x) notation all you want, but when you're trying to express yourself to others, it is worth the time to write in such a way as to maximize the comprehension of your readers. The huge majority of them use RxCy notation; the little extra effort it takes you to reword into that notation is balanced against easier comprehension for your hundreds of readers.

That's one reason why I started this standardization thread; I wanted to work out some common nomenclature that would make it easier for people to understand these cool patterns. Thus my introduction of terms like "floor" and "roof" squares, "quantum", etc for the approval of the audience.

So I kinda take a "screw 'em anyway" approach to the whole topic, noting that while RC notation may be popular, it still sucks.

You do yourself a disservice; by making it harder for people to understand what you are saying, you reduce your contribution to the state of the art.

Finally, you're dead wrong about Neapolitan Ice Cream. Neapolitan is one of the devil's flavors. The only way to stripe Neapolitan Ice Cream is not to stripe it at all; buy a quart of vanilla, a quart of strawberry, and a quart of chocolate. Neapolitian requires that the demand for the three flavors be equal; in the real world, this is not so. Frequently some bastard will go in and scoop all the chocolate, leaving a pathetic strawberry/vanilla reduced case. Yuck.

And anyway, the one true flavor is Cookies and Cream, invented at my Alma Mater, Cornell. So there!

Posts: 81
Joined: 10 June 2005

Oh lorks - I really started something there didn't I?... still- nice to know about the ice cream.....

Anyway - this comes from the 'toughest grid' discussion that's under way and Lummox - you may be able to help.... Nick70's toughest grid looks like this;

Code: Select all
` [56]  [56] 2 [38] 9 [348] 1 [348] 7 [179] 386 [2457]  [124]  [245]  [459]  [259] 4 [179]  [17]  [123578]  [2578]  [1238]  [2568]  [35689]  [235689]  [12367]  [124678]  [1347]  [39]  [2678] 5 [2468]  [146789]  [12689]  [2567]  [245678] 9 [278] 1 [268] 3 [45678]  [2568]  [123567]  [125678]  [1357] 4 [2678]  [39]  [2568]  [156789]  [125689]  [123579]  [12579]  [1357]  [12589]  [2568]  [12689]  [68]  [368] 4 [135]  [145]  [1345]  [158]  [4568] 792 [368] 8 [249] 6 [29] 3 [249] 7 [15]  [15] `

Do the 'floor' 15's in R9C89 give us any room for manouvre?

stuartn

(this posting avoided any damage to the environment - a number of electrons were severely inconvenienced however)
stuartn

Posts: 211
Joined: 18 June 2005

Nope, stuartn, you'll notice the only cells with candidates forming a unique rectangle with the 15 have way too many candidates, and non-matching ones at that. Non-matching limits us to uniqueness 3 or 4 here, and neither 1 nor 5 is constrained, ruling out uniqueness 4. The extra candidates between the two cells are 26789, which means you'd need to find a naked quintuple or sextuple, or a hidden 15 or 145. Not one of those exists in either box 6 or row 6.

The 56's are actually more promising, but only in the pair from box 4, row 5. Still, nothing comes of it.

There is a swordfish you can do on the 4's at this point, though, and then coloring on the 6's and 7's yields eliminations.

I've actually seen this grid before. Jeff posted it once in a thread about advanced coloring, after the swordfish on the 4's but before the 6 and 7 eliminations. I'm curious--what is the original puzzle?

This one's actually quite difficult, as advanced coloring alone can't solve it. I managed to crack it with bifurcating chains, though.
Lummox JR

Posts: 125
Joined: 22 September 2005

I am implementing hidden set removal in the Susser, and ran into something that has me a bit confused. Using one of Lummox's sample puzzles:

Code: Select all
`+----------------+----------------+----------------+| 457  39   39   | 57   2    1    | 6    457  8    | | 157  8    2    | 57   4    6    | 1579 579  3    | | 1457 467  16   | 9    3    8    |*1457*2457 257  | +----------------+----------------+----------------+| 2    346  136  | 8    9    7    |*45  *45   16   | | 478  4679 69   | 1    5    2    | 789  3    679  | | 178  79   5    | 4    6    3    | 2    789  179  | +----------------+----------------+----------------+| 3    25   4    | 6    8    9    | 57   1    257  | | 9    1    8    | 2    7    5    | 3    6    4    | | 6    25   7    | 3    1    4    | 589  2589 259  | +----------------+----------------+----------------+`

Just looking for hidden sets got me this:

* Squares R4C7, R4C8, R3C7 and R3C8 form a Type-3 Unique Rectangle on <45>. Because they share two rows, two columns, and two blocks, if they all had possibilities <45> then the puzzle would have two solutions; you could simply exchange the <4>'s with the <5>'s in those four squares to get the other solution, and their common rows, columns and blocks would still contain one of each value. Since a valid Sudoku can have only one solution, the fact that one of the squares must contains something other than <45> lets us look for hidden sets in block 3 and row 3 that contain those values; we can treat R3C7 and R3C8 as if they were a single square with possibilities <45>. Upon close inspection, it is clear that:

(R3C7|R3C8)<45>, R1C8<457>, R2C7<1579>, R2C8<579> and R3C9<257> form a hidden set on <12457> in block 3. Any extra possibilities they contain can be eliminated.

(R3C7|R3C8)<45>, R3C1<1457>, R3C2<467>, R3C3<16> and R3C9<257> form a hidden set on <12456> in row 3. Any extra possibilities they contain can be eliminated.

But this leads to an invalid puzzle! After some thought, I added in the restriction that the hidden set cannot contain any of the other possibilities in the floor squares (ie: in the above example, <127> are excluded, which eliminates both candidate hidden sets)

Adding this restriction permits the solver to find:

Code: Select all
`+----------------+----------------+----------------+| 457  39   39   | 57   2    1    | 6    457  8    | | 157  8    2    | 57   4    6    | 1579 579  3    | | 1457 467  16   | 9    3    8    | 1457 2457-257  | +----------------+----------------+----------------+| 2    346  136  | 8    9    7    | 45   45   16   | | 478  4679 69   | 1    5    2    | 789  3    679  | | 178  79   5    | 4    6    3    | 2    789  179  | +----------------+----------------+----------------+| 3   *25   4    | 6    8    9    | 57   1   *257  | | 9    1    8    | 2    7    5    | 3    6    4    | | 6   *25   7    | 3    1    4    | 589  2589*259  | +----------------+----------------+----------------+`

Which, IIRC, is the reduction that Lummox indicated. Note that the squares in the hidden set can contain the extra floor possibilities, they just can't be part of the set itself. So we can eliminate <7> from R3C9.

But with this restriction, whenever there is a hidden set, there is a naked set that permits the same reduction. Are there ever any hidden set reductions that do not have a complementary naked set reduction? Or am I still missing something?