## Subset Counting

Advanced methods and approaches for solving Sudoku puzzles

### Subset Counting

Counting
The current Sudoku solution methods are mainly based on counting and implication chains. The part about implication chains is well understood. Basically it is just propositional calculus. Every basic step in a chain is based on counting, and some known counting arguments can be generalized and simplified a bit.

Every region contains each digit precisely once. This can be used in two directions: On the one hand it means that if some set of cells is covered by N regions, it can contain some digit d at most N times. On the other hand it means that if some set of cells contains all occurrences of some digit d in N pairwise disjoint regions, it must contain the digit d at least N times.

For N=1 the "at most" part leads to weak links, and the "at least" part leads to strong links. Very similar arguments work for N > 1.

Subset Counting example
Myth Jellies pushed me to start a new thread in a follow-up to a post describing such an argument - maybe the example given there is a good illustration. Look at the (partial) situation
Code: Select all
`   .   .   . |   .   .   . |   .   .   .   .   .   . |   .   .  <9>|   .  79 789   .   .   . |   .   .   . |   .   . 789 ------------+-------------+------------   .   .   . |   .   .   . |   .   .   .   .   .   . |   .   .   . |   .   .   .   .   .   . |   .   .   . |   .   .   . ------------+-------------+------------   .   .   . |   .   .  37 |   .   .  79   .   .   . |   .   .   . |   .   .   .   .   .   . |   .   .  39 |   .   .   .`

The goal is to eliminate a possible 9 in cell (2,6).
Consider the set of six cells of which the candidates are shown. The possible digits are 3,7,8,9 with maximum multiplicities 1,2,1,3, respectively.
(Why? All 3s are in a single box, so at most one 3. The same holds for 8. All 7s are either in row 7 or in box 3, so there are at most two 7s in these six cells. Finally, the 9s occur in three boxes, so there cannot be more than three of them.)
Now 1+2+1+3=7 and we have 6 cells. If some outside choice decreases the maximum by 2, then we do not have enough digits to fill the cells, and the outside choice is impossible. But the choice of 9 in cell (2,6) would diminish the maximum number of 9s from 3 to 1 (all 9s that do not see cell (2,6) are in one column), and this shows (2,6)!9.

Remarks
Most reasoning involving locked sets, almost locked sets, Sue de Coq, X-wing, XY-wing, XYZ-wing etc is a special case of this subset counting technique.
Also some arguments commonly thought of as chain arguments can be phrased as counting arguments. In particular, remote pairs and, more generally, bivalue forcing chains are a special case.

The contradiction obtained is that an upper bound (set covered by few regions) contradicts a lower bound (set has at least as many digits as cells). One can also obtain a lower bound by counting for each digit d the number of pairwise disjoint regions of which all occurrences of d are contained in the chosen set.
Now e.g. swordfish, and all ordinary nice loops are a special case.
(Thus, one has the two equivalent views of a nice loop: as a chain of propositions or as a single object.)

(How? Given the chain x-a=b-c=d-e=f-x, consider the set a,b,c,d,e,f. Counting (digit,region) pairs there is a lower bound of 3 for the three regions containing a=b, c=d, e=f. But if x holds, then there is an upper bound of 2 found by considering b-c, d-e. One can also count true propositions and come to the same conclusion.)

Chains are basically linear, but subsets do not have such a restriction. This makes subset counting in principle a more powerful technique.

Of course, finding the right subset to consider is not necessarily easier than finding the right chain.

If the set of cells is large, it can become nontrivial to determine the maximum number of occurrences of some digit in the set. The extreme case is that where one takes all cells that do not see a given cell. If the maximum number of occurrences of d is less than 8, then the given cell does not have d. This is Nishio.
aeb

Posts: 83
Joined: 29 January 2006

It looks like a powerful tool to use.

Would it be possible to demonstrate it on the following example?
Code: Select all
`*-----------------------------------------------* | 57   9   ^45  | 2    6    3   | 8    1    47  | | 6   ^12  ^14  | 7    5    8   | 9   *24   3   | | 27   8    3   | 9    1    4   | 27   6    5   | |---------------+---------------+---------------| | 3    4    2   | 6    9    5   | 1    78   78  | | 18   5    7   | 3    28   12  | 4    9    6   | | 18   6    9   | 4    78   17  | 5    3    2   | |---------------+---------------+---------------| | 4    12  -15  | 8    3    27  | 6   %57   9   | | 9    3    8   | 5    47   6   |%27  %247  1   | | 25   7    6   | 1    24   9   | 3    458  48  | *-----------------------------------------------* Eliminating 5 from r7c3(ALS-XY  A=24 in r2c8   B=1245 in r1c3,r2c2,r2c3   C=2457 in r7c8,r8c7,r8c8   x=2 y=4 z=5)`

Thanx,

Tarek

tarek

Posts: 2908
Joined: 05 January 2006

Tarek,

You give 7 cells which have as possible digits: 1,2,4,5,7

The maximum number of times each could occur in this set of 7 cells is:
1,2,2,2,1, respectively. As 1+2+2+2+1 = 8 and we have 7 cells, any outside choice that decreases the maximum by at least 2 is impossible. Since (7,3)5 would decrease the number of possible 5's in our set to 0 (a decrease of 2), we can conclude that (7,3)!5.
re'born

Posts: 551
Joined: 31 May 2007

This certainly makes it easier for me to see the ALS xy-type of eliminations. Thanks.

Correct me if I am wrong, but you could also consider the following group marked with '%'.
Code: Select all
`*-----------------------------------------------* | 57   9    45  | 2    6    3   | 8    1    47  | | 6    12  %14  | 7    5    8   | 9   -24   3   | | 27   8    3   | 9    1    4   | 27   6    5   | |---------------+---------------+---------------| | 3    4    2   | 6    9    5   | 1    78   78  | | 18   5    7   | 3    28   12  | 4    9    6   | | 18   6    9   | 4    78   17  | 5    3    2   | |---------------+---------------+---------------| | 4    12  %15  | 8    3    27  | 6   %57   9   | | 9    3    8   | 5    47   6   |%27  %247  1   | | 25   7    6   | 1    24   9   | 3    458  48  | *-----------------------------------------------* `

5 cells containing digits 12457 with multiplicities 11211 = 6. The 4 in r2c8 eliminates two multiplicities of 4, so it must be invalid.
Myth Jellies

Posts: 593
Joined: 19 September 2005

Rep'nA, thanx, that's what I had in mind....

MythJellies formation is an ALS-xz rule elimination.............

This is fantastic.

tarek

Posts: 2908
Joined: 05 January 2006

aeb wrote:...and all ordinary nice loops are a special case ... bivalue forcing chains are a special case

For the example given, the following nice loop elimination [r9c5]<>4 breaks the puzzle, with only singles required to clean up.
Code: Select all
`[r9c5]=2=[r9c1]=5=[r1c1]=7=[r1c9]=4=[r9c9]-4-[r9c5] => [r9c5]<>4 ^57  9   45    2   6   3     8   1   ^47  6   12  14    7   5   8     9   24   3  27  8   3     9   1   4     27  6    5  3   4   2     6   9   5     1   78   78  18  5   7     3   28  12    4   9    6  18  6   9     4   78  17    5   3    2  4   12  15    8   3   27    6   57   9  9   3   8     5   47  6     27  247  1 ^25  7   6     1   24  9     3   458 ^48`

Consider the four corner cells with five digits 24578.
The maximum multiplicities are 1,1,1,1,1. A possible placement of 4 at r9c5 does not seem to reduce the multiplicities. Please correct me if I am not understanding this correctly.
Can you also please elaborate, using this bivalue loop as example, on the statements regarding nice loops.
(i.e. bivalue forcing chain and the lower/maximum bounds).
Thanks, flip
flip

Posts: 17
Joined: 08 January 2006

[edit: deleted--dumb idea--got to stop posting when I should be sleeping]
Last edited by Myth Jellies on Sun Mar 12, 2006 8:12 am, edited 1 time in total.
Myth Jellies

Posts: 593
Joined: 19 September 2005

So it makes it easy to absorb all the ALS rules........

n cells
max candidates multplicity of n+1

if all cells are connected using 1 sector only (xz rule)
if all cells are connected using 2 sectors only(xy rule)
if all cells are connected using N sectors only..............(xN rule)........
...........interesing, this literally "cuts corners".... especially as MythJellies mentioned that N=0.

Is it possible to show this on some complex x-cycle elimination (grouped colouring, finned Jellyfish...) ? This was partially discussed here

Tarek
Last edited by tarek on Sun Mar 12, 2006 12:52 pm, edited 1 time in total.

tarek

Posts: 2908
Joined: 05 January 2006

But how does one identify a large single set ... without first identifying a chain of smaller almost-locked-sets within the large set?

And if there is no other practical way to identify the large set, does "the whole become greater than the sum of its parts?"

Ron
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

ronk wrote:But how does one identify a large single set ... without first identifying a chain of smaller almost-locked-sets within the large set?

If I'm not mistaken the ALS rules would apply only if you have max multiplicities of (Total cells +1).......
So one way to do it, is to count as you progress....

if max multiplicities are not (Total Cells +1) then although the counting elimination may be found (eliminate any candidate that reduces multiplicities to < Total Cells), it can't be explained by the ALS rules

Tarek

tarek

Posts: 2908
Joined: 05 January 2006

tarek wrote:If I'm not mistaken the ALS rules would apply only if you have max multiplicities of (Total cells +1).......

Using bennys' (and Bob Hanson's) rules, that's true. That's why I chose the phrase "chain of smaller almost-locked-sets."

With a chain of ALSs, I think it's possible to get the total of "mutliplicities" equal to either "total cells" or "total cells +1" ... and that's all I've seen demonstrated anywhere in Players' Forums IIRC.

If one throws almost-almost-locked-sets (AALS) into the mix, I suppose other possibilities exist ... but I'm not aware of a practical example there either.

Ron
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

Hi aeb,

this is an interesting POV, but some things are very vague to me. I understand your way of looking at ALS, where you only need one set to make the conclusion. Once the set is found, it is an elegant way for showing, why an elimination can be made safely.
Do you have an example, where you can make an elimination, which cannot be done with an (xz- or xy-) ALS also ?
And are you finding the set "as a whole" or are you putting it together from (2 or more) almost locked sets?

Regarding your remarks, i did not understand, how you can argue an x-wing/swordfish or some chains by means of subset counting (no problem with the others). There is a mix of subsets with digit and region counting, i am not looking through.

So please can you show an x-wing elimination with subset counting and tell me, if a comprehensive forcing chain can be expressed with subset counting. For the latter i took an arbitrary sample:

Code: Select all
` 3 5678  57 | 4  19*  26  | 27*  289  157#17  567   4 | 8  19*  26  |  3    29* 157 2   18   9 | 3   5    7  |  6    18    4  `

r1c9=7 => r1c7=2 => r2c8=9 => r2c5=1 => r1c5=9 => r1c9=1

BTW, flip, i cannot see how the 4 can be eliminated (goes to r9c8) [Edit:]sorry, i did not read it right.
Last edited by ravel on Sun Mar 12, 2006 11:57 am, edited 1 time in total.
ravel

Posts: 998
Joined: 21 February 2006

flip wrote:...
Consider the four corner cells with five digits 24578.
The maximum multiplicities are 1,1,1,1,1. A possible placement of 4 at r9c5 does not seem to reduce the multiplicities. Please correct me if I am not understanding this correctly.
Can you also please elaborate, using this bivalue loop as example, on the statements regarding nice loops.
(i.e. bivalue forcing chain and the lower/maximum bounds).

Is it because the the set actually forms a continuous ring that this problem surfaced ?!

Tarek

tarek

Posts: 2908
Joined: 05 January 2006

tarek wrote:Is it because the the set actually forms a continuous ring that this problem surfaced ?

There is no link between r9c1 and r9c9, but the set (four corner cells) and r9c5 make a nice loop with a discontinuity in the weak label 4, so that the 4 at r9c5 can be eliminated.
For the moment I am waiting for aeb to reply to my earlier post.
Regards flip
flip

Posts: 17
Joined: 08 January 2006

IMHO the problem with flip's chain is, that it is comprehensive too, e.g. for the first step r9c5=2=r9c1=5 it is using the fact that the 2's in row 9 are a conjugated pair. This information is not contained in the set of the 5 cells (because it depends on the rest of row 9).
ravel

Posts: 998
Joined: 21 February 2006

Next