## Su-Doku's maths

Everything about Sudoku that doesn't fit in one of the other sections

### Re: pseudo code

Dominic wrote:"map" is an 81-element array of LONGs. Each element represents one cell and has placed within it 3 bits which indicate the block, row and column to which the cell belongs. - I only need 3*9 bits (i.e. <32) to do this. This is a "static" array.

I get it now, thank you. This means your original program was indeed correct - I should have read your comment in it more thoroughly.

I implemented your approach in C but I get only about 100,000 solutions per second, nowhere near the several million per second you claim. What initial placement of the numbers did you use? I assumed you started the search with an empty grid. Thus, the inner loop of my C version of solve(cell) looks like this:
Code: Select all
`    for (i=0; i<9; i++)        if (!(num[i] & grid[cell])) {            num[i] += grid[cell];            solve(cell+1);            num[i] -= grid[cell];        }`

This loop has 9 iterations on all search levels and contains a hard to predict conditional jump in its body. In contrast to this, storing masks for the digits like I did allows to save quite a few iterations of the inner loop and a few conditional jumps as well, like this:
Code: Select all
`    mask = mask_row[y] & mask_column[x] & mask_box[box];    while (mask) {        bit = mask & -mask; /* extract lowest bit from mask */        mask -= bit;        <recurse>    }`

Thank you for sharing your idea, it was a way to store the necessary data that had not occured to me. This could come in handy some day, even if it doesn't work very well (as I believe) for this problem.

Bertram
Bertram

Posts: 4
Joined: 07 June 2005

Deleted
Last edited by coloin on Sun Jul 24, 2005 2:05 pm, edited 2 times in total.
coloin

Posts: 1773
Joined: 05 May 2005

### How many sudokus?

After reading all the postings about how many sudokus there are, and doing some analysis of my own, I have decided to send in this email with some information that I hope will help clarify peoples thinking.

There are 2 questions that need to be answered. How many sudokus are there? And how many unique sudokus are there?

Two sudokus are unique if one cannot be transformed into the other. Here are some transformations to consider.

1 Rotations: Rotate the grid about the center square. That is 4 transformations.

2 Reflections: Reflect in a mirror. 2 transformations.
(Note: It is impossible of a sudoku to be symmetrical. Proof: To be symmetrical, an axis of symmetry must pass through box 5, and the same numbers would have to be on both sides of the axis in box 5.)

3 Change numbers: Change every occurrence of a digit for another digit. At least 2 digits will need change with this transformation. Consider these 2 sudokus:

786 153 492 157 432 689
593 724 816 382 196 547
124 869 573 496 578 312

835 941 627 523 864 791
942 675 138 869 713 425
671 382 945 714 259 863

219 536 784 948 327 156
368 497 251 275 681 934
457 218 369 631 945 278

They may look different to start with, but take a closer look. 1 in the left sudoku is changed to 4 in the right sudoku, 2 to 9, 3 to 2, 4 to 6, 5 to 3, 6 to 7, 7 to 1, 8 to 5, 9 to 8. That gives 9! (362880) possible transformations.

4 Switch Columns/Rows: Rearrange the columns in a column of boxes and/or rearrange the rows in a row of boxes. That is 6 transformations per column or row of boxes for a total of 6^6 (46656) transformations.

5 Switch Columns/Rows of Boxes: Rearrange the order of the columns/rows of boxes. 6x6 or 36 transformations

At first I just considered the first three to be valid, but after some thought decided that they were primary transformations and 4 and 5 were secondary transformations, because they are equivalent to tearing up the grid. I would be interested in what other people think on that one. The first three give a total of 2903040 transformations from 1 sudoku. The last two give a total of 6^8 (1679616) transformations for a total of 4,875,992,432,640 transformations.

So the total number of sudokus = unique sudokus x number of transformation.

Now onto how many sudokus are there. I haven't worked out the whole answer yet, but have some things that need consideration.

I will represent the first row as: XXX YYY ZZZ, where the order of the digits in each triplet can be ignored for the moment. The important thing to remember is that as soon as a number is placed in 2 of the boxes, the box for the third digit is immediately decided.

The second row depends on how many digits are common between the first triplet and YYY. There are four possibilities.

First no digits in common. That means the first triplet is ZZZ. Filling in the rest of rows 2 and three gives:

XXX YYY ZZZ 987 654 321
ZZZ XXX YYY 321 321 321
YYY ZZZ XXX 321 321 321 or 16,930,529,280

The probabilities for that grid are on the right.

Now for all three digits in common. Filling in rows 2 and 3 gives:

XXX YYY ZZZ
YYY ZZZ XXX
ZZZ XXX YYY

This has the same probability as above. This also shows that there is only 3 possible choices of digits in the center, not 6 as most people seem to think.

1 or 2 digits in common is slightly harder. First 1 digit in common. I will represent the digits in common with the letters Q, R and S. Note: To make it a bit easier to read the next grid I have written those letters last. Q, R and S could just as easily represent the first or second digits in the triplet.

XXQ YYR ZZS 987 654 321
ZZR XXS YYQ 932 332 321
YYS ZZQ XXR 321 321 321 or 457,124,290,560 or 27 times the above probabilities.

The probabilities for that grid are on the right.

Finally 2 digits in common.

XXQ YYR ZZS
YYS ZZQ XXR
ZZR XXS YYQ

That also has the same probabilities as 1 digit in common

The total probability is equal to the sum of the four probabilities. (56 x 16,930,529,280 or 948,109,639,680)

Ps I wondered if it was possible for a sudoku to have a row or column of boxes with the triples staying together, till I came up with the following sudoku.

123 789 456
456 123 789
789 456 123

312 978 645
645 312 978
978 645 312

231 897 564
564 231 897
897 564 231
Condor

Posts: 62
Joined: 19 June 2005

### Transformations

Interestingly - it appears some sudukus have more similarities than others ! [44 we are told]
It is to do with
1 some transformations + reflection etc giving the same result as a rotation - it is not constant.
2* the sudukus like repeating ones which might [?] have even more than your average grid.

I would like to see an example of the gang of 44 -[Bertram]

Maybe those grids which can be narrowed down to 17 numbers form the minimum grids.And those which cant be narrowed down to much higher numbers are in 2*.
coloin

Posts: 1773
Joined: 05 May 2005

Bertram and I have finished the first version of our article explaining how we counted Sudoku grids, and I've made a web page with the article, all of Bertram's programs and results, and Red Ed's program which verified the result independently (thanks to Red Ed for allowing us to put this on the page). The page is at http://www.shef.ac.uk/~pm1afj/sudoku/.

Frazer
frazer

Posts: 46
Joined: 06 June 2005

In reply to coloin's last message (posted while I was writing my message!), the article explains how we reduced the computation down to just 71 brute force searches and a little about how Red Ed got it down to 44. The 44 searches (and results) are given at the start of Red Ed's program. It's perhaps worth saying that Ed stores the columns of blocks 2 and 3 in numerical order, so he stores a configuration like

123 478 569
456 139 278
789 256 134

as 124/357/689/125/367/489, the entries in the columns of blocks 2 and 3 written in order. (This way of storing the columns means that Ed finds all possible "duplication" equivalences in the terminology of the paper, not only 2xk or kx2, but any possible higher ones too; this presumably accounts for why Ed has to make just 44 searches, compared with Bertram's 71.)

Frazer
frazer

Posts: 46
Joined: 06 June 2005

Perhaps coloin and others are a little confused about the 44 equivalence classes. We're not saying that there are essentially only 44 different sudoku grids: we're saying that there are essentially only 44 ways of filling in the top band of 3 rows. It takes about 2 minutes to count the number of ways of extending from 3 to all 9 rows filled in, so you can understand why it paid to find a way of doing so only 44 (rather than 36288, the "obvious" way) times.
Red Ed

Posts: 633
Joined: 06 June 2005

### 44

Thankyou - yes I was confused.
Your work reconfirms the enormity of the problem we face !

Red Ed's nomenclature is clever and was worth getting to grips with - it took me a long time to work out why it had to start with 124!

So we have 44 inital 3-box "grids" with varying numbers of transformed/equivilent members from 4 to 3240 each with around 100 million solutions [Each a different number !].

44 : 147 258 369 147 258 369 : 4 : 108374976

represents :
[123] 789 456
[456] 123 789
[789] 456 123

The most symetrical 3 box grid had the least number of equivilents [4] [Ok I see them now] but the most number of total solutions.

Where now ?
coloin

Posts: 1773
Joined: 05 May 2005

### What am I missing?

I did some work recently to enumerate sudoku solutions. Here's my idea of how to do it which seems to contradict an awful lot of the messages in this thread. Just wondered if anyone here could briefly explain what bad assumptions I've made please?

Thanks, sub1ime_uk

------------------------------------------------------------------------------------

Thought I'd offer a method for solving all possible 9x9 sudoku puzzles
in one go. It'll takes a bit of time to run however (and 9x9 seems to
be about as big as is reasonably possible before combinatorial
explosion completely scuppers this type of program)...

Basic idea:-

123456789
234567891
345678912
456789123
567891234
678912345
789123456
891234567
912345678

Note that all rows and columns contain all 9 digits (but that the
sub-tiles aren't correct for a sudoku solution).

Next write a program which permutes all 9 columns, and then for each of
those permutations permutes the last 8 rows. This will, I believe,
generate all possible grids with no digit repetitions in any row or
column. It will generate 14,631,321,600 (9!*8!) possible sudoku
candidates. Finally, check each candidate to see if any 3x3 subtile has
a repeated digit in it and discard as soon as a repeat is found (sooner
the better). Any that come through unscathed get written as a 82 (81 +
lf) char string to an output file.

I've written a short program (in Python; see below) that tries out this
idea. Indications are that my HP nc6000 laptop can check around 30,000
candidates/sec and that c.0.15% of them are valid sudoku solutions.
That means it would take around 6.5 days to generate the between 20-30
million possible sudoku solutions it will find. That would require
around 2GB in uncompressed disk storage. Gzip does a VERY good job of
compressing files containing this type of data -- so I'd expect well
over 80% compression (might even fit on a CD then).

Once you've generated the solution data then comes the fun of searching
it efficiently which I leave as an exercise for the reader :-)

Regards, sub1ime_uk (at) yahoo (dot) com

==============================­==============================­======
Code: Select all
`#!python # # sudoku.py - generate all valid sudoku solutions # # Usage: sudoku <N> <S> #    eg: sudoku 9 3 # Whare:- #        N is the grid size (ie 9 for 9x9) #        S is the sub-tile size (3 for 3x3) # # (c) 2005 sub1ime_uk (at) yahoo (dot) com # import sys from gzip import GzipFile from time import time def permute(s):   if len(s)==0: return   if len(s)==1:     yield s     return   for i in range(len(s)):     for t in permute(s[:i]+s[i+1:]):       yield s[i:i+1]+t   return def populate(sz, ini):   tile = []   for i in range(sz):     tile.append([])     for j in range(sz):       x = chr((i+j)%sz+ord(ini))       tile[i].append(x)   return tile def subtilesok(t, c, r, n, s):   for x in range(0, n, s):     for y in range(0, n, s):       d = {}       for i in range(s):         cn = c[x+i]         for j in range(s):           rn = r[y+j]           d[t[cn][rn]] = 1       if len(d.keys())!=n: return 0   return 1 def displaytile(t, c, r, lf):   lfstr=''   print   for i in r:     row = []     for j in c:       row.append(t[j][i])     r=''.join(row)     lfstr += r     print "    ",r   print   lf.write(lfstr+"\n") def fact(n):   if n<2: return 1   return n*fact(n-1) if __name__ == '__main__':   st = time()   logf = GzipFile("c:\\temp\\sudoku.gz"­, "w")   N=int(sys.argv[1])   S=int(sys.argv[2])   estimate = fact(N)*fact(N-1)   if N!=S*S:     print "Subtile size", S, "not sqrt of tile size", N     sys.exit(1)   cols = [x for x in range(N)]   rows = [x for x in range(1, N)]   primarytile = populate(N, '1')   count = 0   answc = 0   for colp in permute(cols):     for rowp in permute(rows):       count += 1       if subtilesok(primarytile, colp, [0]+rowp, N, S):         answc += 1         ct = time()         et=ct-st         if et>0.0:           print "Found: %d out of %d (%.2f%%) checked" % (answc, count, (answc*100./count))           print "Progress: %.2f%%" % ((count*100./estimate))           print "Elapsed time: %.2f secs, checked: %d/s, found %d/s." % (et, (count/et), (answc/et))           print "Estimate time to go: %.2f hours" % ((estimate-count)*et/(count*36­00.))         else:           print "%d / %d (%5.2f%%)" % (answc, count, (answc*100./count))         displaytile(primarytile, colp, [0]+rowp, logf)   print   print "Checked", count,"tiles. Found", answc,"answers."   logf.close()   sys.exit() `

============================================================­=======
sub1ime_uk

Posts: 2
Joined: 21 June 2005

Don't worry -- just read up on Latin Squares and can see what my wrong assumption was. Looks like I was being extremely optimistic after all.

Hope no-one wasted too much time wondering what I was missing.

Thanks anyway, sub1ime_uk
sub1ime_uk

Posts: 2
Joined: 21 June 2005

coloin wrote:Where now ?

You mean, how do you count the number of completions (rows 4-9) given the contents of rows 1-3? In brief (in case this wasn't what you wanted to know): you just brute-force look for solutions such that x4<x5<x6, x7<x8<x9 and x4<x7, where x* = contents of the first column. Then multiply the answer by 72. I think Bertram and I both use this tactic, but we differ in the implementation details of the brute-forcer.
Red Ed

Posts: 633
Joined: 06 June 2005

My solution is 4,5*10^20
Here is the Proof:

The center block (block5) can be populated in 9! ways without constraints.

Now column 4 has 6 unpopulated positions without constraints giving 6! ways.

Column 5 is restricted by column 4, so that it can be populated in 5!ways

In a similar way column 6 can be ppulated in 4! ways.

Rows 4 to 6 will be populated in the same way as columns 4 to 6

Now we have 9!*6!*6!*5!*5!*4!*4! wihout the corner squares

Now we want to place the remaing 6 numbers in column 1. Each of these numbers will occur exactly twice, one time in block2 and one time in block8. This leaves 4! ways to populate column1. Now one position is taken leaving 3! ways to populate column2 and 2(!) ways to populate column3.

All remaining numbers have their given places.

This gives a total of 9!*6!*6!*5!*5!*4!*4*4!*3!*2! =
449371462592102000000

Suggestions? Errors?
nilton

Posts: 1
Joined: 21 June 2005

nilton wrote:My solution is 4,5*10^20

Your solution is a little on the small side of the generally accepted figure.
scrose

Posts: 322
Joined: 31 May 2005

### Where now ?

Where now?

Sorry Red Ed - it was rhetorical - but thanks.

Whilst you are there - your list of 44 is a long list of very big [even] numbers.
I assume there are prime numbers somewhere - yes
44. 108374976 = 1063 [prime]*59[prime]*2^6*3^3
22. 98153104 = 51551*17*7 [all prime] *2^4

No mileage here...yet

Regards
coloin

Posts: 1773
Joined: 05 May 2005

coloin writes
Interestingly - it appears some sudukus have more similarities than others ! [44 we are told]
It is to do with
1 some transformations + reflection etc giving the same result as a rotation - it is not constant.
2* the sudukus like repeating ones which might [?] have even more than your average grid.

I would like to see an example of the gang of 44 -[Bertram]

What do you mean 'it is not constant'. Can you please give an example. Because all sudokus are asymmetrical, they would all produce 4 rotations. Plus 2 reflections and 362880 number changes. So I don't see how it could be any thing other than constant.

Ps I would also like to see the 'gang of 44'
Condor

Posts: 62
Joined: 19 June 2005

PreviousNext