## Idea for computer algorithm

Advanced methods and approaches for solving Sudoku puzzles

### Idea for computer algorithm

Here's my idea: in each cell, the probability that each number goes there is stored, and then that value is eliminated from the other cells in the same row, column, and box with that probability. For example, if there was a 25% (0.25) chance that r2c3 was a 5, then the probabilities that 5 was in the rest of the cells in row 2, column 3, and the upper-left box would be reduced by 25%.

I think that if this was cycled repeatedly, it would start to show a portrait of where the values should go. Before I continue work, though, I have a few issues that I'd like to discuss with more experienced Sudoku players and programmers:

1) Does anybody see any obvious problems with this idea? (read: does this have any chance of working?)

2) How should I calculate the probability of a certain value being in a certain cell? I was thinking something like 1/([# of candidates in that cell] * [# of possibilities for that value in the same box] * [# of possibilities...same row] * [# of possibilities...same column]). Remember that by using this system, nothing will be certain - the correct values will just get more and more likely, but never hit 100% (1.0) except for the givens.

3) Referring to the above example, should the probability of a 5 in r2c2 be reduced twice, since it is in both the same box and row? Or should it only be reduced once? My initial thought was that it should be reduced twice, but since r2c2 will cause r2c3's stats to do the same thing, I now think that it should only be reduced once.

4) How does the program know it's done? I was thinking once the "favored" values (the one with the highest percentage) in each cell didn't change after an iteration of this process.

Edit 1: clarified point of discussion #3

Edit 2: A better idea for calculating the probability of a number in a cell is:
Code: Select all
`1 - {(1 - (1/[#of candidates in that cell])) *     (1 - (1/[#of possibilities for that number in the same box])) *     (1 - (1/[#of possibilities for that number in the same row])) *     (1 - (1/[#of possibilities for that number in the same column]))}`

Then, if it was an only candidate or naked single, it would get 100% (1.0) probability for that location.
Last edited by PTP on Mon Aug 07, 2006 4:48 pm, edited 2 times in total.
PTP

Posts: 5
Joined: 02 August 2006

It might be a useful idea for truly gigantic sudoku where the number of calculations gets out of hand.

A real probability could be based on the number of templating rookeries or POM solution patterns that make use of that particular digit in that particular cell. For example, if you have this situation...
Code: Select all
` .   .   . | .   .   . .   1   1 | .   1   . .   1   . | .   1   . ----------+----------- .   1   1 | .   .   .`

and all other 1's are solved, you have 3 possible rookeries or solution patterns marked abc below.
Code: Select all
` .   .   . | .   .   . .   a   b | .   c   . .   c   . | .   ab  . ----------+----------- .   b   ac| .   .   .`

Cells containing one letter would have a 1/3 chance of being the '1'. Cells containing two letters would have a 2/3 chance.

Cells containing a high percentage chance for multiple numbers, or low chances for few numbers, might be likely candidates for magic cells.
Myth Jellies

Posts: 593
Joined: 19 September 2005

I appreciate that a probabilistic approach is addressed here again.
But i fear, that there is no experience with it so far. It would be very interesting for manual solvers also, if they would have a good chance to solve a hard puzzle with some kind of educated guessing.
And MJ's observation is a good starter, i think
ravel

Posts: 998
Joined: 21 February 2006

When i looked, which cell for each number has the highest probability according to your formula (Edit 2) in an arbitrary candidate grid (after basic methods), 2 of them were right, 6 wrong and 2 of 3 cells with equal value for one number right. So about 2/3 of placements based on that would be wrong in this case.
There are slightly more than 3 candidates per cell in average, i.e. pure random selection would give you a 1:3 chance (for each cell) also.
So for this grid the formula in the first iteration brings nothing.
I dont think, that the suggested iterations will make it much better (doesnt the cell with the highest probability always remain at the end?)
ravel

Posts: 998
Joined: 21 February 2006

Wow, what was I thinking, of course that won't work.

Okay, here's how it works: If a cell is one of the givens, starts with a score of 1 for that value and 0 for all others. All blank starting cells start with an arbitrary starting value for each candidate (say, 0.5 - but anything between 0 and 1, non-inclusive, will work). It then goes through each cell and multiplies the scores of every cell that it "sees" by (1 - [its corresponding scores]).

I'll run it through this cycle a few times with a known grid and see what happens.
PTP

Posts: 5
Joined: 02 August 2006

MJ's suggestion that Templates/Rookeries be used seems reasonable to me. However, I don't agree with his probabilities. Here's how I would proceed.

Example:

Cell [r5c5] has three candidates -- <479>.

There are 20 rookeries for <4> and 16 of them contain cell [r5c5]. Set X = 16/20.
There are 80 rookeries for <7> and 40 of them contain cell [r5c5]. Set Y = 40/80.
There are 60 rookeries for <9> and 45 of them contain cell [r5c5]. Set Z = 45/60.

The probability of [r5c5]=4 is X / ( X + Y + Z ).
The probability of [r5c5]=7 is Y / ( X + Y + Z ).
The probability of [r5c5]=9 is Z / ( X + Y + Z ).

If you perform this logic for all cells, then you can use the highest overall probability to make a cell and value assignment.

I don't even want to think about the nightmares you might encounter if you need to backtrack because of an improper selection.
daj95376
2014 Supporter

Posts: 2624
Joined: 15 May 2006