The New Sudoku Players' Forum

by **Clouded** » Sun Apr 17, 2022 9:06 am

There are a number of machine learning projects/papers on Sudoku, but they all suffer from the same problem: they're evaluated on exceptionally easy datasets. (Sometimes datasets conssiting mostly of SE 0.0 puzzles.)

I've been training a neural net to solve Sudoku problems, but it's hitting 100% accuracy on all easy datasets. It solves about 50% of patterns game problems, but I want to have a more granular measure that tells me what it can/can't solve. Would it make sense to plot % solved for each different SE rating? That is, show that it solves n% of SE 1.0 problems, m% of SE 1.1 problems, etc..

One difficulty I'm hitting with that idea is that the distribution of difficulties is uneven. So e.g. if you look at the chart at the end, there are very few problems with the 5.1 rating. (I assume this is something to do with the distribution of the complexity rating mentioned at http://forum.enjoysudoku.com/post318934.html#p318934 , but I don't understand that well -- embarrassingly, my net is much better at Sudoku than me.)

I'd be grateful for any thoughts.

by **Hajime** » Sun Apr 17, 2022 5:55 pm

Clouded wrote:One difficulty I'm hitting with that idea is that the distribution of difficulties is uneven

Why is that a problem?
I would be very interested howmany of the 2400 4.0 puzzles you can solve with neural net solving and howmany of the 20 5.1 puzzles. Better in percentage and is there a slope going down?
Just accept the fact that there are many more 4.0 puzzles than 5.1 puzzles. It is a SE fact.

by **Clouded** » Sun Apr 17, 2022 6:06 pm

Hajime wrote:Why is that a problem?

Because the measurements for e.g. 5.1 will be extremely noisy. The net could just be 'lucky/unlucky' on the few problems that are in the mix.

I've just been looking at the possibility of augmenting the PG dataset with other problem sets to shore this up, but I'm nervous about mixing problems from different sources for other reasons.

I think what I may do is use the PG dataset but throw out any rating where there are <100 puzzles. [In practice that's 4.9, 5.1, 5.8, 5.9, 6.0, 6.1, 6.2, 6.7, 7.7, and >= 10.9 .]

BTW, this is how Hodoku difficulty ratings line up with SE (specifically, SukakuExplainer): https://i.imgur.com/dKnne8v.png

(How do I use the IMG tag and have that image inline at a sensible size?)

by **Hajime** » Sun Apr 17, 2022 6:15 pm

Clouded wrote:throw out any rating where there are <100 puzzles.

Keep them in, but color them red (too noisy).
Very interesting...

Hodoku rating is another issue. One problem at a time. (Is that English?

)

by **Clouded** » Sun Apr 17, 2022 6:35 pm

Hodoku rating is another issue.

It was less that I was interested in the Hodoku rating than that I was thinking of using Hodoku to generate puzzles with given SE ratings. (I had datasets of 1,000,000 problems at each of the Hodoku difficulties lying around, so I figured I might as well graph them.)

In general, most resources on the internet seem to have easy problems. E.g. here is Gordon Royle's 17-clue dataset: https://i.imgur.com/xoLGgh0.png

by **Hajime** » Sun Apr 17, 2022 6:54 pm

Champagne has a database of 3M+ puzzles with a SE>10.
Link is here:
http://forum.enjoysudoku.com/the-hardest-sudokus-new-thread-t6539-720.html?hilit=Database#p296155

by **m_b_metcalf** » Sun Apr 17, 2022 6:58 pm

Clouded wrote:I think what I may do is use the PG dataset but throw out any rating where there are <100 puzzles. [In practice that's 4.9, 5.1, 5.8, 5.9, 6.0, 6.1, 6.2, 6.7, 7.7, and >= 10.9 .]

I'm confused about the origin of the data set you used in the histogram. In the Patterns Game, 4.0 has been played ~514 times and 6.7 ~561 times.

Regards,

Mike

by **1to9only** » Sun Apr 17, 2022 7:07 pm

These might be of interest:

http://forum.enjoysudoku.com/post291343.html#p291343
In Patterns Game 384, I generated 10848 unique ER/EP/ED puzzles for this pattern.

http://forum.enjoysudoku.com/post291249.html#p291249
4353 unique ER/EP/ED puzzles submitted in the patterns games 1 to 385 inclusive.

The Patterns Game is now at 460, so the numbers will be higher but I've not produced a new list.

There are also a number of mismatched ratings in the PG puzzles list, I usually ignore these!
https://github.com/1to9only/patterns-game/raw/main/001-452-mismatches.txt

by **Clouded** » Sun Apr 17, 2022 7:08 pm

@Hajime, @1to9only, thanks!

@m_b_metcalf: First, my data is likely out of date; I downloaded the patterns games about 9 months ago and converted them to a suitable format, and haven't tinkered with them since. I have 41364 games, of which the last few are

100020003030405010004000600020030050500702008080050070002000700010206030300080001
800070003030206080002000400090030060200607009080020040001000500020704030300090004
600020008070103060009000300090030080200801003080070050004000500060702030700040006
100020003020304010005000600040010070700402006050030080002000900010209060800040001

Second, I've been using SukakuExplainer to rate the puzzles; unlike any of the other SE derivatives I could find, it will output all intermediate steps. It's possible that its ratings differ a little from the rater you use for the patterns game?

by **m_b_metcalf** » Sun Apr 17, 2022 7:17 pm

Clouded wrote: It's possible that its ratings differ a little from the rater you use for the patterns game?

But the difference between your counts for 4.0 and 6.7 and those of the PG (as examples) is an order of magnitude out. Something's wrong.

Mike

by **1to9only** » Sun Apr 17, 2022 7:57 pm

m_b_metcalf wrote:But the difference between your counts for 4.0 and 6.7 and those of the PG (as examples) is an order of magnitude out. Something's wrong.

If you are using SudokuMonster's SukakuExplainer, (not sure about this) I think the default is the new techniques are enabled.
Skyscrapers are rated 4.0, so a lot of the PG puzzles will end up having a lower rating!!

by **Clouded** » Sun Apr 17, 2022 8:14 pm

I'm using the latest version (1.18.1) from here, if that helps: https://github.com/SudokuMonster/Sukaku ... r/releases . Version notes for v1.17 do indeed say 'Update default solving techniques'.

Edit: it says...

Code: Select all: The techniques are (in this order) 01: 1 Hidden Single 02: 1 Direct Pointing 03: 1 Direct Hidden Pair 04: 1 Naked Single 05: 1 Non-Consecutive Forcing Cell 06: 1 Locked Non-Consecutive 07: 1 Ferz Non-Consecutive Forcing Cell 08: 1 Ferz Locked Non-Consecutive 09: 1 Direct Hidden Triplet 10: 1 Pointing & Claiming 11: 0 Generalized Intersections 12: 1 Naked Pair 13: 0 Generalized Naked Pair 14: 1 X-Wing 15: 1 Hidden Pair 16: 1 Naked Triplet 17: 0 Generalized Naked Triplet 18: 1 Swordfish 19: 1 Hidden Triplet 20: 1 Scraper, Kite, Turbot 21: 1 XY-Wing 22: 1 XYZ-Wing 23: 1 WXYZ-Wing 24: 1 Unique Rectangle / Loop 25: 1 Naked Quad 26: 0 Generalized Naked Quad 27: 1 Jellyfish 28: 1 Hidden Quad 29: 1 3 Strong-linked Fishes 30: 0 Generalized Naked Quintuplet 31: 0 Generalized Naked Sextuplet 32: 1 VWXYZ-Wing 33: 1 Bivalue Universal Grave 34: 1 4 Strong-Linked Fishes 35: 1 Aligned Pair Exclusion 36: 0 5 Strong-Linked Fishes 37: 0 6 Strong-Linked Fishes 38: 1 UVWXYZ-Wing 39: 1 Forcing Chains & Cycles 40: 1 TUVWXYZ-Wing 41: 1 Aligned Triplet Exclusion 42: 1 Nishio Forcing Chains 43: 1 Multiple Forcing Chains 44: 1 Dynamic Forcing Chains 45: 1 Dynamic Forcing Chains (+) 46: 1 Nested Forcing Chains Default --techs=1111111111010111011111111011100111100111111111

Website · by **denis_berthier** » Mon Apr 18, 2022 2:11 am

.
Hi Clouded,
I'm not sure what you are looking for, but here is a large collection of puzzles (almost 6 millions), with their W ratings and the SER ratings of the first 3 millions: https://github.com/denis-berthier/Controlled-bias_Sudoku_generator_and_collection
It is not unbiased - no unbiased collection is known as of now (an unbiased collection could easily be extracted from it but it would be small).
However, the bias is known and it is therefore easy to use it for unbiased computations. Moreover, each sub-collection with a fixed number of clues n is unbiased among the minimal puzzles with n clues.
All this, plus associated statistical results, is explained in detail in my book [PBCS]: http://forum.enjoysudoku.com/pattern-based-constraint-satisfaction-2nd-3rd-eds-t32567.html

by **eleven** » Mon Apr 18, 2022 7:29 pm

Hi Cloud,

years ago i wondered, if it could be interesting to use neural nets for solving sudokus, but - having very little insight in that topic - it did not make much sense for me.
So my first question is, what do you mean with "it can solve a puzzle". Does it just mean, that it spits out a solution or does it show a way, how it can be solved (i mean e.g. solving steps) ?
If you want to know, how good it is generally, only Denis' collection would be interesting, which is the only one which provides really random puzzles. Of course we know, that the majority of random puzzles is just trivial.
If you train it with e.g. the hardest puzzles collection, i assume, that at some point it will be better for them than for say only hard ones, but this depends on, how it does it.
So, whats your goal and what can be the results ?

by **Clouded** » Mon Apr 18, 2022 7:59 pm

@denis_berthier: thank you!

@eleven Big question! Hope you don't mind a somewhat long answer.

I am a researcher in cognition and my interest in artificial neural nets comes from that standpoint -- I want to understand how they mimic/differ from human cognition. I'm using Sudoku as a testbed to analyse that. Accordingly, I set up a net that solves Sudoku like a human would, one move at a time. You feed in a board state and it returns a board state with one more entry filled in.* You then feed that state back into the net and repeat N times, where N is the no. empty squares in the initial puzzle.

*This is a bit of a simplification -- it returns a probability distribution on possible moves for each empty square. I take the highest-probability move and throw away the rest.

Now, the net has to learn everything from first principles -- during training it gets given millions of Sudoku problems, and feedback on whether it makes correct moves or not. So it's not at any point being told what an X-wing is, or even given puzzles that specifically have X-wings in; it just gets faced with Sudoku and has to make progress somehow.

In order to create a 'fair' comparison point for the net, I tackled Sudoku myself the same way -- I solved the 320 problems in the book 'The Original Sudoku' from scratch, without looking up anything about standard solving techniques. [So e.g. I have no idea what an X-wing actually is -- please don't 'spoil' me yet as it will mess up my methodology.] Then I compared the net's step-by-step solutions to my own ones and asked questions like 'did we hit the same "bottlenecks"?' I.e. did we find we the same problem states 'hard'? The answer was yes -- you can see that in this image:

In terms of my goal, I wanted to point out to researchers that it doesn't mean much to say that e.g. a net gets 95% accuracy on the 17-clues problem set. It's just a trivial bar. It would be nice to have an alternative dataset to point people at, and after talking to you all I think that the Patterns Game dataset is a good one for covering the spread of difficulties. I should say though that I'm not currently trying to get the net to solve the hardest problems possible -- I want it to solve hard enough problems that there is some substance to analyse, but beyond that I'm aiming to have it be as small as possible, so that I can try to understand how it's solving problems. Right now it's better at Sudoku than me, which is good enough!

Edit: graagh, the image is huge again. What am I doing wrong?

The New Sudoku Players' Forum

Creating a balanced test set for Sudoku software

Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software

Re: Creating a balanced test set for Sudoku software