Collection of sudokus

Everything about Sudoku that doesn't fit in one of the other sections

Collection of sudokus

Postby lerxst » Tue Nov 15, 2011 4:09 pm

Hello everyone,

I am looking for a collection of sudokus with different numbers of givens. I am doing some tests related to performance and I need [25-100] sudokus with 17 givens, with 18 givens, with 19 givens... up to 50 (or whatever upper limit I can get) etc. Grid ratings are not important as I am trying to measure a correlation based on number of givens versus solving time. Algorithms used to solve the sudokus as well as different types of symmetries (or no symmetry at all) are not important. As I said, only the number of givens is important to me. Anyone can point me to such a collection ? In the worst case, any idea on how to generate such a collection ? Thank you!
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Re: Collection of sudokus

Postby Smythe Dakota » Tue Nov 15, 2011 5:28 pm

One difficulty you'll find is that almost all commercial puzzles (magazines, books) have 180-degree rotational symmetry in the positioning of the givens. This, in turn, means that almost all puzzles will have redundant givens. For example, if there is a given in r2c6, there will also be one in r8c4, even if the latter is redundant. In other words, the number of givens will almost always be larger than it has to be. It won't be a reliable indicator of puzzle difficulty.

Bill Smythe
Smythe Dakota
 
Posts: 534
Joined: 11 February 2006

Re: Collection of sudokus

Postby lerxst » Tue Nov 15, 2011 5:36 pm

Smythe Dakota wrote:One difficulty you'll find is that almost all commercial puzzles (magazines, books) have 180-degree rotational symmetry in the positioning of the givens. This, in turn, means that almost all puzzles will have redundant givens. For example, if there is a given in r2c6, there will also be one in r8c4, even if the latter is redundant. In other words, the number of givens will almost always be larger than it has to be. It won't be a reliable indicator of puzzle difficulty.

Bill Smythe


Thanks Bill. But I am trying to measure a "brute-force" approach and correlate it with the number of givens to predict solving time. Since it's a brute-force solver, the program is not aware of "concepts" like symmetries and solving strategies. It's as dumb as a rock and has no sudoku knowledge besides the sudoku constraints and the givens. I am not trying to rate/grade difficulty of sudoku puzzles, all I'm interested in is a correlation between solving time and the number of givens!
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Re: Collection of sudokus

Postby dobrichev » Tue Nov 15, 2011 8:01 pm

lerxst wrote:...I am not trying to rate/grade difficulty of sudoku puzzles, all I'm interested in is a correlation between solving time and the number of givens!


Hi,

This is more complex task than you probably expected. The puzzle difficulty is a measure of the steps required to solve the puzzle, and this affects the solving time drastically more than the number of initial givens. Brute force solving is not an exception. For precise measurements you need of collection of "average" puzzles of different sizes. I have no such collection.

There exist collections of 17, 38, and 39-clue puzzles. You could find them in this forum. You must at least "scramble" these puzzles by performing random series of "validity preserving transformations".

There are several popular collections of biased puzzles, useful for comparison of one solver to another, but this is not what you want.

Cheers,
MD
dobrichev
2016 Supporter
 
Posts: 1311
Joined: 24 May 2010

Re: Collection of sudokus

Postby lerxst » Tue Nov 15, 2011 9:24 pm

dobrichev wrote:
lerxst wrote:...I am not trying to rate/grade difficulty of sudoku puzzles, all I'm interested in is a correlation between solving time and the number of givens!


Hi,

This is more complex task than you probably expected. The puzzle difficulty is a measure of the steps required to solve the puzzle, and this affects the solving time drastically more than the number of initial givens. Brute force solving is not an exception. For precise measurements you need of collection of "average" puzzles of different sizes. I have no such collection.

There exist collections of 17, 38, and 39-clue puzzles. You could find them in this forum. You must at least "scramble" these puzzles by performing random series of "validity preserving transformations".

There are several popular collections of biased puzzles, useful for comparison of one solver to another, but this is not what you want.

Cheers,
MD


I was thinking about measuring time it takes to solve a hundred 17 givens, then a hundred 18 givens, then a hundred 19 givens... up to 40 givens. Based on that fact that those measures would be averaged for every number of givens and that the sudokus would be chosen at random, all I am trying to do is measure the time it takes for my "dumb brute force" algorithm to solve a grid solely based on number of givens. If such a collection sorted by number of givens doesn't exist, what tool would you recommend to generate all those sudokus as quickly as possible ? I would need at least a hundred sudokus per number of givens but the more the better. Of course, I don't mind waiting but I'm not willing to wait 3 months to generate all those! As I said, my algorithm is really dumb and is **only** affected by the number of givens. It's matching vectors and validating constraints against the "known and incomplete" vectors (rows or columns) until all 9 vectors respect all constraints and collectively constitute a valid sudoku grid. So in my case, only the number of givens affects solve time.
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Re: Collection of sudokus

Postby eleven » Wed Nov 16, 2011 3:00 pm

Denis Berthier already has done this job in a very serious way. The hard step was to generate real random minimal sudokus. It turned out, that the fast puzzle generators produced puzzles, which had less clues (mostly 24-25 instead of 26.5 on average) and were easier than the real random ones. Denis' generator (which ran for months) produced only a few puzzles with less 20 and more than 30 givens, because they are (relatively) very rare.
He then showed, that in the range i think between 20 and 29 givens the puzzles are the harder on average the more clues they have. (Not really random distributed) Known collections with 17-19 and 30+ clue puzzles also reflected this trend. Denis used 2 ratings (Sudoku Explainer and his own), which correlated well. If you would do the same with a brute force method, the ratings (computation times) would not correlate well, but you should get the same trend.

E.g. suexg is a generator for biased random puzzles. You can find it here. It takes a few hours for 1 mio puzzles, but most will have 23-25 givens, and you will hardly get one with 19 or 30 clues.
eleven
 
Posts: 1535
Joined: 10 February 2008

Re: Collection of sudokus

Postby lerxst » Thu Nov 17, 2011 3:36 am

eleven wrote:Denis Berthier already has done this job in a very serious way. The hard step was to generate real random minimal sudokus. It turned out, that the fast puzzle generators produced puzzles, which had less clues (mostly 24-25 instead of 26.5 on average) and were easier than the real random ones. Denis' generator (which ran for months) produced only a few puzzles with less 20 and more than 30 givens, because they are (relatively) very rare.
He then showed, that in the range i think between 20 and 29 givens the puzzles are the harder on average the more clues they have. (Not really random distributed) Known collections with 17-19 and 30+ clue puzzles also reflected this trend. Denis used 2 ratings (Sudoku Explainer and his own), which correlated well. If you would do the same with a brute force method, the ratings (computation times) would not correlate well, but you should get the same trend.

E.g. suexg is a generator for biased random puzzles. You can find it here. It takes a few hours for 1 mio puzzles, but most will have 23-25 givens, and you will hardly get one with 19 or 30 clues.


Thanks for the info! By any chance, is the test set that Denis generated available somewhere?
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Re: Collection of sudokus

Postby eleven » Thu Nov 17, 2011 8:29 am

To my knowledge he did not publish it.
eleven
 
Posts: 1535
Joined: 10 February 2008

Re: Collection of sudokus

Postby m_b_metcalf » Sat Nov 19, 2011 1:44 am

lerxst wrote:I was thinking about measuring time it takes to solve a hundred 17 givens, then a hundred 18 givens, then a hundred 19 givens... up to 40 givens.

Why don't you try:

17: a random set from the gfroyle collection of 45000+ (80% of which yield to singles);

18: the ~100 symmetric puzzles in the Symmetric 18s thread here;

19 - 29: use the Patterns Game results in the Interactive Games forum;

39: a list is available?

That would get you going.

Regards,

Mike Metcalf
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 8290
Joined: 15 May 2006
Location: Berlin

Re: Collection of sudokus

Postby lerxst » Sat Nov 19, 2011 5:00 am

m_b_metcalf wrote:
lerxst wrote:I was thinking about measuring time it takes to solve a hundred 17 givens, then a hundred 18 givens, then a hundred 19 givens... up to 40 givens.

Why don't you try:

17: a random set from the gfroyle collection of 45000+ (80% of which yield to singles);

18: the ~100 symmetric puzzles in the Symmetric 18s thread here;

19 - 29: use the Patterns Game results in the Interactive Games forum;

39: a list is available?

That would get you going.

Regards,

Mike Metcalf


Thanks a lot! I will have a look at that. One more question though. Since I am planning on testing a large amount of puzzles of each category (they are grouped by number of givens), do you know any generator that can generate puzzles with a specific number of givens? I've generated a few million puzzles with QQwing but 18 & 19 givens seem to be very rare...
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Postby Pat » Sun Nov 20, 2011 10:11 am

lerxst wrote:I am looking for a collection of sudokus with different numbers of givens

and do you care if the puzzles are minimal or not ?

    e.g. for 39 cells given,
    minimal puzzles are rare ( yes there is a list ),
    but non-minimal puzzles are plentiful.

      and for 19-29, many are indeed available from the Patterns Game --
      but those are all minimal;
      if you wish to allow non-minimal puzzles too,
      don't use that source.
User avatar
Pat
 
Posts: 3423
Joined: 18 July 2005

Re: Collection of sudokus

Postby eleven » Sun Nov 20, 2011 11:43 am

lerxst wrote:I've generated a few million puzzles with QQwing but 18 & 19 givens seem to be very rare...

The fastest way to get a lot of (minimal) puzzles with less than 21 or more than 29 givens is, to take a collection of 17 clue puzzles, e.g. from here and the collection of known 39/38 clue puzzles, which you can find here, and then make a {-1+2} or {-2+1} resp. on them.
E.g, if you want some 1000 18 clues, select randomly 1000 puzzles from the 17 clue set and for each try all combinations to remove a given and add 2 others. Then look, if the resulting puzzle is valid and minimal. When you have enough, do the same to get a lot of 19 clues ...
Same for the high clues, here you would remove 2 givens and add a new one, to get e.g. 37's from a 38.

If you dont want to write it on your own, you e.g. can use gsf's program for the {-n+m} procedure. You can find it here. Probably the command line
sudoku -go"{-1+2}" 17clues.dat
should give you 18-clue puzzles. The program should eliminate eqivalent puzzles, but I dont know, how to eliminate non minimal ones with it - in the high clue range they will be the majority. But i guess, Pat could help you.
eleven
 
Posts: 1535
Joined: 10 February 2008

Postby Pat » Sun Nov 20, 2011 12:48 pm


    if you want only the minimal puzzles,
    in gsf's software add this filter --

      -e1==minimal
    * do you want only the minimal puzzles ??
User avatar
Pat
 
Posts: 3423
Joined: 18 July 2005

Re: Collection of sudokus

Postby lerxst » Sun Nov 20, 2011 8:13 pm

eleven wrote:
lerxst wrote:I've generated a few million puzzles with QQwing but 18 & 19 givens seem to be very rare...

The fastest way to get a lot of (minimal) puzzles with less than 21 or more than 29 givens is, to take a collection of 17 clue puzzles, e.g. from here and the collection of known 39/38 clue puzzles, which you can find here, and then make a {-1+2} or {-2+1} resp. on them.
E.g, if you want some 1000 18 clues, select randomly 1000 puzzles from the 17 clue set and for each try all combinations to remove a given and add 2 others. Then look, if the resulting puzzle is valid and minimal. When you have enough, do the same to get a lot of 19 clues ...
Same for the high clues, here you would remove 2 givens and add a new one, to get e.g. 37's from a 38.

If you dont want to write it on your own, you e.g. can use gsf's program for the {-n+m} procedure. You can find it here. Probably the command line
sudoku -go"{-1+2}" 17clues.dat
should give you 18-clue puzzles. The program should eliminate eqivalent puzzles, but I dont know, how to eliminate non minimal ones with it - in the high clue range they will be the majority. But i guess, Pat could help you.


Yes, I thought about this but I'd like different grids (i.e. puzzle with different solution grids). I don't care about minimal or not as it won't impact my "dumb" solver! But I need puzzles with one solution only!
lerxst
 
Posts: 18
Joined: 13 December 2010
Location: Montréal, Québec, Canada

Re: Collection of sudokus

Postby eleven » Sun Nov 20, 2011 8:43 pm

lerxst wrote:Yes, I thought about this but I'd like different grids (i.e. puzzle with different solution grids). I don't care about minimal or not as it won't impact my "dumb" solver! But I need puzzles with one solution only!

Why shouldn't you get different solution grids ? You have 45000 17 clue puzzles with at least 20000 different sulution grids, where you can start the {-1+2}. Also this operation in most cases will bring you to another grid.
Ah, didn't you understand {-1+2} ? You get the same grid, when you just add a clue of the solution grid, but here you first drop a clue and add 2 different ones (not from the solution grid!) in any 2 empty cells.

If you dont mind, that the puzzles are not minimal, your results will be, eh poor.
In this case ALL puzzles will be very simple on average. As you read, 80% of the 17 clues are singles puzzles and - apart from a few ones -, the others are easy too. If you dont mind redundant clues, this relationship will not change with more clues. It depends on your selection, but probably they will become even easier then.
Now who should be interested, if you can solve 1000 easy puzzles with 30 clues 2% faster than with 17 clues ?
eleven
 
Posts: 1535
Joined: 10 February 2008

Next

Return to General