The New Sudoku Players' Forum

by RW » Tue May 09, 2006 7:55 pm

I was working a puzzle in top1465 today, and suddenly it felt strangely familiar. After tweeking around the puzzle a bit I found that I actually had solved the same puzzle earlier today, only one clue placed differently and scrambled (which didn't affect the way how to solve the puzzle). The puzzles were:

#206

Code: Select all: *-----------* |...|.3.|.7.| |.2.|...|8..| |.1.|...|...| |---+---+---| |3..|.6.|2..| |2..|1..|5..| |7.4|...|...| |---+---+---| |6..|...|.4.| |...|2.5|...| |...|8..|...| *-----------*

"descrambling" (1=1, 2=6, 3=5, 4=7, 5=2, 6=8, 7=3, 8=4, 9=9, rotate 90 and move around boxes and columns) gives:

Code: Select all: *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|653| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------*

#163

Code: Select all: *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|8..| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------*

I compared these puzzles in Simple Sudoku. SS used singles, locked candidates, subsets and one XY-wing in both, then got stuck in exactly the same place. (Puzzle can then be solved with a BUG-lite.)

Is this very common in the list? If it is, what's the point in testing your solvers against a list with many puzzles that essentially are the same puzzle? I've had this same feeling many times earlier also, but never taken the time to prove it, so I believe this is not the only case.

just wondering
RW

PS. Sorry if I posted this question in the wrong place, but I haven't been able to find out who made the list to address the question directly.

by **Ocean** » Tue May 09, 2006 9:03 pm

RW wrote:I was working a puzzle in top1465 today, and suddenly it felt strangely familiar. After tweeking around the puzzle a bit I found that I actually had solved the same puzzle earlier today, only one clue placed differently and scrambled (which didn't affect the way how to solve the puzzle). The puzzles were: (...)

The two puzzles you found are actually the same 17-clues puzzle, with one redundant clue added:

Code: Select all: *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------*

(Scrambling of course never affects the way how to solve a puzzle).

by **tarek** » Tue May 09, 2006 11:58 pm

I checked the grids & it seems they both come from the same canonical grid (gsf's method)....

Code: Select all: 1 2 3 | 4 5 7 | 6 8 9 4 5 6 | 1 8 9 | 7 2 3 7 8 9 | 3 2 6 | 1 5 4 -------+-------+------ 2 1 7 | 9 6 5 | 4 3 8 5 6 8 | 7 3 4 | 2 9 1 9 3 4 | 8 1 2 | 5 7 6 -------+-------+------ 3 7 2 | 6 9 1 | 8 4 5 6 9 5 | 2 4 8 | 3 1 7 8 4 1 | 5 7 3 | 9 6 2

& as Ocean mentioned they both originate from the same puzzle

Code: Select all: . 2 . | 4 . . | . . . . . . | . . . | 7 . . . . 9 | . . . | 1 5 . -------+-------+------ . 1 7 | . . 5 | . . . . . . | . . . | . . . . . . | 8 . . | . . 6 -------+-------+------ . . x | . 9 1 | . . . 6 . . | . . . | . . . 8 4 x | . . . | . . 2

but each with a different redudndant candidate. That makes them essentially different.

tarek

by RW » Wed May 10, 2006 5:33 am

Ocean wrote:The two puzzles you found are actually the same 17-clues puzzle, with one redundant clue added

I found out the same thing when I did some further research. I also found that the situation is even more alarming. Check out Havard's results when he scanned the list for BUG-lites here.

Havard wrote:BUG-Lite+1 (size 6): 18
BUG-Lite+1 (size 7): 1
BUG-Lite+1 (size 8): 4
BUG-Lite+1 (size 9): 1
BUG-Lite+1 (size 12): 15
BUG-Lite+1 (size 16): 1

What's weird with the results? The huge amount of 12 cell BUG-lites. Personally I have done around 40 puzzles on the list and so far found four 12 cell BUG-lites. I "descrambled" all of them to look more like #163:

Code: Select all: #163 *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|8..| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------* #206 *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|653| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------* #83 *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |1..|2..|...| *-----------* #155 *-----------* |4..|...|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|18.|.53| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------*

and it turned out all of my BUG-lites were actually the same creature, derived from the same 17 clue puzzle with one reduntant clue added. Only #155 advances to a slightly different stage with basic techniques (subsets and locked candidates), but is then solved with the same BUG-lite. I don't know where Havard found the rest of the 12-cell BUG-lites, but I wouldn't be very suprised if there are others among them that derive from this same puzzle. In fact, it is possible that all of them are the same creature. By moving around the redundant clue I can make 27 different 18 clue puzzles that can be solved with subsets, locked candidates and a 12-cell BUG-lite (+one 17 clue puzzle).

My problem with this is that I cannot really trust this list to give valid results on how common different techniques are. Obviously a 12-cell BUG-lite is not as common as Havards test shows, it just happens that the makers of the list put in several variations of a puzzle that can be solved with one.

tarek wrote:as Ocean mentioned they both originate from the same puzzle (Code...)
but each with a different redudndant candidate. That makes them essentially different.

Essentially different..? Not when you look at the required techniques. A list with 1000+ puzzles, that people use to test how common appearances are of certain patterns, should not allow multiple appearances of puzzles that are as similar as these. Finding the same pattern several times in the same puzzle doesn't make it any more common.

RW

by **ravel** » Wed May 10, 2006 9:59 am

I agree, that this list should not be used for statistics. Puzzles 300 and 307 are totally equivalent and some of the puzzles are non minimal versions of others.
I did not know about twin sudokus in it, as RW mentioned in the beginning, but i noticed for some of them, that the solution paths are nearly the same (the 17 clue list has many of them).

by **ronk** » Wed May 10, 2006 11:55 am

RW wrote:Essentially different..? Not when you look at the required techniques. A list with 1000+ puzzles, that people use to test how common appearances are of certain patterns, should not allow multiple appearances of puzzles that are as similar as these.

I doubt that was dukosu's intent, and some of us just started using it to compare solver results.

As for randomness, I suspect Gordon Royle's compilation of 36628 17-clue puzzles would be more random. Now that's too big a collection for routine testing IMO ... but might be suitable for "final pass" statistics.

Maybe someone could do us all a service and use a "difficulty-rating solver" to select the most difficult 10% ... or even 5% ... from Gordon's list. Of course, reducing the collection size might reduce randomness too.

by **ravel** » Wed May 10, 2006 1:52 pm

ronk wrote:As for randomness, I suspect Gordon Royle's compilation of 36628 17-clue puzzles would be more random. Now that's too big a collection for routine testing IMO ... but might be suitable for "final pass" statistics.

I dont think that this is a good choice either. As i said, there are a lot of twin sudokus in it and i dont think that the techniques needed for 17 clue puzzles are representative for random puzzles (i think tso once mentioned that most are nishio puzzles).
The only good way i see is to make a new collection by calculating millions of pure random puzzles and filter out those which are too easy.

by RW » Wed May 10, 2006 2:01 pm

Just remembered one found by ronk in #226. It seems to arrive at a quite different grid before the reduction, but it's the same BUG-lite. I also found another 12-cell BUG-lite in #75. These two puzzles can be scrambled to:

Code: Select all: #75 *-----------* |4..|...|26.| |.3.|.7.|..1| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------* #226 *-----------* |4..|5..|26.| |.3.|.7.|...| |...|...|...| |---+---+---| |...|.8.|.53| |6.1|...|...| |...|...|..7| |---+---+---| |...|6.4|1..| |.5.|...|.8.| |...|2..|...| *-----------*

Look familiar?

ravel wrote:I agree, that this list should not be used for statistics.

ronk wrote:I doubt that was dukosu's intent, and some of us just started using it to compare solver results.

I would guess this is what happened too. This was probably just supposed to be a list of hard sudokus for hardcore solvers. If your not a machine that works through all the puzzles, it probably doesn't matter if some puzzles are quite similar, you would not find the similarities. Unfortunately, I am a machine and found them.

Another interesting feature is that the scrambling never swaps numbers 1 and 9. All other numbers have been swapped but numbers 1 and 9 are always the same. This tells me that the scrambling cannot have been done with a totally randomized computer scrambler, seems unlikely that a random machine would leave the same two numbers untouched five times in a row. Could the scrambling have been done manually?

ravel wrote:The only good way i see is to make a new collection by calculating millions of pure random puzzles and filter out those which are too easy.

Sounds like a good idea to me. Just tell your generator to make puzzles, start each one from scratch, forget the ones that are solvable with the techniques you find too easy and write down the rest. Then press the "do it" button and come back after the summer to see what it accomplished. Should give a nice list.

by **Ruud** » Wed May 10, 2006 5:34 pm

Hi RW (and other interested parties),

I have posted a list of 50,000 sudokus on one of my websites. This list consists of sudokus that require at least 2 guesses (tabling steps) by SudoCue, sorted in descending order of overall difficulty.

50,000 is a nice round number to run statistics on. There are no 17's in the list, but they are not mine to publish.

The zip file contains a single text file with 1 sudoku per line. Empty cells are represented by zeroes. The .sdm extension is recognized by SudoCue as a file containing multiple puzzles.

If anyone wishes to improve the list, send the updated version to my e-mail and I will post it as a new version.

Cheers, Ruud.

by **ravel** » Wed May 10, 2006 9:27 pm

Sounds, if we would not have to wait for autumn

Can you explain, how "randomly" the puzzles are and when the program would not guess ?

by **Ruud** » Wed May 10, 2006 10:21 pm

ravel wrote:Can you explain, how "randomly" the puzzles are and when the program would not guess ?

I create lots of sudokus to find fresh daily nightmares. A nightmare is a sudoku that can be solved without guessing, but with a complicated set of solving techniques required. Some of you must have seen them. They are often used to show certain techniques.

The program will not guess when the sudoku can be solved with:
- singles
- linebox reductions
- disjoint subsets
- any single digit technique, including (finned) fish, coloring, nishio.
- remote pairs
- XY-wing
- XYZ-wing
- BUG+1
- UR types 1-4

On average, to find a single nightmare, I must generate 10,000 puzzles. Of these puzzles, 8500 are solved with singles, 500 require 1 or more techniques, 1 is the nightmare and 999 require a tabling step. The puzzles are 100% randomly generated. The only presets are a maximum number of clues (which I set somewhere between 24 and 30) and symmetry. The generator only produces symmetrically minimal puzzles, except for those without symmetry, which are fully minimal.

These puzzles were generated during several runs between december and april. It is possible that a lot of them can be solved with XYZ-wing, APE, ALS, subset counting or Nice Loops. Not all these techniques are implemented, and some were implemented after the database already contained a lot of puzzles.

On a side note: The top1465 is a collection of puzzles put together by Dukuso from various sources (including Gordons minimal 17's) which were the toughest for his backtracking solver. To test the effect of permutations, some puzzles were deliberately scrambled, others were deminimalized in different ways. On the programmers forum, we did our testing mainly on the top95 and later the top880.

These 50,000 puzzles are like the "unsolvables" on MM's site, only in larger quantities. To test the effectiveness of a new solving technique, this could be a very useful list.

cheers, Ruud.

by **keith** » Wed May 10, 2006 11:06 pm

Comment:

Computers only generate "pseudo-random" numbers (at least when I was taught mathematics 30 years ago!).

Question for Ruud:

1. For the Nightmare, do you generate solution grids (that are unique and different) and then find some minimum set of initial clues, or;

2. Do you generate sets of initial clues and then check if they have a valid solution?

If you do the first, it seems to me it would be easy to generate a list of puzzles that do indeed, not have the same (scrambled) solution.

Question for everyone:

According to Wikipedia, there are 5,472,730,538 non-symmetric solution grids. (Which means solutions that cannot be scrambled to be the same.) Isn't this enough to find a set of different puzzles?

I am not sure the test cases need to be random, since published puzzles all have been through some selection or screening process. But, I tend to agree with the feeling that two sets of initial clues for the same solution grid will probably require similar techniques to solve.

Keith

by **Ruud** » Wed May 10, 2006 11:52 pm

keith wrote:1. For the Nightmare, do you generate solution grids (that are unique and different) and then find some minimum set of initial clues, or;
2. Do you generate sets of initial clues and then check if they have a valid solution?

I'm using option 2. Even when 2 puzzles have the same solution (think of the chance of that happening), they would have different clues and solve differently. My database has a unique key on the clues in rows 1, 2 & 3. Each puzzle in the database is created new, and not scrambled from an existing puzzle.

If that is not guarantee enough, ask gsf to convert all 50,000 to the canonical version and check for duplicates. For each duplicate found, you may take a point from my "What am I" score.

Ruud.

by **gsf** » Thu May 11, 2006 1:50 am

Ruud wrote:If that is not guarantee enough, ask gsf to convert all 50,000 to the canonical version and check for duplicates. For each duplicate found, you may take a point from my "What am I" score.

already did that when they were posted
no duplicate (up to isomorphism) solution grids in the 50K
(which implies no duplicate puzzles)

by **frazer** » Thu May 11, 2006 9:11 am

The problem with keith's "question for everyone" is that the number 5472730538 was computed mathematically, not by listing all the possibilities. It would be interesting to have a nice list of all of them, and it may just about be feasible to construct it.

The New Sudoku Players' Forum

Question about top1465

Question about top1465

Re: Question about top1465

My question