High clue tamagotchis


Re: High clue tamagotchis

Postby dobrichev » Tue Mar 04, 2014 2:31 am

Yes, at every stage each puzzle is checked for redundant clues, and this takes almost all of the CPU time.
My general search is based on 2 tools.
minus n < infile > outfile
plus1 < infile > outfile.single.solution 2> outfile.multiple.solutions
I am keeping track of already processed subgrids except for non-minimal puzzles.

The command currently running looks like
nohup /dobrichev/tools/plus1/plus1 > >(comm -13 ../37 - > 37.new) 2> >(comm -13 <(unzip -p ../37.doneup.2.zip) - | comm -13 <(unzip -p ../37.doneup.1.zip) - | zip -q > 37.doneup.3.zip) < <(unzip -p 36.doneup.3.zip) &

../37 is a file with all known unique minimal 37-clue puzzles
37.new will store the newly found unique minimal 37-clue puzzles
../37.doneup.1.zip, ../37.doneup.2.zip are files with multiple-solution minimal 37-clue puzzles that have been processed on previous passes
37.doneup.3.zip will store the multiple-solution minimal 37-clue puzzles obtained on this pass
36.doneup.3.zip is the seed - the multiple-solution minimal 36-clue puzzles obtained from 35s on the previous run.
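
For readers unfamiliar with `comm -13 A -`: it prints only those lines of standard input that do not occur in the sorted file A. A minimal Python sketch of the same filtering follows, with made-up placeholder strings standing in for puzzle lines. The function name `only_new` is hypothetical and is not part of the tools.

```python
def only_new(known, candidates):
    """Mimic `comm -13 known -`: keep candidate lines absent from `known`.

    comm requires both inputs to be sorted; a set lookup is used here
    instead, and the output is sorted to match what comm would print
    for sorted, duplicate-free input.
    """
    known_set = set(known)
    return sorted(line for line in set(candidates) if line not in known_set)


# Placeholder strings stand in for 81-character puzzle lines.
known_37 = ["puzzle_a", "puzzle_c"]
found_this_pass = ["puzzle_a", "puzzle_b", "puzzle_d"]

print(only_new(known_37, found_this_pass))
```

Chaining two such filters, as the command does with the two earlier doneup files, simply removes everything seen in either file.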

Besides that, I am doing {-5,+7} on the newly found 39s. The tool takes the puzzles as its only input and produces a mixture of minimal unique puzzles of different sizes, which are later fed into the {-n} process. The final steps at +5 and +6 are executed in seconds. Some minimal multiple-solution 40s appeared, but no 41s.

I am scanning all valid minimal puzzles for minimal twins.
I also scanned valid puzzles of size 37+ for twins with possible redundant clues, then minimized them and fed those with 36+ givens into the {-n} process. I am not sure whether this added any value.

Storing the processed non-unique minimal puzzles and excluding them from repeated processing in the next passes reduces the count by 2% to 20% at each stage, less for the lower stages. Filtering out the processed 35s and 36s takes hours and is probably inefficient. I am trying to run this I/O-intensive process in parallel with the CPU-intensive {-5, +7} process.

The program code is extracted from gridchecker and is free.
2016 Supporter
Posts: 1369
Joined: 24 May 2010

40 !!!

Postby dobrichev » Tue Mar 04, 2014 5:56 am

This is the first discovered 40-given minimal unique Sudoku puzzle

Re: 40 !!!

Postby blue » Tue Mar 04, 2014 7:17 am

Fantastic :!:
Congratulations !
Posts: 599
Joined: 11 March 2013

Re: 40 !!!

Postby Lars Petter Endresen » Tue Mar 04, 2014 8:04 am

Wow! Fantastic news Mladen! Congratulations!
Lars Petter Endresen
Posts: 7
Joined: 03 June 2007
Location: NORWAY

Re: High clue tamagotchis

Postby Havard » Tue Mar 04, 2014 9:21 am

Really cool! Congratulations! Now for a 41... :)
Posts: 377
Joined: 25 December 2005

Re: 40

Postby coloin » Tue Mar 04, 2014 7:48 pm

Absolutely unbelievable.
Your puzzle indeed has 40 clues and is minimal.
Well done.
Posts: 1692
Joined: 05 May 2005

Re: High clue tamagotchis

Postby dobrichev » Tue Mar 04, 2014 8:07 pm

Thank you all! This makes me happy.

I ran {-6,+7} on the first 40-given. The result is
      2 40
      1 39
     24 38
    592 37
   2471 36
   1025 35

where the secondary 40 is at {-1,+1} from the seed, and the only 39 is at {-4,+3} from both 40s.
From the same pass 16 new 39s appeared so far.
After these 40s somehow I lost the focus :?

Re: High clue tamagotchis

Postby eleven » Tue Mar 04, 2014 8:22 pm

What a surprise !!

Great day.

Congratulations !!
Posts: 1709
Joined: 10 February 2008


Postby coloin » Tue Mar 04, 2014 8:58 pm

Yes - you can rest in peace now !

In another universe when 9*9 sudoku is reinvented - they won't find a 40 !


40-clue minimal puzzle

Postby Serg » Wed Mar 05, 2014 9:50 am

Hi, Mladen!
Outstanding discovery! Congratulations!
In what way did you find the 40-clue minimal valid puzzle?

2018 Supporter
Posts: 570
Joined: 01 June 2010
Location: Russia

Re: High clue tamagotchis

Postby dobrichev » Thu Mar 06, 2014 6:53 am

Thank you.

@eleven: No, I have no answer to how many 40s there are. :D You can move your graphics slightly to the right.
@Serg: The methods are explained in the recent posts here. In general, it is a classical {-m, +n} search plus checking for twins. All puzzles are truncated to multiple-solution 35s and then extended upward for as long as no clue becomes redundant. The source code is attached.
@coloin: This is the best compliment I ever heard in my life :lol: Thanks. But ... now what?

Havard Jun 09, 2007 wrote:It seems everyone is bending old "rules" nowadays!:) Now find me a 38!:)

ravel Jul 17, 2007 wrote:Hey, another look and here it is:...

Havard Jul 18, 2007 wrote:... But first lets find a 39!:)

Havard Aug 27, 2007 wrote:from the SudokuArchitect team: ...

dobrichev Mar 04, 2014 wrote:This is the first discovered 40-given minimal unique Sudoku puzzle ...

Havard Mar 04, 2014 wrote:Really cool! Congratulations! Now for a 41... :)

??? ??? wrote:Here is the 41-clue ...

This is the secondary 40-clue minimal puzzle, a close neighbour of the first one.
..........12.34567.345.6182..1.582.6..86....1.2...785...37.5.2..8..6.7..2.7.83615   40

These are the 18 39s born in the last pass. The total is 563.

A mirror of these puzzles is kept at https://sites.google.com/site/dobrichev/
Attachment: the workhorse in finding the first 40-clue puzzle (23.46 KiB)


Postby dobrichev » Sat Mar 15, 2014 5:59 am

38 new 39s from 3 passes, total 601 known 39s

The 36s have been excluded from the search. So far the 37+ puzzles by themselves generate an increasing number of new 37+ puzzles at each subsequent pass.

Re: High clue tamagotchis

Postby tarek » Sun Mar 16, 2014 1:18 am

Just noticed this. Great news. Well done!!!
Posts: 2644
Joined: 05 January 2006

Re: High clue tamagotchis

Postby dobrichev » Sat Apr 05, 2014 8:43 pm

Thank you, tarek.

Re: High clue tamagotchis

Postby dobrichev » Sat Apr 05, 2014 9:30 pm

Comparison of Intel and GNU compilers and incomplete scalability analysis

CPUs: 2 x Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz (8 cores per CPU + hyperthreading, 20MB cache)

Intel compiler
version: icc version 14.0.0 (gcc version 4.4.7 compatibility)
command line switches: -openmp -ipo

GNU compiler
version: gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
command line switches: -fopenmp -O3 -march=native

Tested tool
The tool does the following
- Loads a list of puzzles from a file (serial code)
- Converts them to canonical form and removes duplicates (serial code)
- Removes N givens from each puzzle in all possible ways, converts the resulting sub-puzzles to canonical form and removes duplicates (serial code)
- M times adds a clue in all possible ways to all puzzles, checks for minimality, converts resulting puzzles to canonical form, checks for unique solution, and removes duplicates (OMP parallel code)
- Stores the puzzles to a file (serial code)
It accepts N and M as parameters, loads from stdin, and stores to stdout.
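
The remove-N and add-a-clue steps listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: puzzles are 81-character strings with `.` for empty cells, the function names `minus_n`, `peers` and `plus_one` are hypothetical, duplicates are removed on the raw strings rather than on canonical forms, and the minimality and unique-solution checks (which need a solver) are left out entirely.

```python
from itertools import combinations


def minus_n(puzzle, n):
    """Return all sub-puzzles obtained by removing n givens in every possible way."""
    givens = [i for i, ch in enumerate(puzzle) if ch != "."]
    result = set()
    for removed in combinations(givens, n):
        cells = list(puzzle)
        for i in removed:
            cells[i] = "."
        result.add("".join(cells))
    return result


def peers(i):
    """Indices of the cells sharing a row, column or 3x3 box with cell i."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    same = {r * 9 + k for k in range(9)} | {k * 9 + c for k in range(9)}
    same |= {(br + dr) * 9 + (bc + dc) for dr in range(3) for dc in range(3)}
    same.discard(i)
    return same


def plus_one(puzzle):
    """Return all puzzles with one extra given that does not clash with its peers."""
    result = set()
    for i, ch in enumerate(puzzle):
        if ch != ".":
            continue
        used = {puzzle[p] for p in peers(i)}
        for digit in "123456789":
            if digit not in used:
                result.add(puzzle[:i] + digit + puzzle[i + 1:])
    return result


# The secondary 40-clue puzzle posted earlier in this thread, split by rows.
seed = ("........." ".12.34567" ".345.6182" "..1.582.6" "..86....1"
        ".2...785." "..37.5.2." ".8..6.7.." "2.7.83615")

subs = minus_n(seed, 2)
print(len(subs), "sub-puzzles after removing 2 of 40 givens")
print(len(plus_one(next(iter(subs)))), "candidates after adding one clue back")
```

The number of sub-puzzles grows as the binomial coefficient C(40, N), which is why the real tool removes duplicates on canonical forms before the parallel add-a-clue stage.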

The test
The code is compiled with Intel and GNU compilers into separate binaries.
The number of parallel threads is controlled by environment variable OMP_NUM_THREADS.
The timings of each run are measured once by "time" command, and once by clock() function.
All runs are done with the same parameters N=4, M=5, and on the same dataset consisting of the two known 40-given minimal sudoku puzzles.
Runs with 1,2,4,8,16,32,64 and 33 parallel threads are done for both binaries.
All runs are executed from a shell script with 5 second pause between consecutive runs.
The runs with up to 16 threads are expected to be linearly scalable since the hardware has 16 physical cores.
The runs with 32 threads are expected to give ~20% better results than those with 16, since hyperthreads share the ALU.
The runs with 33 and 64 threads are expected to run a little slower or at the same rate as 32 threads since there are no IO operations and the hardware resources are limited to 32 threads.
The full test was performed 3 times. The timings are similar, but instead of averages, the results from a single test (the first one) are analysed. About 10 partial tests were also executed in the range of 8 to 64 parallel threads.
During the tests no other CPU-consuming processes were running.

Raw results are in the table: the real, user and sys rows are the times reported by the OS time utility, and the clock() rows are reported by the binary.
The overhead rows are calculated on the basis that the serial code takes 5 seconds and that there are sufficient HW resources for each parallel thread.
(The serial code timing has not been explicitly measured but is estimated to be between 1.9 and 6.5 seconds.)
The real time Intel/GNU ratio is obtained by dividing the time consumed by the Intel-compiled binary by the time consumed by the GNU-compiled binary.
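
One reading of the overhead and ratio rows that reproduces the published percentages is sketched below in Python. The formula is inferred from the numbers, not taken from the tool: the serial part is assumed to be 5 seconds, the parallel part of the single-thread run is assumed to divide evenly among n threads, and the overhead is the excess of the measured parallel time over that ideal. The helper names are hypothetical.

```python
def seconds(t):
    """Convert a `time`-style string such as '15m15.308s' to seconds."""
    minutes, rest = t.split("m")
    return 60 * int(minutes) + float(rest.rstrip("s"))


def pct(x):
    """Format a fraction as a whole-number percentage."""
    return f"{round(100 * x)}%"


SERIAL = 5.0  # assumed duration of the serial part, in seconds

# Real times copied from the table, keyed by thread count.
intel = {1: "15m15.308s", 2: "7m40.323s", 4: "3m57.776s",
         8: "2m6.339s", 16: "1m17.919s", 32: "0m53.717s"}
gnu = {1: "17m57.704s", 2: "8m59.814s", 4: "4m38.910s",
       8: "2m27.984s", 16: "1m31.976s", 32: "1m8.790s"}


def overhead(real, n):
    """Excess of the measured parallel time over the ideal one, as a fraction."""
    ideal_parallel = (seconds(real[1]) - SERIAL) / n
    return (seconds(real[n]) - SERIAL) / ideal_parallel - 1


for n in (2, 4, 8, 16, 32):
    ratio = seconds(intel[n]) / seconds(gnu[n])
    print(f"{n:2d} threads: Intel overhead {pct(overhead(intel, n))}, "
          f"GNU overhead {pct(overhead(gnu, n))}, Intel/GNU {pct(ratio)}")
```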

The values in the picture in textual form
OMP_NUM_THREADS             1           2           4          8      16(physical)  32(logical)      64          33

Intel          real    15m15.308s   7m40.323s   3m57.776s    2m6.339s   1m17.919s    0m53.717s    2m22.400s   1m40.729s
               user    15m14.504s  15m15.551s  15m37.508s  16m19.855s  19m43.421s   25m35.112s   32m20.851s   30m4.700s
                sys      0m0.037s    0m0.056s    0m0.031s    0m0.067s    0m0.074s     0m0.308s   41m14.614s  16m34.348s
         clock(), s           915         916         938         980        1183         1535         4415        2799
           overhead                        0%          2%          7%         28%          71%     

GNU            real    17m57.704s   8m59.814s   4m38.910s   2m27.984s   1m31.976s     1m8.790s     1m3.142s    1m5.969s
               user    17m56.760s  17m53.548s  18m19.413s  19m8.075s   23m14.564s   27m43.283s   30m59.307s   29m8.941s
                sys      0m0.038s    0m0.034s    0m0.023s    0m0.035s    0m0.073s     0m0.118s     0m0.304s    0m0.145s
         clock(), s          1077        1074        1099        1148        1395         1663         1860        1749
           overhead                        0%          2%          7%         30%          90%     

real time Intel/GNU           85%         85%         85%         85%         85%          78%         226%        153%

[Attached images: scalability_table.PNG, scalability_pic.gif]

In general the Intel compiler produces consistently 15% faster code.
This could partially be due to the fact that years ago heavy work was done on optimizing the internal loops of the solver, and the Intel compiler was used at that time.
The only part of the code where an explicit difference exists (via a macro) is the unrolling of one 9-iteration loop, where extensive testing showed that the Intel compiler performs better with unrolling but the GNU compiler performs better without it.

At the limit of the hardware threads (32), the Intel compiler's binary is 22% faster, but neither binary scales well.

The overhead is unexpectedly high, at least for 16 threads and above.
Later, attempts were made to minimize the time spent under lock. They resulted in a few percent better performance even for a single thread, but virtually unchanged scalability.
The tested code flushes, in one go, the puzzles suitable for output or for the next iteration that are obtained by adding a clue to a single puzzle. An experiment flushing the results from 100 source puzzles at once gave a worse result. Flushing each new puzzle immediately gave a better result but no measurable change in scalability.

Intel's binary performs very badly when overloaded. Even raising the thread count from 32 to 33 leads to disaster, almost doubling the run time. The user time increases, but most of the increase comes from system time.
GNU performs better with more threads. User and system times slightly increase, but the real time decreases. This should be accounted for by the reduced idle time.
One explanation of this paradox is that compilers use different OMP libraries or at least different default values for the policies used by the library.
In only one test (out of more than 10), for 33 threads, Intel's binary performed just like the GNU one, slightly increasing user/sys but decreasing the real time.
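
The idle-time argument can be checked against the table with a rough utilization estimate. This Python sketch assumes utilization is (user + sys) / (real x 32 logical CPUs); that definition and the names used are illustrative assumptions rather than anything the benchmark reported.

```python
LOGICAL_CPUS = 32  # 2 CPUs x 8 cores x 2 hyperthreads, from the hardware description


def seconds(t):
    """Convert a `time`-style string such as '1m8.790s' to seconds."""
    minutes, rest = t.split("m")
    return 60 * int(minutes) + float(rest.rstrip("s"))


def utilization(real, user, sys_time):
    """Fraction of the available CPU time that the process actually consumed."""
    return (seconds(user) + seconds(sys_time)) / (seconds(real) * LOGICAL_CPUS)


# (real, user, sys) copied from the table, keyed by (compiler, thread count).
runs = {
    ("GNU", 32): ("1m8.790s", "27m43.283s", "0m0.118s"),
    ("GNU", 64): ("1m3.142s", "30m59.307s", "0m0.304s"),
    ("Intel", 32): ("0m53.717s", "25m35.112s", "0m0.308s"),
    ("Intel", 64): ("2m22.400s", "32m20.851s", "41m14.614s"),
}

for (compiler, threads), times in runs.items():
    print(f"{compiler:5s} {threads:2d} threads: {utilization(*times):.0%} busy")
```

Under this estimate the GNU binary goes from roughly 76% busy at 32 threads to roughly 92% at 64, which is consistent with reduced idle time. The Intel binary at 64 threads is also almost fully busy, but more than half of that CPU time is system time.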

Modern OMP runtimes can monitor the CPU load and automatically reduce the load generated by their own threads. This behavior was not in the scope of these tests.

The Intel compiler performs 15% better on a single thread and 22% better on the default number of threads (equal to the number of logical CPUs).
For the binary compiled with Intel's compiler, if the number of threads is forced to the number of logical CPUs, any other CPU-consuming process could drastically degrade the performance.
Leaving the runtime to determine the number of parallel threads could save the process from the observed overloading disaster.
The GNU compiler demonstrated better behavior in the cases simulating a user who doesn't know what he is actually doing.

