The real distribution of minimal puzzles

Everything about Sudoku that doesn't fit in one of the other sections

The real distribution of minimal puzzles

Postby denis_berthier » Tue Nov 30, 2010 3:45 pm

This thread, which has been lost in totality due to the May 2009 crash of the original Sudoku Player's Forum, is available on my website: http://www.carva.org/denis.berthier/HLS/SPF/RDMP/index.html
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: The real distribution of minimal puzzles

Postby ttt » Tue Nov 30, 2010 4:43 pm

Hi denis,
Welcome back...!
I hope that there are something news for solving sudoku puzzles from you... :D

ttt
ttt
 
Posts: 185
Joined: 20 October 2006
Location: vietnam

Re: The real distribution of minimal puzzles

Postby eleven » Tue Nov 30, 2010 4:56 pm

Hi Denis,

very nice to have a copy of this thread.

But i had to manually delete the binary part in the files (down to <!DOCTYPE HTML ..) to be able to read it. Is there a better way ?
eleven
 
Posts: 1578
Joined: 10 February 2008

Re: The real distribution of minimal puzzles

Postby denis_berthier » Tue Nov 30, 2010 5:40 pm

Hi ttt and eleven,

Nice to see you here.
I'm not really back on this forum. For more than a year, I haven't had time for Sudoku (and I won't have more in the near future). I feel I have completed my approach with my last results about zt-braids and unbiased classifications. The fact is, I have no new idea.

I had sent all the files for all my threads to Jason long ago, but it seems it hasn't been possible to restore them into the new forum. So, I decided to put on my website all that I had.
Eleven, I don't know how to answer you: I have no problem to open the webarchive format on my Mac, using the Safari browser. If anyone knows of a way of making all this easier to read, please let me know.

Notice that I put these Forum files online, as a historical reference, but a synthesis of all this has been available for a long time on my usual web pages for those who are not interested in details: http://www.carva.org/denis.berthier/HLS/index.html
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: The real distribution of minimal puzzles

Postby eleven » Tue Nov 30, 2010 10:31 pm

denis_berthier wrote:I had sent all the files for all my threads to Jason long ago, but it seems it hasn't been possible to restore them into the new forum. So, I decided to put on my website all that I had.
Eleven, I don't know how to answer you: I have no problem to open the webarchive format on my Mac, using the Safari browser. If anyone knows of a way of making all this easier to read, please let me know.

So i hope, that Jason & gsf will take another effort to restore the thread.
I now tried the pages with the windows browser, but it has the same problem, maybe the webarchive format is something Mac specific.
eleven
 
Posts: 1578
Joined: 10 February 2008

Re: The real distribution of minimal puzzles

Postby JasonLion » Wed Dec 01, 2010 1:31 am

We never found a way to restore those threads into the forum without a substantial amount of manual work. I've tinkered with it off and on for sometime, even restored a page or two completely manually. Unfortunately, each way I try to do it always ends up being too much work to be practical.

It is possible to extract the .webarchive files into normal web pages using a utility like WebArchive Folderizer, at least if you have a Mac. From there, they could be posted as normal web pages (without the forum software behind them) fairly easily.
User avatar
JasonLion
2017 Supporter
 
Posts: 624
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: The real distribution of minimal puzzles

Postby denis_berthier » Wed Dec 01, 2010 4:38 am

JasonLion wrote:It is possible to extract the .webarchive files into normal web pages using a utility like WebArchive Folderizer, at least if you have a Mac. From there, they could be posted as normal web pages (without the forum software behind them) fairly easily.

Thanks for the suggestion. I gave it a quick try. It works well for one webarchive, but it generates lots of files and subdirectories for each webarchive. I need more time to check if I can do it for my whole set of webarchive files.

Eleven, you said you deleted manually the binary parts of the webarchive files. I also tried to do this (it seems simpler than using WebArchive Folderizer). I used vi and I also deleted the final part after </body>. But something gets lost in the process. It seems the deleted binary parts kept some information specific to Safari. This may be related to Jason's difficulties in restoring the files into the forum. Could anyone try to open the webarchive files in Safari on a PC? (You can download it here: http://www.apple.com/safari/download/).
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris


Re: The real distribution of minimal puzzles

Postby denis_berthier » Mon Jun 03, 2013 5:37 am




How to compute unbiased statistics


Considering some recent discussions about unbiased statistics in the "grey zone" and the "JExocet" threads, it seems that some of the results reached in this old lost thread and some of their obvious consequences have been forgotten.
For a summary of the essential results of this thread, see the references below.

Note: the following applies to the whole collection of minimal puzzles. It can obviously not be extended to a restricted collection defined by particular properties, such as having a rating larger than some predefined value (wrt to some predefined rating) or having some predefined pattern.


Main result

The main result of this thread was, we now have a close estimate for the real distribution of minimal puzzles wrt to their number of clues (as a result, we also have a close estimate for the total number of minimal puzzles). Let me recall the distribution here for ease of use.


Code: Select all
#clues      %puzzles 
20          1.32E-7       
21          0.000034
22          0.00348
23          0.148
24          2.28
25          13.42
26          31.91
27          32.71
28          15.48
29          3.598
30          0.41
31          0.0241
32         0.00102

these results are valid with overall relative precision  0.065%

see page 43 of the pdf copies of the old forum for more detailed results


Puzzles outside the [20, 31] range can be neglected in any statistical study bearing on the whole collection of minimal puzzles (e.g. the frequency of puzzles having some pattern).

Let pk be the proportion of puzzles with k clues, according to this estimate.


Practical applications

As of today, we are still unable to generate unbiased uncorrelated collections of puzzles.
The best we can generate remains uncorrelated collections with controlled-bias (this is how the above results were obtained), and such generation remains very time consuming (about 257,000 times longer than top-down generation).

However, do we need a controlled-bias collection to produce unbiased statistics? NO. The above results are valid once and for all and they can be used as such.
Of course, the closer to unbiased the better and controlled-bias is the closest we currently have.
But suppose we have another uncorrelated collection, generated e.g. by a top-down generator (whose bias is much larger than the controlled-bias) or a bottom-up one (whose bias is still much larger).

Suppose we are interested in some random variable X, e.g.:
- X = 1 if the puzzle has an sk-loop, 0 otherwise
- X = 1 if the puzzle has a J-Exocet, 0 otherwise
- X = number of J-Exocets in the puzzle
- X = number of eliminations by J-Exocets in the puzzle after some set of other rules (e.g. SSTS) has been applied
and suppose we want to estimate the real mean of X.

If, instead of simply taking the mean for all the puzzles in the collection, we use a weighted mean (using the pk as weights), we get an estimate of the unbiased mean value of X for the minimal puzzles. More precisely, let E(Xk) be the mean value of X for the puzzles with k clues in the collection. Then the unbiased estimate of the mean E(X) of X is merely:

E(X) = E(X1)*p1 + E(X2)*p2 + ...

Another useful quantity is the standard deviation sd(X) of X (it can be used as an informal measure of the precision of the estimate of E(X)). It is given by

sd(X)^2 = sd(X1)^2*p1 + sd(X2)^2*p2 + ...


This is not a panacea, especially if the largest values of X occur with puzzles having a small or a large number of clues.
These formulae also show that, when studying X, it is generally useless to analyze the full collection at disposal. Analysing millions of puzzles with low probability numbers of clues will not improve the results. It is more efficient to randomly extract a sub-collection.


Concerning JExocets, for the puzzles in the "potential hardest" collection, almost all of them are found for puzzles with k = 22, 23, 24 clues.
If this remained true for a top-down generated collection, this would be enough to entail that JExocets are present in at most 2% of the minimal puzzles.
(Indeed, my own estimates, based on other calculations, put it very close to 0%).


References:

- "Unbiased Statistics of a CSP - A Controlled-Bias Generator", International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. pdf preprint. Published as a chapter of the book Innovations in Computing Sciences and Software Engineering, Khaled Elleithy Editor, pp. 11-17, Springer, 2010, ISBN 97890481911133.
- "Constraint Resolution Theories", Lulu.com, Oct. 2011, ISBN : 978-1-4478-6888-0
- "Pattern-Based Constraint Satisfaction", chapter 6, Lulu.com, Nov. 2012, ISBN 978-1-291-20339-4


[Edit 2013/06/20: added precision of the distribution and reference to p. 43]
Last edited by denis_berthier on Thu Jun 20, 2013 4:13 am, edited 2 times in total.
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: The real distribution of minimal puzzles

Postby champagne » Mon Jun 03, 2013 7:04 am

denis_berthier wrote: Main result

The main result of this thread was, we now have a close estimate for the real distribution of minimal puzzles wrt to their number of clues (as a result, we also have a close estimate for the total number of minimal puzzles). Let me recall the distribution here for ease of use.


Code: Select all
#clues      %puzzles
20          0.0
21          0.000034
22          0.0034
23          0.149
24          2.28
25          13.42
26          31.94
27          32.74
28          15.48
29          3.56
30          0.41
31          0.022



if the last eleven's generation is representative, the distribution in the grey area is completely different
eleven found 93% of the puzzles in the range 23_26 clues for a rating 8.6 and more
champagne
2017 Supporter
 
Posts: 5725
Joined: 02 August 2007
Location: France Brittany

Re: The real distribution of minimal puzzles

Postby denis_berthier » Mon Jun 03, 2013 7:36 am

champagne wrote:if the last eleven's generation is representative, the distribution in the grey area is completely different
eleven found 93% of the puzzles in the range 23_26 clues for a rating 8.6 and more


As I wrote at the start of my post, these stats can't be applied to any sub-collection, such as a "grey area" based on SER values.
Moreover, as I also wrote, collections generated by a top-down generator are known to be strongly biased.
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: The real distribution of minimal puzzles

Postby champagne » Mon Jun 03, 2013 8:11 am

denis_berthier wrote:
champagne wrote:if the last eleven's generation is representative, the distribution in the grey area is completely different
eleven found 93% of the puzzles in the range 23_26 clues for a rating 8.6 and more


As I wrote at the start of my post, these stats can't be applied to any sub-collection, such as a "grey area" based on SER values.
Moreover, as I also wrote, collections generated by a top-down generator are known to be strongly biased.


right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area
champagne
2017 Supporter
 
Posts: 5725
Joined: 02 August 2007
Location: France Brittany

Re: The real distribution of minimal puzzles

Postby denis_berthier » Mon Jun 03, 2013 8:16 am

champagne wrote:right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions, that are both out of topic and devoid of any content?
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: The real distribution of minimal puzzles

Postby champagne » Mon Jun 03, 2013 8:18 am

denis_berthier wrote:
champagne wrote:right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions devoid of any content?



could you pleas tell what is wrong in my sentence.

As far as I know, the frequency in the grey area is the ratio

puzzles of the grey collection having the property "x" / total number of puzzles in that collection
champagne
2017 Supporter
 
Posts: 5725
Joined: 02 August 2007
Location: France Brittany

Re: The real distribution of minimal puzzles

Postby denis_berthier » Mon Jun 03, 2013 8:31 am

champagne wrote:
denis_berthier wrote:
champagne wrote:the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions devoid of any content?

could you pleas tell what is wrong in my sentence.


If you need explanations, here they are:

1) In standard English (but this can be translated into any natural language):
the sentence "the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area"
is equivalent to
"the frequency in the grey area is more about puzzles in the grey area"
which would be a tautology without the word "more".

2) I clearly declared the grey area was OOT. That's the problem in general with compulsive posters like you: answering before they have understood what's being discussed. As a result, every thread becomes polluted with irrelevant posts.

To be clear: I'm not expecting any answer !!!!!
And I would be the happier if Jason deleted all the parasitic posts after this http://forum.enjoysudoku.com/post227502.html#p227502

(I mean both champagne and mine, including the present one)
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Next

Return to General