The New Sudoku Players' Forum

by **denis_berthier** » Tue Nov 30, 2010 3:45 pm

This thread, which was lost entirely in the May 2009 crash of the original Sudoku Players' Forum, is available on my website: http://www.carva.org/denis.berthier/HLS/SPF/RDMP/index.html

by **ttt** » Tue Nov 30, 2010 4:43 pm

Hi denis,
Welcome back...!
I hope you have some news for us about solving sudoku puzzles...

ttt

by **eleven** » Tue Nov 30, 2010 4:56 pm

Hi Denis,

very nice to have a copy of this thread.

But I had to manually delete the binary part in the files (down to <!DOCTYPE HTML ..) to be able to read them. Is there a better way?

by **denis_berthier** » Tue Nov 30, 2010 5:40 pm

Hi ttt and eleven,

Nice to see you here.
I'm not really back on this forum. For more than a year, I haven't had time for Sudoku (and I won't have more in the near future). I feel I have completed my approach with my last results about zt-braids and unbiased classifications. The fact is, I have no new idea.

I had sent all the files for all my threads to Jason long ago, but it seems it hasn't been possible to restore them into the new forum. So, I decided to put on my website all that I had.
Eleven, I don't know how to answer you: I have no problem opening the webarchive format on my Mac with the Safari browser. If anyone knows a way to make all this easier to read, please let me know.

Notice that I put these Forum files online, as a historical reference, but a synthesis of all this has been available for a long time on my usual web pages for those who are not interested in details: http://www.carva.org/denis.berthier/HLS/index.html

by **eleven** » Tue Nov 30, 2010 10:31 pm

denis_berthier wrote:I had sent all the files for all my threads to Jason long ago, but it seems it hasn't been possible to restore them into the new forum. So, I decided to put on my website all that I had.
Eleven, I don't know how to answer you: I have no problem opening the webarchive format on my Mac with the Safari browser. If anyone knows a way to make all this easier to read, please let me know.

So I hope that Jason & gsf will make another effort to restore the thread.
I have now tried the pages with the Windows browser, but it has the same problem. Maybe the webarchive format is Mac-specific.

by **JasonLion** » Wed Dec 01, 2010 1:31 am

We never found a way to restore those threads into the forum without a substantial amount of manual work. I've tinkered with it off and on for some time, and even restored a page or two completely manually. Unfortunately, every approach I try ends up being too much work to be practical.

It is possible to extract the .webarchive files into normal web pages using a utility like WebArchive Folderizer, at least if you have a Mac. From there, they could be posted as normal web pages (without the forum software behind them) fairly easily.
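For readers without a Mac, the following is a minimal sketch of another way to get at the HTML. It assumes that a Safari .webarchive file is a binary property list whose main page is stored under the keys `WebMainResource` and `WebResourceData`; these key names are an assumption based on how such archives are commonly described, not something stated in this thread. The helper names `extract_main_html` and `_demo_archive` are hypothetical, and the demo archive is built in memory only so that the sketch can run without a real file.

```python
import plistlib


def extract_main_html(archive_bytes: bytes) -> str:
    """Return the main HTML page stored in a .webarchive (assumed plist layout)."""
    # plistlib.loads detects binary vs XML property lists automatically.
    archive = plistlib.loads(archive_bytes)
    main = archive["WebMainResource"]            # assumed key name
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    return main["WebResourceData"].decode(encoding, errors="replace")


def _demo_archive() -> bytes:
    """Build a tiny stand-in archive in memory (illustration only)."""
    page = b"<!DOCTYPE HTML><html><body>post text</body></html>"
    return plistlib.dumps(
        {
            "WebMainResource": {
                "WebResourceData": page,
                "WebResourceMIMEType": "text/html",
                "WebResourceTextEncodingName": "UTF-8",
                "WebResourceURL": "http://example.invalid/page",
            }
        },
        fmt=plistlib.FMT_BINARY,
    )


html = extract_main_html(_demo_archive())
print(html)
```

With a real file, `extract_main_html(open(path, "rb").read())` would be the equivalent call. If the layout assumption holds, it would also explain why the readable HTML only starts after a binary header when the file is opened in a text editor.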

by **denis_berthier** » Wed Dec 01, 2010 4:38 am

JasonLion wrote:It is possible to extract the .webarchive files into normal web pages using a utility like WebArchive Folderizer, at least if you have a Mac. From there, they could be posted as normal web pages (without the forum software behind them) fairly easily.

Thanks for the suggestion. I gave it a quick try. It works well for one webarchive, but it generates lots of files and subdirectories for each webarchive. I need more time to check if I can do it for my whole set of webarchive files.

Eleven, you said you manually deleted the binary parts of the webarchive files. I also tried this (it seems simpler than using WebArchive Folderizer). I used vi and I also deleted the final part after </body>. But something gets lost in the process. It seems the deleted binary parts held some information specific to Safari. This may be related to Jason's difficulties in restoring the files into the forum. Could anyone try to open the webarchive files in Safari on a PC? (You can download it here: http://www.apple.com/safari/download/).

by **JasonLion** » Sat Mar 26, 2011 12:57 pm

Thanks to denis_berthier we now have PDF files of posts made to this topic between July 2009 and January 2010.

Pages 1 to 43

by **denis_berthier** » Mon Jun 03, 2013 5:37 am

How to compute unbiased statistics

Considering some recent discussions about unbiased statistics in the "grey zone" and the "JExocet" threads, it seems that some of the results reached in this old lost thread and some of their obvious consequences have been forgotten.
For a summary of the essential results of this thread, see the references below.

Note: the following applies to the whole collection of minimal puzzles. It obviously cannot be extended to a restricted collection defined by particular properties, such as having a rating larger than some predefined value (wrt some predefined rating system) or having some predefined pattern.

Main result

The main result of this thread was that we now have a close estimate for the real distribution of minimal puzzles wrt their number of clues (as a result, we also have a close estimate for the total number of minimal puzzles). Let me recall the distribution here for ease of use.

| #clues | %puzzles |
|--------|----------|
| 20 | 1.32E-7 |
| 21 | 0.000034 |
| 22 | 0.00348 |
| 23 | 0.148 |
| 24 | 2.28 |
| 25 | 13.42 |
| 26 | 31.91 |
| 27 | 32.71 |
| 28 | 15.48 |
| 29 | 3.598 |
| 30 | 0.41 |
| 31 | 0.0241 |
| 32 | 0.00102 |

These results are valid with an overall relative precision of 0.065%.
See page 43 of the pdf copies of the old forum for more detailed results.

Puzzles outside the [20, 31] range can be neglected in any statistical study bearing on the whole collection of minimal puzzles (e.g. the frequency of puzzles having some pattern).

Let pk be the proportion of puzzles with k clues, according to this estimate.
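As a convenience for the calculations that follow, the distribution can be stored as a lookup table. This is only a sketch: the percentages are copied from the table in this post, while the names `PERCENT_BY_CLUES` and `proportions` are hypothetical, and renormalising so that the pk sum exactly to 1 is an assumption of the sketch (the published figures are rounded and add up to about 99.98%).

```python
# Percentages copied from the distribution in this post, keyed by clue count.
PERCENT_BY_CLUES = {
    20: 1.32e-7, 21: 0.000034, 22: 0.00348, 23: 0.148, 24: 2.28,
    25: 13.42, 26: 31.91, 27: 32.71, 28: 15.48, 29: 3.598,
    30: 0.41, 31: 0.0241, 32: 0.00102,
}


def proportions(percent_by_clues):
    """Turn rounded percentages into proportions pk that sum to 1.

    Renormalising is an assumption of this sketch, not part of the post.
    """
    total = sum(percent_by_clues.values())
    return {k: v / total for k, v in percent_by_clues.items()}


total_percent = sum(PERCENT_BY_CLUES.values())   # mass covered by the table
p = proportions(PERCENT_BY_CLUES)
most_likely_clue_count = max(p, key=p.get)       # clue count with the largest pk
print(f"table covers {total_percent:.2f}% of minimal puzzles")
print(f"mode at {most_likely_clue_count} clues, p[32] = {p[32]:.2e}")
```

The tiny value of `p[32]` illustrates why clue counts outside the [20, 31] range can be neglected in whole-collection statistics.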

Practical applications

As of today, we are still unable to generate unbiased uncorrelated collections of puzzles.
The best we can generate remains uncorrelated collections with controlled-bias (this is how the above results were obtained), and such generation remains very time consuming (about 257,000 times longer than top-down generation).

However, do we need a controlled-bias collection to produce unbiased statistics? NO. The above results are valid once and for all and they can be used as such.
Of course, the closer to unbiased the better and controlled-bias is the closest we currently have.
But suppose we have another uncorrelated collection, generated e.g. by a top-down generator (whose bias is much larger than the controlled-bias) or a bottom-up one (whose bias is still much larger).

Suppose we are interested in some random variable X, e.g.:
- X = 1 if the puzzle has an sk-loop, 0 otherwise
- X = 1 if the puzzle has a J-Exocet, 0 otherwise
- X = number of J-Exocets in the puzzle
- X = number of eliminations by J-Exocets in the puzzle after some set of other rules (e.g. SSTS) has been applied
and suppose we want to estimate the real mean of X.

If, instead of simply taking the mean for all the puzzles in the collection, we use a weighted mean (using the pk as weights), we get an estimate of the unbiased mean value of X for the minimal puzzles. More precisely, let E(Xk) be the mean value of X for the puzzles with k clues in the collection. Then the unbiased estimate of the mean E(X) of X is merely:

E(X) = E(X1)*p1 + E(X2)*p2 + ...
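The weighted mean above can be sketched as follows. The function name `weighted_mean`, the `(clue_count, x)` input layout and the toy numbers are all hypothetical; in particular `p_toy` is NOT the real distribution, only a two-stratum stand-in that keeps the arithmetic easy to follow. Renormalising over the clue counts actually present in the collection is an extra assumption of this sketch.

```python
from collections import defaultdict


def weighted_mean(samples, p):
    """Estimate E(X) as the sum over k of E(Xk) * pk.

    samples: iterable of (clue_count, x) pairs from some uncorrelated collection
    p:       {clue_count: proportion pk}
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for k, x in samples:
        sums[k] += x
        counts[k] += 1
    # Assumption of this sketch: clue counts absent from the collection are
    # skipped and the remaining weights are renormalised.
    used = [k for k in p if counts[k] > 0]
    mass = sum(p[k] for k in used)
    return sum(p[k] * sums[k] / counts[k] for k in used) / mass


# Toy collection that over-represents 24-clue puzzles (hypothetical values).
p_toy = {24: 0.2, 25: 0.8}
samples = [(24, 1), (24, 1), (24, 0), (24, 1), (25, 0), (25, 1)]

naive = sum(x for _, x in samples) / len(samples)   # plain mean of the collection
estimate = weighted_mean(samples, p_toy)            # 0.2 * 0.75 + 0.8 * 0.5
print(naive, estimate)
```

In this toy case the plain mean is pulled towards the over-represented 24-clue puzzles, while the weighted estimate is not.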

Another useful quantity is the standard deviation sd(X) of X (it can be used as an informal measure of the precision of the estimate of E(X)). It is given by

sd(X)^2 = sd(X1)^2*p1 + sd(X2)^2*p2 + ...
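A minimal sketch of this calculation follows, again with hypothetical names and toy values (`p_toy` is not the real distribution). The formula above combines the variances inside each clue count. By the law of total variance, the variance of X over the whole population also contains a between-strata term, the sum over k of pk * (E(Xk) - E(X))^2, so the sketch returns both quantities. It assumes that the pk of the clue counts supplied sum to 1.

```python
import math
from statistics import fmean, pvariance


def stratified_sd(groups, p):
    """Return (within_sd, total_sd) for X.

    groups: {clue_count: list of X values observed for that clue count}
    p:      {clue_count: proportion pk}, assumed to sum to 1 over groups
    """
    means = {k: fmean(xs) for k, xs in groups.items()}
    overall = sum(p[k] * means[k] for k in groups)
    # Within-strata part: the formula given in the post.
    within = sum(p[k] * pvariance(xs) for k, xs in groups.items())
    # Between-strata part, added by the law of total variance.
    between = sum(p[k] * (means[k] - overall) ** 2 for k in groups)
    return math.sqrt(within), math.sqrt(within + between)


# Toy values (hypothetical): X is a 0/1 indicator.
p_toy = {24: 0.2, 25: 0.8}
groups = {24: [1.0, 1.0, 0.0, 1.0], 25: [0.0, 1.0]}

within_sd, total_sd = stratified_sd(groups, p_toy)
print(within_sd, total_sd)
```

For a 0/1 variable with weighted mean 0.55, the total variance must equal 0.55 * 0.45 = 0.2475, which is what the second value reproduces; the two values are close whenever the per-clue-count means differ little.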

This is not a panacea, especially if the largest values of X occur with puzzles having a small or a large number of clues.
These formulae also show that, when studying X, it is generally useless to analyse the full collection at one's disposal. Analysing millions of puzzles whose numbers of clues have a low probability will not improve the results. It is more efficient to randomly extract a sub-collection.
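One possible reading of "randomly extract a sub-collection" is proportional allocation: draw from each clue count a number of puzzles proportional to pk. The sketch below takes that reading as an assumption; the function name, the seed parameter, the rule of keeping at least one puzzle per clue count, and the toy data are all hypothetical.

```python
import random


def extract_subcollection(groups, p, target_size, seed=0):
    """Draw roughly target_size puzzles, allocated to clue counts in proportion to pk.

    groups: {clue_count: list of puzzles}
    p:      {clue_count: proportion pk}
    """
    rng = random.Random(seed)          # fixed seed so the draw is reproducible
    sample = {}
    for k, puzzles in groups.items():
        wanted = round(target_size * p.get(k, 0.0))
        # Keep at least one puzzle per available clue count, never more than exist.
        take = min(len(puzzles), max(1, wanted))
        sample[k] = rng.sample(puzzles, take)
    return sample


# Toy collection with equally many puzzles per clue count (hypothetical labels).
groups = {
    24: [f"puzzle_24_{i}" for i in range(1000)],
    26: [f"puzzle_26_{i}" for i in range(1000)],
}
p_toy = {24: 0.1, 26: 0.9}

sub = extract_subcollection(groups, p_toy, target_size=100)
print({k: len(v) for k, v in sub.items()})
```

The analysis effort then goes where the weights pk are largest, instead of being spent on clue counts that barely contribute to the weighted mean.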

Concerning JExocets in the "potential hardest" collection, almost all of them are found in puzzles with k = 22, 23 or 24 clues.
If this remained true for a top-down generated collection, this would be enough to entail that JExocets are present in at most 2% of the minimal puzzles.
(Indeed, my own estimates, based on other calculations, put it very close to 0%).
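The bound can be checked directly from the distribution. Even if every minimal puzzle with 22, 23 or 24 clues contained a JExocet and no other puzzle did, the overall frequency could not exceed p22 + p23 + p24. The sketch below only redoes that sum with the percentages from this post; the variable names are hypothetical.

```python
# Percentages for 22, 23 and 24 clues, copied from the distribution in this post.
P22, P23, P24 = 0.00348, 0.148, 2.28

# Worst case: every puzzle with these clue counts has a JExocet, no other does.
upper_bound_percent = P22 + P23 + P24
print(f"{upper_bound_percent:.2f}%")   # prints "2.43%"
```

The sum comes to about 2.43%, which is the order of magnitude quoted above as "at most 2%".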

References:

- "Unbiased Statistics of a CSP - A Controlled-Bias Generator", International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. pdf preprint. Published as a chapter of the book Innovations in Computing Sciences and Software Engineering, Khaled Elleithy Editor, pp. 11-17, Springer, 2010, ISBN 97890481911133.
- "Constraint Resolution Theories", Lulu.com, Oct. 2011, ISBN 978-1-4478-6888-0
- "Pattern-Based Constraint Satisfaction", chapter 6, Lulu.com, Nov. 2012, ISBN 978-1-291-20339-4

[Edit 2013/06/20: added precision of the distribution and reference to p. 43]

by **champagne** » Mon Jun 03, 2013 7:04 am

denis_berthier wrote: Main result

The main result of this thread was, we now have a close estimate for the real distribution of minimal puzzles wrt to their number of clues (as a result, we also have a close estimate for the total number of minimal puzzles). Let me recall the distribution here for ease of use.

| #clues | %puzzles |
|--------|----------|
| 20 | 0.0 |
| 21 | 0.000034 |
| 22 | 0.0034 |
| 23 | 0.149 |
| 24 | 2.28 |
| 25 | 13.42 |
| 26 | 31.94 |
| 27 | 32.74 |
| 28 | 15.48 |
| 29 | 3.56 |
| 30 | 0.41 |
| 31 | 0.022 |

If eleven's latest generation is representative, the distribution in the grey area is completely different:
eleven found 93% of the puzzles in the range of 23 to 26 clues for a rating of 8.6 and more.

by **denis_berthier** » Mon Jun 03, 2013 7:36 am

champagne wrote: If eleven's latest generation is representative, the distribution in the grey area is completely different:
eleven found 93% of the puzzles in the range of 23 to 26 clues for a rating of 8.6 and more.

As I wrote at the start of my post, these stats can't be applied to any sub-collection, such as a "grey area" based on SER values.
Moreover, as I also wrote, collections generated by a top-down generator are known to be strongly biased.

by **champagne** » Mon Jun 03, 2013 8:11 am

denis_berthier wrote:
champagne wrote: If eleven's latest generation is representative, the distribution in the grey area is completely different:
eleven found 93% of the puzzles in the range of 23 to 26 clues for a rating of 8.6 and more.

As I wrote at the start of my post, these stats can't be applied to any sub-collection, such as a "grey area" based on SER values.
Moreover, as I also wrote, collections generated by a top-down generator are known to be strongly biased.

right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

by **denis_berthier** » Mon Jun 03, 2013 8:16 am

champagne wrote:right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions, that are both out of topic and devoid of any content?

by **champagne** » Mon Jun 03, 2013 8:18 am

denis_berthier wrote:
champagne wrote:right as a general frequency, but the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions devoid of any content?

Could you please tell me what is wrong in my sentence?

As far as I know, the frequency in the grey area is the ratio

puzzles of the grey collection having the property "x" / total number of puzzles in that collection

by **denis_berthier** » Mon Jun 03, 2013 8:31 am

champagne wrote:
denis_berthier wrote:
champagne wrote:the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area

Could you stop polluting every possible thread with your compulsive reactions devoid of any content?

Could you please tell me what is wrong in my sentence?

If you need explanations, here they are:

1) In standard English (but this can be translated into any natural language):
the sentence "the frequency in the grey area has more to do with the collection of puzzles eligible to the grey area"
is equivalent to
"the frequency in the grey area is more about puzzles in the grey area"
which would be a tautology without the word "more".

2) I clearly declared the grey area was OOT. That's the problem in general with compulsive posters like you: answering before they have understood what's being discussed. As a result, every thread becomes polluted with irrelevant posts.

To be clear: I'm not expecting any answer !!!!!
And I would be all the happier if Jason deleted all the parasitic posts after this one: http://forum.enjoysudoku.com/post227502.html#p227502

(I mean both champagne and mine, including the present one)

The real distribution of minimal puzzles