The New Sudoku Players' Forum

Website · by **denis_berthier** » Wed Jun 12, 2013 8:00 am

Red Ed wrote:our aims do differ.
I have only ever cared about the pure number-of-minimals question,
[..]
It is true that the more expensive a statistic you wish to compute on those subtree-generated puzzles, the less advantageous subtree enumeration becomes.
[..]
subtree/supertree enumeration wins in the tails of the distribution [...] which, perhaps in part due to the "exotic" nature of those low probability puzzles, have attracted enduring interest on this forum.
[..]
If you prefer to look only at the more ordinary mid-section of the distribution then [...] yep, suexg-cb may be all you need.

One uses the part of a distribution adapted to one's goals; it isn't a matter of "preference" or being "more ordinary". It's very funny to see how you use vocabulary with different connotations to speak of the two methods.
But I have no major disagreement with your above statements.

BTW, speaking of generation, suexg-cb is only one implementation of the controlled-bias idea/algorithm. There are now faster programs than suexg and potential improvements in speed - but "probably" not orders of magnitude.

by **dobrichev** » Wed Jun 12, 2013 9:01 pm

Red Ed wrote:I wonder if dobrichev, whose brief foray into subtree methods sparked this exchange, is feeling any better informed now?

Hi,
All the methods for estimation of all minimals are based on as larger as possible sample and incorporate weighting up to some predefined degree of details. Nothing new here, although claiming ownership on centuries old mathematical methods sounds a bit strange.

Playing with distribution of distributions is a little confusing to me. Determining the distribution of, say, 25-clue minimals over several samples is targeting the mean/median/average/whatever that finally will be included in the final distribution of the n-clue puzzles. We want the the individual n-clue probabilities to sum to 1 and I am not sure for the right way this should be done.

A typical usage of the general distribution - this over the whole space - is when you use it as a reference to underline something special = something non-average. Nevertheless it seems I posted in wrong topic, using the "real" distribution as reference. Sorry, Denis.

I hate dividing something close to zero by other thing close to zero. This limits the applicability of the general distribution. A possible workaround is doing local investigations targeting the proportion N(k) / N(k+1) and using the subtree as the only option. I have no idea if these local observations are reusable outside the same local context.

We know UA sets are relatively well defined abstraction which determines the grid and respectively its puzzles. We know that every solution grid has 6.67e21 (non-minimal) unavoidable sets. We know that for different grids the ratio between unaviodable rectangles and all UA is between 0 and 36 / 6.67e21. Aren't rectangles extremely rare? If so, why we don't just ignore them? Because of the significance of the weighting, and no subjections here. Denis?

I hope some day some will approximate this distribution by product of the yet unknown uniques and minimality functions.

Any low-clue puzzle kills the minimality of all its supersets which is huge number. Can we solve the opposite problem - estimate the low-clue (or high-clue) fertility of a given subgrid by processing the counts of the easily obtainable moderate-clue minimal puzzles?

For the hard puzzles hunters, below is my first observation related to the "real" distribution.

Hidden Text: Show

And yes, I am feeling better informed now, even if the exchange only repeats or reformulates already discussed matter. Thank you guys.

Website · by **denis_berthier** » Thu Jun 13, 2013 1:16 am

dobrichev wrote:All the methods for estimation of all minimals are based on as larger as possible sample and incorporate weighting up to some predefined degree of details. Nothing new here, although claiming ownership on centuries old mathematical methods sounds a bit strange.

As you are so well informed about centuries old mathematical methods, I would be very interested if you could give any mathematical reference of how to generate uncorrelated unbiased (or with known bias) samples?

dobrichev wrote:A typical usage of the general distribution - this over the whole space - is when you use it as a reference to underline something special = something non-average.

Of course, you can use a distribution to say that some situation is not typical, but this is not typical usage. Typical usage is to use the distribution where it is meaningful, i.e. where the weight is. The tail of the distribution is relevant to what is called "statistics of extremes".
In the past years, this forum has concentrated on generating the "hardest" puzzles that no real player will ever be able to solve, even using the exotic rules - in spite of champagne's constant propaganda to the contrary. Indeed, this forum has become a second programmer's forum because no one can really study these puzzles/rules manually. I don't mean they are not interesting, I just don't accept to call this typical. As you can see from the discussion, about 10 people in the world are interested in these topics.

dobrichev wrote:I hate dividing something close to zero by other thing close to zero. This limits the applicability of the general distribution. A possible workaround is doing local investigations targeting the proportion N(k) / N(k+1) and using the subtree as the only option. I have no idea if these local observations are reusable outside the same local context.

If you use the full subtree of minimals generated from a complete grid, my P(k)/P(k+1) formula remains valid; but not if you use the subtree generated from a non-complete grid.

dobrichev wrote:We know UA sets are relatively well defined abstraction which determines the grid and respectively its puzzles. We know that every solution grid has 6.67e21 (non-minimal) unavoidable sets. We know that for different grids the ratio between unaviodable rectangles and all UA is between 0 and 36 / 6.67e21. Aren't rectangles extremely rare? If so, why we don't just ignore them? Because of the significance of the weighting, and no subjections here. Denis?

My interest in Sudoku (and other logic puzzles) is in pattern-based resolution. The only goal of my short incursion in generation and statistics was to compute the distribution of the W rating. For this I had to define a new kind of generator (because Red Ed had suggested that our current generators might be biased), but this was not my main goal. As for UA sets, you're welcome to study them, but this is not my personal cup of tea.

dobrichev wrote:For the hard puzzles hunters, below is my first observation related to the "real" distribution.
Take the grid with most puzzles (294) in the recent version of champagne's the hardest collection.
123456789457189236689372154268794315391265478574813692732948561815627943946531827
[...]
There is anomaly in the number of clues distribution of these puzzles

There's no anomaly at all. The distribution of a random variable on a subset of the probability space is rarely the same as its distribution on the whole set.

by **dobrichev** » Thu Jun 13, 2013 4:51 am

Ilf and Petrov wrote:There were so many hairdressing establishments and funeral homes in the regional centre of N. that the inhabitants seemed to be born merely in order to have a shave, get their hair cut, freshen up their heads with toilet water and then die. In actual fact, people came into the world, shaved, and died rather rarely in the regional centre of N. Life in N. was extremely quiet. The spring evenings were delightful, the mud glistened like anthracite in the light of the moon, and all the young men of the town were so much in love with the secretary of the communal-service workers' local committee that she found difficulty in collecting their subscriptions.

Website · by **denis_berthier** » Thu Aug 01, 2013 7:40 am

Comparing the controlled-bias generator with the subset/superset method

For completeness reasons, as Red Ed's subset/superset method has been evoked earlier in this thread, and now that, thanks to Afmob's and Blue's calculations, we have detailed results for puzzles generated by it, I think necessary to report here the "efficiency" of that method when it comes to estimating the mean or the distribution of a random variable X of interest in Sudoku solving (X=SER, X=W rating, X="solvable by SSTS", X="has an sk-loop" ...).

Due to correlation of the puzzles generated by subset/superset, for the same relative error, the subset/superset method would require the computation of X for about 500 times more puzzles than with the controlled-bias generator
(see section 2 of this post http://forum.enjoysudoku.com/counting-minimal-puzzles-subsets-supersets-etc-t31190-147.html for details). This makes it totally unfit for such purposes*.

(*) but, if one is only interested in the number of clues of puzzles or the number of minimal puzzles with n clues, it is interesting for n "far" from the mean value.

The New Sudoku Players' Forum

The real distribution of minimal puzzles

Re: The real distribution of minimal puzzles

Re: The real distribution of minimal puzzles

Re: The real distribution of minimal puzzles

Re: The real distribution of minimal puzzles

Re: The real distribution of minimal puzzles