dobrichev wrote: All the methods for the estimation of all minimals are based on as large a sample as possible and incorporate weighting up to some predefined degree of detail. Nothing new here, although claiming ownership of centuries-old mathematical methods sounds a bit strange.

As you are so well informed about centuries-old mathematical methods, I would be very interested if you could give any mathematical reference on how to generate uncorrelated, unbiased (or known-bias) samples.
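For readers unfamiliar with the "known bias" alternative: if each sample is drawn independently with a probability q(x) that is known (up to a constant), and the target distribution p(x) is also known up to a constant, an expectation under p can be estimated by reweighting each sample with p(x)/q(x). The sketch below is the textbook self-normalized importance-sampling estimator on a toy example; it is not the generator discussed in this thread, and the helper name `weighted_mean` and the toy distributions are hypothetical.

```python
import random


def weighted_mean(values, weights):
    """Self-normalized importance-sampling estimate: sum(w * v) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * v for w, v in zip(weights, values)) / total


# Toy example (assumption, not Sudoku data): the target p is uniform on {0, ..., 9},
# but the sampler is biased and picks x with probability q(x) proportional to x + 1.
support = list(range(10))
q = [x + 1 for x in support]  # known sampling bias (unnormalized)
rng = random.Random(0)
samples = rng.choices(support, weights=q, k=100_000)  # independent draws

naive = sum(samples) / len(samples)  # biased towards large x (expected value 6.0)
# Importance weight p(x) / q(x); p is uniform, so its constant cancels out.
w = [1.0 / (x + 1) for x in samples]
corrected = weighted_mean(samples, w)

print(f"naive mean     {naive:.3f}")
print(f"corrected mean {corrected:.3f}  (true value 4.5)")
```

The naive average converges to the mean under q, while the reweighted one converges to the mean under p; this only works because q is known exactly.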

dobrichev wrote: A typical usage of the general distribution (the one over the whole space) is when you use it as a reference to underline something special, i.e. something non-average.

Of course, you can use a distribution to say that some situation is not typical, but this is not its typical usage. The typical usage is to apply the distribution where it is meaningful, i.e. where the weight is. The tail of the distribution is relevant to what is called "statistics of extremes".
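To put "where the weight is" in numbers: if a statistic is approximately normal (an assumption made here purely for illustration, not a claim about any Sudoku distribution), the mass lying beyond a few standard deviations shrinks very quickly. Conclusions drawn from the bulk and conclusions drawn from the tail therefore concern very different fractions of the space. The helper name `two_sided_tail` is hypothetical.

```python
import math


def two_sided_tail(k):
    """Probability that a normal variable lies more than k standard deviations from its mean."""
    return math.erfc(k / math.sqrt(2.0))


for k in (1, 2, 3, 4):
    print(f"|X - mean| > {k} sd: {two_sided_tail(k):.2e}")
```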

In recent years, this forum has concentrated on generating the "hardest" puzzles, which no real player will ever be able to solve, even with exotic rules, in spite of champagne's constant propaganda to the contrary. Indeed, this forum has become a second programmers' forum, because no one can really study these puzzles or rules manually. I don't mean they are not interesting; I just don't accept calling them typical. As you can see from the discussion, only about 10 people in the world are interested in these topics.

dobrichev wrote: I hate dividing something close to zero by another thing close to zero. This limits the applicability of the general distribution. A possible workaround is doing local investigations targeting the proportion N(k) / N(k+1) and using the subtree as the only option. I have no idea if these local observations are reusable outside the same local context.

If you use the full subtree of minimal puzzles generated from a complete grid, my P(k)/P(k+1) formula remains valid; it does not if you use the subtree generated from an incomplete grid.
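The concern about dividing small quantities by small quantities can be quantified for sampled counts. This sketch does not reproduce the P(k)/P(k+1) formula mentioned above; it only assumes that the two observed counts behave like independent Poisson variables, in which case the delta method gives a relative standard error of about sqrt(1/a + 1/b) for the ratio a/b. The helper name `ratio_with_rel_error` and the example counts are hypothetical.

```python
import math


def ratio_with_rel_error(a, b):
    """Estimate a / b from two observed counts and its approximate relative standard error.

    Assumes a and b are independent Poisson counts (delta method):
    the relative SE of a / b is about sqrt(1/a + 1/b).
    """
    if a <= 0 or b <= 0:
        raise ValueError("both counts must be positive")
    return a / b, math.sqrt(1.0 / a + 1.0 / b)


# Same true ratio (0.5), increasingly small counts.
for a, b in [(10_000, 20_000), (100, 200), (3, 6)]:
    r, rel = ratio_with_rel_error(a, b)
    print(f"counts {a:>6}/{b:<6} ratio {r:.2f}  relative SE {rel:.1%}")
```

With the same ratio, the relative error grows from about 1% to about 70% as the counts shrink, which is why ratios of rare events need very large samples.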

dobrichev wrote: We know UA sets are a relatively well-defined abstraction that determines the grid and, in turn, its puzzles. We know that every solution grid has 6.67e21 (non-minimal) unavoidable sets. We know that for different grids the ratio between unavoidable rectangles and all UA is between 0 and 36 / 6.67e21. Aren't rectangles extremely rare? If so, why don't we just ignore them? Because of the significance of the weighting, and no subjections here. Denis?

My interest in Sudoku (and other logic puzzles) is in pattern-based resolution. The only goal of my short incursion into generation and statistics was to compute the distribution of the W rating. For this, I had to define a new kind of generator (because Red Ed had suggested that our current generators might be biased), but this was not my main goal. As for UA sets, you're welcome to study them, but they are not my personal cup of tea.

dobrichev wrote: For the hard-puzzle hunters, below is my first observation related to the "real" distribution.

Take the grid with the most puzzles (294) in the recent version of champagne's hardest collection.

123456789457189236689372154268794315391265478574813692732948561815627943946531827

[...]

There is an anomaly in the number-of-clues distribution of these puzzles.

There's no anomaly at all. The distribution of a random variable on a subset of the probability space is rarely the same as its distribution on the whole set.
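A toy illustration of this last point, using two fair dice rather than Sudoku puzzles (the variable and the subset are chosen purely for illustration): restricting the space to the subset "at least one die shows 6" completely changes the distribution of the sum. The helper names `distribution` and `dice_sum` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes, variable):
    """Exact distribution of `variable` over equally likely `outcomes`."""
    counts = Counter(variable(o) for o in outcomes)
    n = len(outcomes)
    return {value: Fraction(c, n) for value, c in sorted(counts.items())}


def dice_sum(outcome):
    return outcome[0] + outcome[1]


space = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes
subset = [o for o in space if 6 in o]         # a selected subset: at least one 6 (11 outcomes)

whole = distribution(space, dice_sum)
sub = distribution(subset, dice_sum)
print("whole space:", {k: str(v) for k, v in whole.items()})
print("subset     :", {k: str(v) for k, v in sub.items()})
```

On the whole space the sum ranges from 2 to 12 and peaks at 7; on the subset it ranges only from 7 to 12 and is nearly flat. Neither distribution is "anomalous"; they are simply distributions over different sets.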