The New Sudoku Players' Forum

by **Red Ed** » Mon Jun 10, 2013 10:57 pm

As requested, a summary of subsets, supersets and related methods for counting minimal puzzles.

Notation

X (bold caps) : a random variable.
X (normal caps) : a set, except for E(.) and Var(.) which are expectation and variance respectively.
x (lowercase) : anything else.
Σx∈X f(x) : sum of f(x) for all x in X.
ΣX∋x f(X) : sum of f(X) for X that contain x.
X ≡ Y : X and Y are equal by definition.

Generic method

Let M be some huge space of things, some part of which we wish to count.
Let φ be a function that computes the "value" of any element m∈M: value of m = φ(m).
We'd like to know the total "value" of M, that is: Σm∈M φ(m).

Let A be a random variable that picks subsets of M, with the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the (non-zero) probability that m is among the subset of things picked randomly by A: Pr(m) ≡ ΣA∋m Pr(A=A).

Let V≡V(A), a transformation of A, be the random variable defined by a weighted sum of things picked out by A: V ≡ Σm∈A φ(m)/Pr(m).

Then the expected value of V equals the total "value" of M: E(V) = Σm∈M φ(m).

Proof that E(V) = Σm∈M φ(m)

By definition E(V) is the sum, over all possible subsets, A, of V(A) times the probability that A=A.
That is: E(V) = ΣA Pr(A=A) Σm∈A φ(m)/Pr(m) (***).

Focus on a particular m, contained by sets A_1,...,A_k. So Pr(m) = Pr(A=A_1) + ... + Pr(A=A_k).
Then φ(m) appears in these terms of (***):
+ Pr(A=A_1) φ(m) / Pr(m)
+ ...
+ Pr(A=A_k) φ(m) / Pr(m)
which adds up to just φ(m).

So every m contributes φ(m) to E(V). In other words E(V) = Σm∈M φ(m), as claimed.

Examples

Counting the number of 27-clue minimal puzzles by singleton probing

Replace 27 by c throughout for the generic c-singletons method.

A

V

27-clue minimals by the "subsets method" with s=30

Replace 27 and 30 by c and s throughout for the generic (c,s)-subsets method.
Setting s=c gives the c-singletons method.

A

V

27-clue minimals by the "supersets method" with s=24

Replace 27 and 24 by c and s throughout for the generic (c,s)-supersets method.
Setting s=c gives the c-singletons method.

A

V

Computing the estimate

Collect k, a large number of, (independent) samples of V. Let W be the average of those V. By the central limit theorem, W is approximately normal with mean (the total "value" we sought to compute) E(V) and variance (the uncertainty(*) in the estimate) Var(V)/k. You can estimate E(V) and Var(V) from the data: E(V) is the mean of the observations; Var(V) is the sample variance of the observations.

(*) we usually quote the relative error, sd(V)/E(V) = 1/sqrt(k) * sqrt(Var(V))/E(V). Relative error decays per the square root of the number of trials.

Choice of parameters for the puzzle-counting problem

In the puzzle-counting problem, we want to estimate the number of c-clue minimal puzzles for all c in some range. Subsets works well for small c and supersets works well for large c. The appropriate choice of method, and of s, depends on the c-range of interest.

In this section, we will demonstrate the power of the subsets method for c<30 compared to the path-based "controlled bias" method(*). We use an implementation that wraps the strategy (subsets/paths) around the same core solver and that takes input from a fixed set of 1000 input grids that are read repeatedly over a ten minute period. A good strategy is one that exhibits low estimated relative errors when the time is up.

After ten minutes, the path-based solver reported:

Code: Select all: +----+----------+--------------+------------+ | Cl | Count | E(nr/grid) | E(rel err) | +----+----------+--------------+------------+ | 23 | 2 | 8.8918e+012 | 70.71% | | 24 | 8 | 8.5954e+013 | 35.36% | | 25 | 29 | 7.1041e+014 | 18.57% | | 26 | 21 | 1.1080e+015 | 21.82% | | 27 | 14 | 1.5047e+015 | 26.73% | | 28 | 7 | 1.4510e+015 | 37.80% | +----+----------+--------------+------------+

and the subsets method reported:

Code: Select all: +----+----------+--------------+------------+ | Cl | Count | E(nr/grid) | E(rel err) | +----+----------+--------------+------------+ Linear probing | 20 | 20 | 4.4984e+006 | 36.06% | E(rel err) for | 21 | 1124 | 1.5421e+009 | 11.55% | comparison ... | 22 | 19257 | 1.7614e+011 | 5.42% | | 23 | 106227 | 7.1658e+012 | 2.80% | 70.71% | 24 | 190937 | 1.0672e+014 | 1.94% | 35.36% | 25 | 119587 | 6.3499e+014 | 1.73% | 18.57% | 26 | 25332 | 1.5065e+015 | 2.15% | 21.82% | 27 | 1797 | 1.4694e+015 | 4.70% | 26.73% | 28 | 41 | 6.0348e+014 | 19.66% | 37.80% | 29 | 2 | 7.8010e+014 | 70.71% | +----+----------+--------------+------------+

To see how much faster the subsets method is than the controlled-biased method for this c-range, divide the squares of the relative errors. For c=28 clues, for example, subtree probing's around 3 or 4 times faster. At c=25 clues, it looks like subtree enumeration's about 100 times faster. At c=22 clues, extrapolating back from the November 2009 controlled-bias results, subtree numeration is about 3500 times faster than linear probing.

Can we do even better at the low end of the c range? Yes, we can. Picking s=28 instead of s=30, we get an expected relative error for the number of 20-clue minimals of ~672% / sqrt(time_in_seconds) on the test machine. Within a week of computation, the estimate would be below 1% relative error -- an accuracy that would take centuries to achieve using the controlled-bias method on the same platform.

(*) which, to be fair, has uses besides the counting problem described here

Parting remarks

The subsets and supersets methods were described originally back in 2009, in the context of counting minimal puzzles, together with lots of detail on optimal choice of parameters, sensitivity to input source, and so on, but almost all of that was lost in the server crash. The ten-minute test is new (a 2013 wrapper around 2009 code).

by **Red Ed** » Tue Jun 11, 2013 6:42 pm

Questions welcome. First one to spot the inevitable typo wins a virtual mars bar.

by **blue** » Wed Jun 12, 2013 3:09 am

Hello Ed,

This is a very nice description of the method(s).
I especially liked the use of the single random variable, V.

It might be nice to show "step by step", why it is that E(V) is the same as Σm∈M φ(m)
-- not necessarily in the first post.

There is/was one problem initially -- what the Var(V)/k value "is".
I just fixed a similar problem in my last post -- it's sqrt(Var(V)/k) that's the error estimate.

Best Regards,
Blue.

by **Red Ed** » Wed Jun 12, 2013 8:01 am

Thanks for your comments, Blue. I've updated the post accordingly.

Website · by **denis_berthier** » Wed Jun 12, 2013 8:20 am

Red Ed wrote:Let A be a random variable that picks a subset of M, with the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A.

As I think a definition should be unambiguously understandable without reading further on or trying to guess, I'm not satisfied with Pr(m).
Is it Pr(m / A) the conditional probability of m given A?
Or is it the probability that m be picked by some A - which supposes there is a global probability distribution on M?
I insist that I didn't read further on, so please don't refer me to the examples.

Red Ed wrote:Let V be the random variable defined by a weighted sum of things picked out by A: V = Σm∈A φ(m)/Pr(m)

As V depends on A, I would have written V(A). A is the original random variable, V is a function of A.
For a weighted sum, I would have expected V(A) = Σm∈A φ(m) * Pr(m or m/A)

Red Ed wrote:These methods were described originally back in 2009, in the context of counting minimal puzzles

This is an obvious falsification of history. If the "subset method" had been described four years ago, how could anyone believe you didn't run it 10 minutes, instead of speaking only of the "superset method" and admitting that

Red Ed - Posted: Wed Oct 14, 2009 6:59 am p. 37 of the pdf files - wrote:On the parts of the distribution that we're each focussing on, it seems we each have a method tuned for the purpose: my supersets method has no hope of beating yours below 30 clues (nor does it try to); and your paths method has no hope of beating mine above 30 clues (nor do you proclaim any interest in doing).

As for the result itself, comparing computation times for a method that generates correlated puzzles with one that generates uncorrelated ones is a clear nonsense.
Why don't you state the number of puzzles you would need to compute the mean value of some function, e.g. the mean SER?

by **Red Ed** » Wed Jun 12, 2013 5:07 pm

denis_berthier wrote:I insist that I didn't read further on, so please don't refer me to the examples.

You might not get far if you stop at the first tricky bit, so I would ask that you persist past that hurdle and try to understand the post as a whole. I've made some updates that I hope might address some of your specific concerns and make the rest of it an easier read.

For a weighted sum, I would have expected V(A) = Σm∈A φ(m) * Pr(m or m/A)

In this case, the weights are 1/Probability. Surprising, yes, but on this occasion not a typo.

Website · by **denis_berthier** » Wed Jun 12, 2013 5:31 pm

Red Ed wrote:You might not get far if you stop at the first tricky bit

There was no tricky bit, only bad definitions. I'm not enough interested to do your job of writing it properly if you don't care to make the effort.

by **Red Ed** » Wed Jun 12, 2013 5:40 pm

denis_berthier wrote:There was no tricky bit, only bad definitions. I'm not enough interested to do your job of writing it properly if you don't care to make the effort.

I've made updates in response to your comments on an article that you wished be written. I would be grateful if you'd put at least some effort into reading it and offering your thoughts on the methodology that it describes (not only on typography). Thank you.

Website · by **denis_berthier** » Wed Jun 12, 2013 7:29 pm

Red Ed wrote:
denis_berthier wrote:There was no tricky bit, only bad definitions. I'm not enough interested to do your job of writing it properly if you don't care to make the effort.

I've made updates in response to your comments on an article that you wished be written. I would be grateful if you'd put at least some effort into reading it and offering your thoughts on the methodology that it describes (not only on typography). Thank you.

If you think that the corrections you made at my request are only typography, that's hopeless.

My thoughts on the applicability of your algorithm were already stated here: http://forum.enjoysudoku.com/the-real-distribution-of-minimal-puzzles-t30127-59.html

From your sentence "the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A", I see that you have finally understood that you don't have to define the probability for all the m. See your objections about my defining a probability on only the set of minimal puzzles, near the end of this page: http://forum.enjoysudoku.com/extra/RDMP-page1.pdf.

by **Red Ed** » Wed Jun 12, 2013 8:35 pm

Yes, Denis, typography. Well, typography and a mistaken assumption about weighted sums.

From your sentence "the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A", I see that you have finally understood that you don't have to define the probability for all the m. See your objections about my defining a probability on only the set of minimal puzzles, near the end of this page: http://forum.enjoysudoku.com/extra/RDMP-page1.pdf.

No, false analogy. We're talking on this thread about a technique quite different to the flawed methodology on RDMP page 1:

BEWARE:
- the model described here neglects some essential aspects of the workings of a
"classical" top-down generator;
- the initial idea of using a biased generator with correction factors remains, but
is has to be applied to a modified top-down generator, which I called a
controlled-bias generator;
- this post is thus superseded by the new model for these generators, here

Website · by **denis_berthier** » Thu Jun 13, 2013 12:35 am

Red Ed wrote:Yes, Denis, typography.

First version: "Pr(m), the probability that m is among the subset of things picked randomly by A" - totally meaningless.
Corrected version: "Pr(m) ≡ ΣA∋m Pr(A=A)"
If you consider these are the same things, I see there's still no way of having a serious discussion with you and your politician's tricks; I don't see any reason of loosing more time with all this.

Red Ed wrote:
From your sentence "the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A", I see that you have finally understood that you don't have to define the probability for all the m. See your objections about my defining a probability on only the set of minimal puzzles, near the end of this page: http://forum.enjoysudoku.com/extra/RDMP-page1.pdf.
No, false analogy. We're talking on this thread about a technique quite different to the flawed methodology on RDMP page 1:
BEWARE:
- the model described here neglects some essential aspects of the workings of a
"classical" top-down generator;
- the initial idea of using a biased generator with correction factors remains, but
is has to be applied to a modified top-down generator, which I called a
controlled-bias generator;
- this post is thus superseded by the new model for these generators, here

As I explained, my initial error was a due to a misunderstanding of the workings of top-down generators. I had never before worked on generation and suexg code is awful. I had made an error, I said it clearly instead of trying to hide it as you would probably have done by calling it typography, and I corrected it. But you didn't change your critics after I had properly defined the controlled-bias generator.

In all my old thread, you've been totally negative, your only goal being to claim anteriority and to claim your "method" was better. The result we know is it's only marginally better, for the particular case of the number of clues and on a set of probability close to 0, as you write it yourself: http://forum.enjoysudoku.com/the-real-distribution-of-minimal-puzzles-t30127-59.html

by **blue** » Thu Jun 13, 2013 4:30 am

denis_berthier wrote:If you consider these are the same things, I see there's still no way of having a serious discussion with you and your politician's tricks; I don't see any reason of loosing more time with all this.

Politician's trick #1: Accuse the opposition of doing what you're doing.

Looking at post times and "last edit" times, and I see this:

Red Ed (Wed Jun 12, 2013 5:47 pm) wrote:Let A be a random variable that picks subsets of M, with the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A: Pr(m) ≡ ΣA∋m Pr(A=A).

denis_berthier (Thu Jun 13, 2013 12:35 am) wrote:First version: Pr(m), the probability that m is among the subset of things picked randomly by A - totally meaningless.
Corrected version: Pr(m) ≡ ΣA∋m Pr(A=A)

Denis: your bold text doesn't match Ed's.

denis_berthier (Wed Jun 12, 2013 8:20 am) wrote:As I think a definition should be unambiguously understandable without reading further on or trying to guess, I'm not satisfied with Pr(m).
Is it Pr(m / A) the conditional probability of m given A?
Or is it the probability that m be picked by some A - which supposes there is a global probability distribution on M?

From the first edition of the first post, we have: "Let A be a random variable that picks subsets of M". It's clear from that, that associated with A, is probablity distribution on P(M), pow(M) ... label it how you like, it's the set/class of subsets of M.

I agree with Denis, that "Pr(m), the probability that m is among the subset of things picked randomly by A" is (slightly) ambiguous.
I would hardly call it meaningless.
I didn't have any trouble understanding it at least -- A picks a set, a subset of M ... what's the probability that m (a member of M) is "in it" ? (Note: you're not told which set A has picked, or whether a pick has even been made).

In another attempt at peacemaking, may I suggest this wording: "Pr(m), the probablity that A picks a set containing m".
By definition, this is "ΣA∈S(m) Pr(A=A)", with S(m) = { A∈pow(M) | m∈A }.
"Pr(m) ≡ ΣA∋m Pr(A=A)" is a perfectly fine way of writing that.

Best Regards,
Blue.

P.S. My experience with statistics, comes from the practical side, not the theoretical side.
The notation Pr(A=A) is unfamiliar to me -- Pr[A](A) is how I might write it.
I'm assuming we don't have any problem with the idea that Pr(A=A), means the probability that (on any given pick), A will be picked by A.

Website · by **denis_berthier** » Thu Jun 13, 2013 5:52 am

blue wrote:
Red Ed (Wed Jun 12, 2013 5:47 pm) wrote:Let A be a random variable that picks subsets of M, with the special property that for any m∈M with φ(m)!=0 we can easily work out Pr(m), the probability that m is among the subset of things picked randomly by A: Pr(m) ≡ ΣA∋m Pr(A=A).

denis_berthier (Thu Jun 13, 2013 12:35 am) wrote:First version: Pr(m), the probability that m is among the subset of things picked randomly by A - totally meaningless.
Corrected version: Pr(m) ≡ ΣA∋m Pr(A=A)

Denis: your bold text doesn't match Ed's.

You are perfectly right: as I inserted the formula by copy-paste, I lost the bold A inside Pr.
As for the outer bold A, I wanted to emphasize that in Red Ed's new writing the sum is understood as being over A. Nobody could seriously pretend that the Pr(m) in the first version should obviously be given this interpretation.
I would have no problem with writing Pr(m) ≡ ΣA∋m Pr(A=A) - if the probability on P(M) had been defined.

blue wrote:From the first edition of the first post, we have: "Let A be a random variable that picks subsets of M". It's clear from that, that associated with A, is probablity distribution on P(M), pow(M)

Yes, so this should be stated from the start and the probability on this space should be defined before speaking of A. We have to guess that the probability of a subset of M is related to its cardinal (or is this also a false assumption?), but this is not written.

BTW, there are standard, worldwide accepted names for the concepts of probability:
- a process can pick something, a random variable doesn't pick anything;
- a random variable is defined on some measurable space (here the discrete P(M)) with values in some other space (here R); A is a real random variable on P(M);
- V(A) is not a "transformation" of A, but a function of A.
Changing the standard vocabulary doesn't help understanding.

blue wrote:The notation Pr(A=A) is unfamiliar to me.
I'm assuming we don't have any problem with the idea that Pr(A=A), means the probability that (on any given pick), A will be picked by A.

There's no problem with Pr(A=A), except maybe the typographical one of using the same bold/normal letters to notate a variable and its values.
Pr(A=A) is exactly what it reads: the probability that A = A.

by **blue** » Thu Jun 13, 2013 6:59 am

denis_berthier wrote:You are perfectly right: as I inserted the formula by copy-paste, I lost the bold A inside Pr.

I would recommend using the "quote" feature, and doing the copy from the edit window.

denis_berthier wrote:We have to guess that the probability of a subset of M is related to its cardinal (or is this also a false assumption?), but this is not written.

It's an unnecessary assumption, and also false for the examples.

denis_berthier wrote:BTW, there are standard, worldwide accepted names for the concepts of probability:
(...)

Thank you.

denis_berthier wrote:- a random variable is defined on some measurable space (here the discrete P(M)) with values in some other space (here R); A is a real random variable on P(M);

Are you suggesting that a random variable is a probablity distribution, and nothing more ?
It's curious to see you saying that A is "real (valued)", since its "outputs" are sets ... when we "sample" A, we get a set.
I would think that it's a "P(M) valued" random variable.
I'll defer to the pros, of course.
Out of curosity, what else (besides "rational" ?), could take the place of "real" here ?

denis_berthier wrote:
blue wrote:The notation Pr(A=A) is unfamiliar to me.
(...)

(...)
Pr(A=A) is exactly what it reads: the probability that A = A.

The problem I would have with it, is that A is a subset of M, and (bold) A is not -- "how could the two be equal ?"
[ Well, that and that Pr(.) would looks like a function that takes an equation for input -- not that that's unreasonable. ]
If it's standard notation, so be it.

Best Regards,
Blue.

by **Red Ed** » Thu Jun 13, 2013 7:11 am

denis_berthier wrote:we know [that subtree/supertree methods are] only marginally better, for the particular case of the number of clues and on a set of probability close to 0

I will quantify this with some tests over the next few days. It would be a nice update to the parameter-setting section of the first post. I'll use bbsolve for the core solving engine and will pre-generate a set of, say, 1000 puzzles to run the test on. I'll test the strategy of starting at s clues and removing clues to reach a minimal puzzle, either with (subtree method) or without (singleton method / controlled bias) branching, and will the track rate at which the relative error decays for the estimates of the number of c-clue puzzles for all c<s+1. I'll do further checks on an independently generated set of 1000 puzzles as time permits.

The New Sudoku Players' Forum

Counting minimal puzzles: subsets, supersets, etc.

Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.

Re: Counting minimal puzzles: subsets, supersets, etc.