Notation
X (bold caps) : a random variable.
X (normal caps) : a set, except for E(.) and Var(.) which are expectation and variance respectively.
x (lowercase) : anything else.
Σx∈X f(x) : sum of f(x) for all x in X.
ΣX∋x f(X) : sum of f(X) for X that contain x.
X ≡ Y : X and Y are equal by definition.
Generic method
Let M be some huge space of things, some part of which we wish to count.
Let φ be a function that computes the "value" of any element m∈M: value of m = φ(m).
We'd like to know the total "value" of M, that is: Σm∈M φ(m).
Let A be a random variable that picks subsets of M, with the special property that for any m∈M with φ(m) ≠ 0 we can easily work out Pr(m), the (non-zero) probability that m is among the things picked by A: Pr(m) ≡ ΣA∋m Pr(A=A).
Let V≡V(A), a transformation of A, be the random variable defined by a weighted sum of things picked out by A: V ≡ Σm∈A φ(m)/Pr(m).
Then the expected value of V equals the total "value" of M: E(V) = Σm∈M φ(m).
Proof that E(V) = Σm∈M φ(m)
By definition E(V) is the sum, over all possible subsets, A, of V(A) times the probability that A=A.
That is: E(V) = ΣA Pr(A=A) Σm∈A φ(m)/Pr(m) (***).
Focus on a particular m with φ(m) ≠ 0 (elements with φ(m) = 0 contribute nothing to (***)), and let A_1,...,A_k be the subsets containing m that A picks with non-zero probability. So Pr(m) = Pr(A=A_1) + ... + Pr(A=A_k).
Then φ(m) appears in these terms of (***):
+ Pr(A=A_1) φ(m) / Pr(m)
+ ...
+ Pr(A=A_k) φ(m) / Pr(m)
which adds up to just φ(m).
So every m contributes φ(m) to E(V). In other words E(V) = Σm∈M φ(m), as claimed.
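For concreteness, here's a quick Python sketch (mine, not part of the original write-up) of the recipe on a toy problem, where A includes each element independently so that Pr(m) is known exactly:
Code:
import random

M = list(range(1, 11))              # a tiny stand-in for the huge space
def phi(m): return m * m            # the "value" of m
truth = sum(phi(m) for m in M)      # 385: the total we want to estimate

p = {m: 0.3 for m in M}             # Pr(m): chance that m lands in the picked subset

def one_V():
    # A includes each element independently, so Pr(m) is just p[m]
    A = [m for m in M if random.random() < p[m]]
    return sum(phi(m) / p[m] for m in A)

k = 100_000
W = sum(one_V() for _ in range(k)) / k
print(truth, W)                     # W should come out close to 385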
Examples
27-clue minimals by singleton probing
Replace 27 by c throughout for the generic c-singletons method.
M = all 27-clue subsets of solution grids.
φ(m) = 1 if m is a minimal puzzle, 0 otherwise.
Σm∈M φ(m) = the total number of 27-clue minimals.
A = "pick, uniformly at random, a 27-clue subset of a random solution grid". If m is a 27-clue minimal then Pr(m) = 1 / (choose(81,27) * number_of_solution_grids) = 27!54! / 81!n, where n = that 6.67e21 number.
V = 81!n / 27!54! if we picked a 27-clue minimal, 0 otherwise.
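In code, one trial of singleton probing might look like the sketch below; random_solution_grid() and is_minimal_puzzle() are hypothetical stand-ins for a uniform grid sampler and a minimality test (both are the hard part in practice):
Code:
import random
from math import comb

N_GRIDS = 6670903752021072936960    # n, the number of solution grids
C = 27
WEIGHT = comb(81, C) * N_GRIDS      # 1/Pr(m) = 81!n / (27!54!)

def one_trial(random_solution_grid, is_minimal_puzzle):
    grid = random_solution_grid()           # hypothetical: uniform over all grids
    cells = random.sample(range(81), C)     # uniform 27-cell subset of the grid
    # V = WEIGHT if the chosen cells form a minimal puzzle, else 0
    return WEIGHT if is_minimal_puzzle(grid, cells) else 0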
27-clue minimals by the "subsets method" with s=30
Replace 27 and 30 by c and s throughout for the generic (c,s)-subsets method.
Setting s=c gives the c-singletons method.
M = all 27-clue subsets of solution grids.
φ(m) = 1 if m is a minimal puzzle, 0 otherwise.
Σm∈M φ(m) = the total number of 27-clue minimals.
A = "pick all 27-clue minimals within a uniformly-randomly-chosen 30-clue subset of a random solution grid". If m is a 27-clue minimal then Pr(m) = choose(81-27,30-27) / (choose(81,30) * number_of_solution_grids) = (81-27)!30! / 81!(30-27)!n, where n = that 6.67e21 number.
V = 81!(30-27)!n / 30!(81-27)! times the number of 27-clue minimals we found.
27-clue minimals by the "supersets method" with s=24
Replace 27 and 24 by c and s throughout for the generic (c,s)-supersets method.
Setting s=c gives the c-singletons method.
M = all 27-clue subsets of solution grids.
φ(m) = 1 if m is a minimal puzzle, 0 otherwise.
Σm∈M φ(m) = the total number of 27-clue minimals.
A = "pick all 27-clue minimals containing a uniformly-randomly-chosen 24-clue subset of a random solution grid". If m is a 27-clue minimal then Pr(m) = choose(27,24) / (choose(81,24) * number_of_solution_grids) = 27!(81-24!) / 81!(27-24)!n, where n = that 6.67e21 number.
V = 81!(27-24)!n / 27!(81-24)! times the number of 27-clue minimals we found.
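All three Pr(m) formulas fit one pattern, so here's a hedged Python sketch (weight and N_GRIDS are my names, not anything from the original code) that returns the per-trial multiplier 1/Pr(m) for any (c,s):
Code:
from fractions import Fraction
from math import comb

N_GRIDS = 6670903752021072936960    # n, the number of solution grids

def weight(c, s):
    # 1/Pr(m) for a c-clue minimal under the (c,s) method; multiply the
    # number of c-clue minimals found in one trial by this to get V
    if s >= c:      # subsets method (s = c reduces to singleton probing)
        pr = Fraction(comb(81 - c, s - c), comb(81, s) * N_GRIDS)
    else:           # supersets method
        pr = Fraction(comb(c, s), comb(81, s) * N_GRIDS)
    return 1 / pr

print(float(weight(27, 30)))        # the subsets example above
print(float(weight(27, 24)))        # the supersets example above
print(float(weight(27, 27)))        # singleton probing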
Collect k (a large number of) independent samples of V. Let W be the average of those samples. By the central limit theorem, W is approximately normal with mean E(V) (the total "value" we sought to compute) and variance Var(V)/k (the uncertainty(*) in the estimate). You can estimate E(V) and Var(V) from the data: E(V) by the mean of the observations, Var(V) by their sample variance.
(*) we usually quote the relative error, sd(W)/E(W) = 1/sqrt(k) * sqrt(Var(V))/E(V). Relative error therefore decays as 1/sqrt(k): quadruple the number of trials to halve the error.
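As a sketch of that aggregation step (Python, names mine):
Code:
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance

def summarize(samples):
    # samples: the k observed values of V
    k = len(samples)
    W = mean(samples)                           # estimate of E(V)
    rel_err = sqrt(variance(samples) / k) / W   # estimated relative error of W
    return W, rel_err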
Choice of parameters for the puzzle-counting problem
In the puzzle-counting problem, we want to estimate the number of c-clue minimal puzzles for all c in some range. Subsets works well for small c and supersets works well for large c. The appropriate choice of method, and of s, depends on the c-range of interest.
In this section, we demonstrate the power of the subsets method for c<30 compared to the path-based "controlled bias" method(*). We use an implementation that wraps each strategy (subsets/paths) around the same core solver and takes its input from a fixed set of 1000 input grids, read repeatedly over a ten-minute period. A good strategy is one that exhibits low estimated relative errors when the time is up.
After ten minutes, the path-based solver reported:
Code:
+----+----------+--------------+------------+
| Cl | Count | E(nr/grid) | E(rel err) |
+----+----------+--------------+------------+
| 23 | 2 | 8.8918e+012 | 70.71% |
| 24 | 8 | 8.5954e+013 | 35.36% |
| 25 | 29 | 7.1041e+014 | 18.57% |
| 26 | 21 | 1.1080e+015 | 21.82% |
| 27 | 14 | 1.5047e+015 | 26.73% |
| 28 | 7 | 1.4510e+015 | 37.80% |
+----+----------+--------------+------------+
After the same ten minutes, the subsets-based solver (s=30) reported:
Code:
+----+----------+--------------+------------+
| Cl | Count | E(nr/grid) | E(rel err) |
+----+----------+--------------+------------+ Linear probing
| 20 | 20 | 4.4984e+006 | 36.06% | E(rel err) for
| 21 | 1124 | 1.5421e+009 | 11.55% | comparison ...
| 22 | 19257 | 1.7614e+011 | 5.42% |
| 23 | 106227 | 7.1658e+012 | 2.80% | 70.71%
| 24 | 190937 | 1.0672e+014 | 1.94% | 35.36%
| 25 | 119587 | 6.3499e+014 | 1.73% | 18.57%
| 26 | 25332 | 1.5065e+015 | 2.15% | 21.82%
| 27 | 1797 | 1.4694e+015 | 4.70% | 26.73%
| 28 | 41 | 6.0348e+014 | 19.66% | 37.80%
| 29 | 2 | 7.8010e+014 | 70.71% |
+----+----------+--------------+------------+
To see how much faster the subsets method is than the controlled-bias method for this c-range, divide the squares of the relative errors, as the short calculation below shows. For c=28 clues, for example, the subsets method is around 3 or 4 times faster. At c=25 clues, it is about 100 times faster. At c=22 clues, extrapolating back from the November 2009 controlled-bias results, it is about 3500 times faster.
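For instance, plugging the E(rel err) columns of the two tables above into that rule gives the per-c speedup factors:
Code:
# speedup ~ (ratio of relative errors)^2, same ten-minute budget for both runs
pairs = {23: (70.71, 2.80), 24: (35.36, 1.94), 25: (18.57, 1.73),
         26: (21.82, 2.15), 27: (26.73, 4.70), 28: (37.80, 19.66)}
for c, (path, subsets) in pairs.items():
    print(c, round((path / subsets) ** 2))  # -> 638, 332, 115, 103, 32, 4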
Can we do even better at the low end of the c range? Yes, we can. Picking s=28 instead of s=30, we get an expected relative error for the number of 20-clue minimals of ~672% / sqrt(time_in_seconds) on the test machine. Within a week of computation (~604800 seconds), the estimate would be below 1% relative error (672% / sqrt(604800) ≈ 0.86%) -- an accuracy that would take centuries to achieve using the controlled-bias method on the same platform.
(*) which, to be fair, has uses besides the counting problem described here
Parting remarks
The subsets and supersets methods were originally described back in 2009, in the context of counting minimal puzzles, together with lots of detail on the optimal choice of parameters, sensitivity to the input source, and so on, but almost all of that was lost in the server crash. The ten-minute test is new (a 2013 wrapper around 2009 code).