the data sizes require a bit more than brute force
it turns out that determining equivalence classes is equivalent to finding connected components in a graph,
and that is linear time complexity
what took so long is that for |G| = 41324 the {4}* ({-4+4} closure) defines a graph with 98,351,120 edges,
with a graph data representation (using at&t's graphviz tools) file size of 8.1Gi
linear or not, that size input can bog down a connected components implementation
with the right massaging and a home brew algorithm I was able to compute the closure in 1 min
40min to compute the {4}* data using my solver, 16min to massage, 1min to compute connected components
graphviz ccomps took 72min on the massaged data
you can generate the initial data with my solver version 2007-08-01
- Code: Select all
sudoku -goce{-4} -Fc'%05#in %#kc' g.dat > g.edges
(41324 input 17's produces 8.1Gi edge data -- change the 4 to 1,2,3 for different on/off edge data)
the graphviz ccomps command and my home brew agree for {1}*, but it would be good to get
independent corroboration on the equivalence class counts and sizes
here are the #classes/class-size for {1}* {2}* {3}* and {4}*:
- Code: Select all
=== 41324 {1}* ===
18950 1
4306 2
1095 3
645 4
259 5
170 6
97 7
59 8
47 9
30 10
26 11
18 12
16 13
8 14
3 15
9 16
5 17
5 18
4 19
4 20
1 21
3 22
1 23
5 24
5 25
2 27
1 28
1 30
2 31
2 32
1 33
1 37
1 38
2 41
1 42
1 50
1 55
1 87
1 173
1 560
1 616
=== 41324 {2}* ===
7734 1
2214 2
724 3
424 4
213 5
135 6
70 7
65 8
39 9
29 10
17 11
17 12
12 13
7 14
5 15
12 16
5 17
4 18
7 19
2 20
1 21
3 22
6 23
4 24
1 25
1 26
1 28
2 30
1 31
1 32
1 33
3 35
1 39
1 43
1 44
1 56
1 58
1 64
1 65
1 70
1 80
2 82
1 19182
=== 41324 {3}* ===
1205 1
240 2
62 3
38 4
23 5
16 6
5 7
1 8
2 9
1 10
2 11
1 12
1 13
1 22
1 38950
=== 41324 {4}* ===
12 1
3 2
1 41306
e.g., for {4}* there are 12 classes of size 1, 3 classes of size 2, and one class of size 41306
so what does this mean
first of all, the data posted above is empirical, based on the 17 catalog G with |G|=41324
the {4}* data shows that the catalog as it stands looks like it may be closed under {-4+4}
i.e., every puzzle in G may be within {-4+4} of every other puzzle
this could be proven by example by selecting a puzzle from the size 41306 class and
doing a {-4+4} closure and seeing if the other classes are absorbed (would that we could compute that expediently)
on the tractable side, it might mean that starting with puzzles in the other 15 classes might produce one or
more new 17's that bridge to the pangaea class
here are the puzzle indices and equivalence class label (lowest puzzle index in the class)
for the {4}* non-pangaea classes:
- Code: Select all
00506 00506
01467 01467
01835 01835
12703 12703
14037 14037
20069 20069
21276 21276
22575 22575
30563 30563
30564 30563
32043 32043
40609 40609
40610 40609
40611 40611
40734 40734
41077 41077
41214 41077
41100 41100
e.g., { 40609 40610 } are in the 40609 {4}* class and 41100 is in a singleton class