## Rating rules / Puzzles. Ordering the rules

Advanced methods and approaches for solving Sudoku puzzles
denis_berthier wrote:The application of a rule can make another one non applicable, but then a simpler rule withthe same conclusion will become applicable.

thanks for the clarification
you are basically providing an SE style rating -- the rating of the hardest move
for the above, is it possible to reach a state where more than one Ln rule must be applied in succession?
L0: only elementary constraints propagation (no puzzle can be solved here)

what about a valid 81-clue puzzle?
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

denis_berthier wrote:
tarek wrote:
denis_berthier wrote:
tarek wrote:if any solver has a ceiling to what it can handle, that also should be mentioned for clarity.

How do you define the ceiling? Which rating do you use? That's the problem.
true, but at least you can tell what is above that ceiling

If you can't define a ceiling, how can you tell what's above?
This is not rhetoric. When I tried the nrczt-chains on Ruud's top10000, the classification results I obtained were uncorrelated with Ruud's classification.

To avoid going in circles. What I mean by ceiling is basically the limit of what a solver/method can solve where a puzzle is considered beyond that limit if it is not ratable by that method/technique/rule/solver. that info may help improve our understanding of how that solver/method works. It also can provide ideas for future updates (Like in Sudoku Explainer & gsf's q1 rating)

sometimes you can tell where the cut off exactly & sometimes it is vague but there are clear examples of puzzle that cannot be solved.

Therfore it helps to define the scope of (our search)/(our puzzle lists)

if we are talking about 1M of randomly generated puzzles how many cannot be solved by SS.

The puzzles mentioned in the HARDEST puzzles thread probably fall out of the 3 standard deviation ( or even 4 standard deviations) ....

I also have this question:

if a method can't solve a certain puzzle, should it be allowed to rate a puzzle that it can solve ?!!

tarek

tarek

Posts: 2644
Joined: 05 January 2006

gsf wrote:you are basically providing an SE style rating -- the rating of the hardest move

What I'm basically providing is a rating of the rules, not of the puzzles. It can be extended to a rating of the puzzles, in the SE style. But there are three major differences with SE:
- my set of rules is globally homogeneous,
- all my theories have the confluence property,
- this rating doesn't depend on implicit choices of the solver.
E.g., the set of rules in SE doesn't have the confluence property. The results of BUG may be made unavailable if other rules are applied before it.

gsf wrote:for the above, is it possible to reach a state where more than one Ln rule must be applied in succession?

Yes. You can see many examples of this in the solutions I've given in my other threads.

gsf wrote:
L0: only elementary constraints propagation (no puzzle can be solved here)

what about a valid 81-clue puzzle?

OK: no unsolved puzzle...
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

Tarek,
What my resolution rules can do has been detailed in my other threads. There are puzzles (less than 1 in 10.000 minimal puzzles) that it can't solve - unless I add T&E. I could extend my rating to puzzles that require T&E, but I can't see the use of this.
Another answer is: it can solve approximately the same puzzles as those programmed in Mike's solver. See the first post here: http://forum.enjoysudoku.com/viewtopic.php?t=5591&postdays=0&postorder=asc&start=105 and the discussion with Mike before it.
In terms of ER, the breakpoint seems to be around 9.2 - 9.3 - but this is not an absolute measure.
One puzzle it can't solve is EasterMonster. But I haven't yet seen a solution of EM by pure logic.

So, it is clear we need other rules if we want to solve all the puzzles. But the question of rating them is still there. It is likely that the rules we'll have to introduce will be more complex than the known ones - maybe relying on second order logic, as e.g. in Allan Barker's approach. The more we introduce complex rules, the more it becomes useful to classify them wrt known ones.

tarek wrote:if a method can't solve a certain puzzle, should it be allowed to rate a puzzle that it can solve ?!!

1) As no method can solve all the puzzles, unless it uses T&E, then no rating should be allowed ;)
2) Why should we be interested in rating puzzles that no human player can solve?
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

denis_berthier wrote:1) As no method can solve all the puzzles, unless it uses T&E, then no rating should be allowed ;)
2) Why should we be interested in rating puzzles that no human player can solve?

mathematic / engineering curiosity?
there is clearly a range of difficulty in the set of puzzles that currently require T&E
even though that difficulty is currently imperfectly quantized, does that mean it shouldn't be attempted?
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Thanx for clarifying the limits.

denis_berthier wrote:
tarek wrote:if a method can't solve a certain puzzle, should it be allowed to rate a puzzle that it can solve ?!!

1) As no method can solve all the puzzles, unless it uses T&E, then no rating should be allowed ;)
2) Why should we be interested in rating puzzles that no human player can solve?

Strong statements denis.
1) maybe that is true, unless T&E is an acceptable incorporation in the method to deal with non-T&E failed ratings.
2) I've seen Carcul's solution to #77 (one the previous toughest), that has an SE121 of 9.7.

This again brings us to a can of worms regarding T&E & which current methods are T&E.

tarek

tarek

Posts: 2644
Joined: 05 January 2006

denis_berthier wrote:

1) As no method can solve all the puzzles, unless it uses T&E, then no rating should be allowed ;)

Sorry Denis, I strongly claim that there is no T&E in my solver and that I crack all puzzles up to now.

The most difficult "Golden Nugget" solution has already been described in the full tagging thread and a better description is available here (more than 60% covered, the rest is coming), have a look and tell me where you see T&E

http://pagesperso-orange.fr/gpenet/UX/Sample7GN/V1.htm
champagne
2017 Supporter

Posts: 6116
Joined: 02 August 2007
Location: France Brittany

gsf wrote:
denis_berthier wrote:1) As no method can solve all the puzzles, unless it uses T&E, then no rating should be allowed ;)
2) Why should we be interested in rating puzzles that no human player can solve?

mathematic / engineering curiosity?
there is clearly a range of difficulty in the set of puzzles that currently require T&E
even though that difficulty is currently imperfectly quantized, does that mean it shouldn't be attempted?

I was joking. But anyway it seems useful to question the goals of our ratings. For a puzzle creator, e.g., only puzzles solvable by humans need be rated.
All the rates I've discussed above go beyond human limits. Most can rate all the puzzles. NRCZT could if I added something for T&E. But I don't know how to do this and keep all its good properties: purely logical definition, independence on the solver, invariance under symmetry. I conjecture any puzzle can be solved with T&E(NRCZT, 1), i.e. with depth 1 T&E pruned by the nrczt rules. I've shown it's true for EM. I could write such rates as NRCZT12+T&E+NRCZT14 or T&E(NRCZT14, 1).

But, for T&E, the question remains: on which general principles can we base a rate for puzzles that require it? In this respect, I think the SER rating is very misleading because it gives the impression that there is no discontinuity.
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

tarek wrote:...unless T&E is an acceptable incorporation in the method to deal with non-T&E failed ratings.

Same answer as to gsf: how can we take this discontinuity into account?
tarek wrote:. I've seen Carcul's solution to #77 (one the previous toughest), that has an SE121 of 9.7.

As Mike had pointed this solution to me, I've seen it also. Carcul uses very complex chains of subsets.
This leads me to a point I wanted to raise in this thread: how can we rate chains based on subsets, with different lenghts and different maximal sizes for the subsets they use? No a priori evaluation of the complexities seems possible (see the discussion with gsf and RedEd in this thread: http://forum.enjoysudoku.com/viewtopic.php?t=5600&postdays=0&postorder=asc&start=30)
Therefore, we can only compute statistical complexity on random samples.
For this, we'd need an a priori ordering of chains in function of these two parameters and then we'd run a solver (with all the other rules turned off) that can exploit this ordering. We could then see how the computation times vary with these parameters. As I use no chains with subsets, I can't do this with SudoRules.
I don't know if any of the available solvers can do this.
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

champagne wrote:I strongly claim that there is no T&E in my solver and that I crack all puzzles up to now.

The most difficult "Golden Nugget" solution has already been described in the full tagging thread and a better description is available here (more than 60% covered, the rest is coming), have a look and tell me where you see T&E

http://pagesperso-orange.fr/gpenet/UX/Sample7GN/V1.htm

We already had this discussion: you have rules for managing tags, not rules for dealing with candidates.
As far a as I understand it, your solver finds a "solution" in a few seconds and then you need a few months to extract something "readable" from it.
Your first "elimination" - which is not the elimination of any candidate - concludes that "(3&6)r4c4r5c5 not valid". But, for this, we have to go to this page: http://pagesperso-orange.fr/gpenet/UX/Sample7GN/V1DR_fichiers/D01.htm, where we fall back into ad hoc tag management. This approach is typical of what we do in T&E.
In any case, this is not what I'd call a solution in terms of predefined resolution rules.
As this debate is not really the topic of this thread, I won't say more.
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

As Champagne kindly provided a file with his computation times for the puzzles in sudogen0, I can now give the correlations of these times with the five ratings mentionned in the first post (as above, the computations are done on the first 1.000 puzzles).

Champagne-computation-time vs NRCZT-rating: 0.58
Champagne-computation-time vs SER-rating: 0.50
Champagne-computation-time vs SUEX9-rating: 0.58
Champagne-computation-time vs SUEXT-rating: 0.64
Champagne-computation-time vs GSF-Q1-rating: 0.40

All these correlations are weak. This shouldn't be a surprise as computation times are implementation dependent. These results mean that, for puzzles with the same rating (whichever is chosen as a reference), the variance in computation time due to the algorithm specificities may be almost as large as the differences due to the rating of the puzzle.

As a result, Champagne-computation-times are farther apart from the other five ratings. More precisely, the mean distances of each rating to the other five are now:
NRCZT-rating: 0.73
SER-rating: 0.79
SUEX9-rating: 0.73
SUEXT-rating: 0.75
GSF-Q1-rating: 0.85
CHAMP-rating: 0.96

It'd be interesting to compute the correlations of Champagne's 4 levels to the other ratings, although their grain is different, but I've'nt yet been able to extract this information from the file.
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

Denis wrote

We already had this discussion: you have rules for managing tags, not rules for dealing with candidates.

as a tag is a container for candidates or equivalent candidates groups of candiates...!!

Denis wrote
As far a as I understand it, your solver finds a "solution" in a few seconds and then you need a few months to extract something "readable" from it.

It is clear when you write such statement that you never worked on hardest puzzles.

I have always been clear on the fact that the raw print is for Golden Nugget something between 1000k (fist shot) and now 400K.

I publih something I have carefully checked and I do some additional personnal work to produce a print as simple as possible.

With a puzzlle coming out of your lot, it takes at most one or two minutes to check a puzzle. My brain is not perfect and when complexity is growing, I need more than the procesing time ratio (about 1 to 2000) to bring some added value to the computer work.

My first shot on Golden Nugget has been publshed mid January, I am still waiting for an alternative.

Denis wrote
Your first "elimination" - which is not the elimination of any candidate - concludes that "(3&6)r4c4r5c5 not valid".

I agree, ti's not a candidate, it is s "super candidate". That the "secret!!" of the capability to crack hardest puzzles.

Denis wrote
This approach is typical of what we do in T&E.

All steps in eliminations are "net of AICs" (by the way as any of your "T" chain). You never claimed to to something typical T&E.

I won't say more either
champagne
2017 Supporter

Posts: 6116
Joined: 02 August 2007
Location: France Brittany

champagne wrote:All steps in eliminations are "net of AICs" (by the way as any of your "T" chain).

That's the whole problem: your algorithm can't make any difference between a net and a chain (a linearly ordered sequence of candidates, with no branching) or, more generally, between a simple pattern and a complex one. It is therefore unable to prefer the simper ones. I see here the main reason why your computation times are uncorrelated with the usual ratings of the puzzles. Please don't take this personally.

To be more positive and to come back to the topic of this thread, if your basic concept is that of a "super-candidate" (AFAIK, a concept introduced in Steve's pattern), why don't you try to devise a rating based on the size of such super-candidates, the number of such super-candidates used in your nets, the branching factor of the nets,... ?
You may be surprised when you realise that computation times will drastically change if you want to guarantee that a puzzle is always solved with the simplest possible nets wrt to this rating.
denis_berthier
2010 Supporter

Posts: 1254
Joined: 19 June 2007
Location: Paris

denis wrote

To be more positive and to come back to the topic of this thread, if your basic concept is that of a "super-candidate" (AFAIK, a concept introduced in Steve's pattern), why don't you try to devise a rating based on the size of such super-candidates, the number of such super-candidates used in your nets, the branching factor of the nets,... ?
You may be surprised when you realise that computation times will drastically change if you want to guarantee that a puzzle is always solved with the simplest possible nets wrt to this rating.

1)Super candidates are used when other process fail. It is out of the scope of that thread.

2)Super candidates are limited to AALS AC2 for the time being

3)Progressivity in difficulty of chains is done thru levels.

I guess the low correlation could also come, in these very simple puzzles form the fact that other process are depending on the lenght of the chains.

In full tagging, you pay to see all chains at once. You have to pay more (adding ALS or looking for kraken blossoms for example) if nothing appear. But then again, you pay a fix amount to see all in once.

The last parameter influencing the time is the number of times you have to restart looking for singles.
champagne
2017 Supporter

Posts: 6116
Joined: 02 August 2007
Location: France Brittany

I would comment on the sample of Denis.
Again, a summary of my results.

Code: Select all
`no tagging  6851   16  0;47    (number, av time, min time, max time all milliseconds)level 0      828   30  15;78   (only bi values on digits including groups)level 1     2140   61  15,234  (same as above plus bi values in cells) sub total   9819level 1+     169  355  78;1641 (ternary kraken blossoms)level 2       12 1977  406;4406(ALS ACs no loop on derived weak links)`

Out of processing time, I would stress on the fact that 68% of that lot is solved on my side without using any chain.
98% of the total are not requesting more than "linear AICs" based on bi values (including groups)

Is it really a topic to rate that kind of puzzles thru sophisticated tools??
Existing ranking based on the most difficult tool used, number of cycles .... seems good for puzzle sellers.
Providers of hard puzzles are looking for something specific, but then we have to use other sample files.

By the way, as I claim that full tagging brings something specific, I would feel confortabe with processing time using full tagging not correlated with other ranking tools.
champagne
2017 Supporter

Posts: 6116
Joined: 02 August 2007
Location: France Brittany

PreviousNext