Puzzles Usage in a Presentation of a New Rating of Puzzles

Everything about Sudoku that doesn't fit in one of the other sections

Re: Some Puzzles Usage in my hardest sudoku list

Postby denis_berthier » Sat Feb 21, 2015 4:31 am

tknottet wrote:As for the report, do you recommend to create a new topic?
I'm afraid that the topic name is strange for these discussion.

I think you could continue in the same thread.
You could merely edit your first post to change the title.
Can I suggest something like: a new rating of puzzles and candidates (~mean difficulty of T&E)
denis_berthier
2010 Supporter
 
Posts: 3970
Joined: 19 June 2007
Location: Paris

Re: Some Puzzles Usage in my hardest sudoku list

Postby tknottet » Sat Feb 21, 2015 6:40 am

denis_berthier, thank you very much for your kind advice.
denis_berthier wrote:You could merely edit your first post to change the title.
Can I suggest something like: a new rating of puzzles and candidates (~mean difficulty of T&E)

I did what you say in my way under the limitation of title length.
tknottet
 
Posts: 24
Joined: 15 February 2015

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby denis_berthier » Sat Feb 21, 2015 7:01 am

tknottet, yep, I had overlooked the size limitation in titles

I have one more question about the "high correlation" you mention in your first post between your rating and SER.
Could you give more detail about the sample you used to compute it.

When I did statistical analyses (years ago), I faced the following problems:
- in random samples, the easiest part is so large that it accounts for most of the correlation, whatever happens in the highest range
- in samples containing a large part of high SER ratings, SER is not linearly correlated to anything (the high SER values are somewhat flattened wrt any reasonable notion of complexity); some scaling function must be used (e.g. I had to take the log of my W, gW and similar ratings). And I'm not speaking only of extreme values, but of values higher than e.g. 8 or 9. You didn't say "linear correlation". So, did you apply any scaling of that kind?
denis_berthier
2010 Supporter
 
Posts: 3970
Joined: 19 June 2007
Location: Paris

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby tknottet » Sat Feb 21, 2015 9:29 am

denis_berthier,

I didn't calculate correlation between the two rating methods, at all..
I said "Its value seems highly correlated to SE." only by the facts that the order of my ratings of SE Top 5 is same as SE.
I had to say just "linearly correlated". I'm very sorry for my wrong expression.
denis_berthier wrote:I have one more question about the "high correlation" you mention in your first post between your rating and SER.
Could you give more detail about the sample you used to compute it.
tknottet
 
Posts: 24
Joined: 15 February 2015

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby denis_berthier » Sat Feb 21, 2015 12:08 pm

Hi tknottet"

thanks for all the precisions
denis_berthier
2010 Supporter
 
Posts: 3970
Joined: 19 June 2007
Location: Paris

Re: Some Puzzles Usage in my hardest sudoku list

Postby eleven » Sat Feb 21, 2015 11:27 pm

tknottet wrote:2_5) if P1(C) is solved then R(C) = R(P1(C))+1. [changed; R(P1(C))=0 by 2_2). +1 is count of until getting P1(C).]
2_6) if P1(C) displays a contradiction, R(C) = R(P1(C))+1 + R(C_S) + (1+R(P1(C_C))/2
C_S is the candidate in the same cell as C such that (directly or after applying rules recursively) determined as the big number of the cell
C_C is the other candidate in the same cell as C (in case of n=2 no C_C exists, in case of n=3 only one C_C exists)
[changed; my rating program calculates the statistical expectation under the condition that if a candidate(C) is selected as a starting candidate to be added as a value, the next candidate after finding contradiction is selected in the same cell as C. So try and error continues until C_S is selected and evaluated. The probability that C_C is selected before C_S is 1/2.]

tknottet,

when i tried to find out, what your rating could mean practically (from a manual solver's point of view), it turned out, that i don't understand the details.
If a candidate C leaves you with a puzzle P1(C), where you are stuck, you determine it's rating as R(P1(C)). But if C is a wrong candidate in the original puzzle, then P1(C) has no solution and all candidates are false - so for getting the ratings of it's candidates there is no "big number of the cell".
I also don't understand, why a contradiction candidate should add more to the rating, if there is another contradiction candidate in the same cell (..+ R(C_S) + (1+R(P1(C_C))/2).

Generally i can say, that this rating - which is an average rating for randomly choosing the cells to resolve, and the length of chains needed for a step does not count, is that different from hardest step ratings - which determine a single solution path with minimised chain lengths, that a good correlation is very unprobable.
But i would expect, that the ratings here would better reflect, how long a puzzle will take for a manual solver (in the ER 7-9 area). There are many ER puzzles with same rating, which extremely differ in solution times. One is solved with a single chain, for the other you need 10 of them.
eleven
 
Posts: 3094
Joined: 10 February 2008

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby tknottet » Sun Feb 22, 2015 3:23 pm

eleven,

[1}details of my rating:
(I'm sorry for too many errors in my explanation, which confuse you.)

[1_1]R(P1(C)) in case that C is a wrong candidate in the original puzzle:
eleven wrote:when i tried to find out, what your rating could mean practically (from a manual solver's point of view), it turned out, that i don't understand the details.
If a candidate C leaves you with a puzzle P1(C), where you are stuck, you determine it's rating as R(P1(C)). But if C is a wrong candidate in the original puzzle, then P1(C) has no solution and all candidates are false - so for getting the ratings of it's candidates there is no "big number of the cell".

I changed the definition of R(P) without emphasis. In addition, The changed expression was in error.
(A)how many times the logic must be applied, after the candidate is assumed as the big number of the cell, until the puzzle is solved by try & error strategy (on Feb. 19)
(B) R(P) is statistical expectation of "how many times T is applied form first quiescence until determined solved/displaying a contradiction"(on Feb. 21)
In (B), "until determined solved/displaying a contradiction" should be "until determined solved/unsolvable". If C is a wrong candidate in the original puzzle, then R(P1(C)) is statistical expectation of 'how many times T is applied from[not form] first quiescence until determined unsolvable".
Also 2_6) should be corrected as follows:
2_6) if P1(C) displays a contradiction, R(C) is computed each case as follows.
-In case a candidate in the same cell as C is (directly or after applying rules recursively) determined as the big number of the cell (name the candidate as C_S):
R(C) = R(P1(C))+1 + R(C_S) + (1+R(P1(C_C))/2
where C_C is the other candidate in the same cell as C (in case of n=2 no C_C exists, in case of n=3 only one C_C exists)
-In the other case:
R(C) = sum(R(P1(Ci))+1)
whether Ci is every candidate in the same cell as C including C itself, then the summation is calculated over all candidates.

I found another error in 2_7). The correct expression is as follows.
2_7) otherwise, R(P1(C)) is computed by applying recursively the above procedure.
["define R(C) as R(P1(C)), where the latter is computed" was in error]
After the procedure, R(C) is calculated by the equation in 2_5) or 2_6) for each case.

[1_2}Definition of R(C)
eleven wrote:I also don't understand, why a contradiction candidate should add more to the rating, if there is another contradiction candidate in the same cell (..+ R(C_S) + (1+R(P1(C_C))/2).

As mentioned above, (B) defines R(P). I missed following R(C) definition.
R(C) is statistical expectation of "how many times T is applied from the candidate is assumed as the big number of the cell until determined whether the puzzle P1(not P1(C)) is solved or unsolvable".
By this definition, "the weighted average of the R ratings of the (yet undecided) candidates in P1" can be R(P) as described in 2_3).
Following this definition, R(Ci) (weighted by the probability to be selected) should be added into R(C), where Ci is every candidate in the same cell as C. The probability to be selected is as follows.
For C, probability is 1. Because the case C is selected first is considered.
For C_S, probability is 1 if C_S. Because T&E continues until C_S is selected or no candidate is remained.
(In case C_S exists)For the others, probability is 0.5. Because it is selected before/after C_S in equal probability.
(In case C_S does not exist)For the others, probability is 1. Because every candidate is selected one by one.

[2]Property of my rating method
Before my comment, I thank you very much. Because I don't know about details of other rating methods.It is my great pleasure to hear from you comparison with others.
eleven wrote:Generally i can say, that this rating - which is an average rating for randomly choosing the cells to resolve, and the length of chains needed for a step does not count, is that different from hardest step ratings - which determine a single solution path with minimised chain lengths, that a good correlation is very unprobable.
But i would expect, that the ratings here would better reflect, how long a puzzle will take for a manual solver (in the ER 7-9 area). There are many ER puzzles with same rating, which extremely differ in solution times. One is solved with a single chain, for the other you need 10 of them.

You kindly mentioned two characteristics.
(1)an average rating for randomly choosing the cells to resolve
It's my important preference.
(2)the length of chains needed for a step does not count
Yes I agree. My expectation is that in case of much hardest puzzle which needs many steps, the length of chains needed for a step is not so short.
As for correlation with other rating methods, linear correlation on much hardest puzzles(for example, best 10) is enough for me.

[3]Imam_bayildi
I confirmed that New_Imam_bayildi(I named) is a non minimal version of Imam_bayildi.
If Imam_bayildi's rating diffeers from New_Imam_bayildi's, there is/are unknown bug(s) in my solver program.
As for the 5th of ph_1409, I'll confirm soon.
tknottet
 
Posts: 24
Joined: 15 February 2015

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby eleven » Sun Feb 22, 2015 8:35 pm

Thanks,

i did no want to split hairs.
It is still a bit unclear, if you talk about cell or candidate ratings, but i guess, i understand, what you are doing.

Can you tell us, how often you had to you apply your techniques (rules/T) to evaluate the highest ratings ? It must be a very big number, but i see no chance to estimate it.

The rating of the minimal Imam_bayildi can be higher, because you have more candidates in the starting grid. But in this case i think there should not be much difference.

btw. an alternative to trying 3-candidate cells would have been to use the strong links (bilocation candidates - one of them must be true) instead (the minimum number of strong links in a puzzle i found was 4).
In fact it seems to be a better tactic, because if one is shown to be true, also other candidates in the cell are eliminated.

I agree, that for very hard puzzles the length of the chains is not that important. It is the necessity of backtracking, which makes them so hard. However i suspect, that your rating could classify some puzzles as extremely hard, which are only very hard for SE. (But this would not make it worse of course.)
eleven
 
Posts: 3094
Joined: 10 February 2008

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby tknottet » Mon Feb 23, 2015 4:50 pm

eleven wrote:It is still a bit unclear, if you talk about cell or candidate ratings, but i guess, i understand, what you are doing.

Currently I define:
R(C) is statistical expectation of "how many times T is applied from the candidate is assumed as the big number of the cell until determined whether the puzzle P1(not P1(C)) is solved or unsolvable".
This definition concerns "cost of cell testing after candidate selection", because I can determine whether the puzzle P1 is solved or unsolvable by testing one cell.
Although the definition seems not simple, I would like not to change the description of my rating method. The reason follows.
Only for the case that P1 is already unsolvable, the following definition maybe better:
R(CellC) is statistical expectation of "how many times T is applied from the cell is selected for testing until determined whether the puzzle P1(same as R(C) definition)) is solved or unsolvable".
R(CandidateC) is statistical expectation of "how many times T is applied from the candidate is assumed as the big number of the cell(same as R(C) definition) until determined whether the puzzle P1(CandidateC)(not P1) is solved or unsolvable".
Under this definition,
R(CellC)=sum(R(CandidateCi)) where CandidateCi is every candidate in CellC.
R(P) is defined as average of R(CellCj) where CellCj is every cell to be tested.
But, for the case that P1 is solvable, not only R(CellC) and R(CandidateC) but also R(C) is needed, because R(CellC) must be the average of R(Ci) where Ci is every candidate in CellC.
It seems to me that adding R(CellC) and R(CandidateC) makes the definition more complicated.
eleven wrote:Can you tell us, how often you had to you apply your techniques (rules/T) to evaluate the highest ratings ? It must be a very big number, but i see no chance to estimate it.

I'll modify my solving program to count it. Please wait for a few months.
To do so, would you show me an example of definition of "length of chain" (which is mentioned in your post on Feb 22)?
I'm interested in weights for various solving techniques.
eleven wrote:The rating of the minimal Imam_bayildi can be higher, because you have more candidates in the starting grid. But in this case i think there should not be much difference.

The puzzle obtained by applying T on the minimal Imam_bayildi is same as 23 clue version, in comparison of big numbers after some transposition and number permutation. So their ratings must be same.
And 8th (not 5th) puzzle in ph_1409 is the minimal Imam_bayildi. ED values seem to show that 5th and 7th are Kolk and Patience.
eleven wrote:btw. an alternative to trying 3-candidate cells would have been to use the strong links (bilocation candidates - one of them must be true) instead (the minimum number of strong links in a puzzle i found was 4).
In fact it seems to be a better tactic, because if one is shown to be true, also other candidates in the cell are eliminated.

It's interesting.
During development of the rating program, I tried a tactic where T&E is restricted to candidates with maximum number of strong links.
I will start to try it again after fulfilling some promises.
eleven wrote:I agree, that for very hard puzzles the length of the chains is not that important. It is the necessity of backtracking, which makes them so hard. However i suspect, that your rating could classify some puzzles as extremely hard, which are only very hard for SE. (But this would not make it worse of course.)

I cannot understand the role of my rating. It can be just `one of them".
But I found that linear correlation between mine and SE is not so good by comparing "11.9"s in ph_1409 and my list.
I would like to check the reason of difference.
Can I read the detail description of SE rating in this forum?
If anyone ask me to provide additional information (for example, "length of chain" which is above mentioned), I'll make efforts to do so.
tknottet
 
Posts: 24
Joined: 15 February 2015

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby coloin » Mon Feb 23, 2015 8:08 pm

I can confirm that the 5th puzzle is a non-minimal version of the 4th
Code: Select all
+---+---+---+
|..3|..6|.8.|
|...|1..|2..|
|...|.7.|..4|
+---+---+---+
|..9|..8|.6.|
|.3.|.4.|..1|
|.7.|2..|...|
+---+---+---+
|3..|..5|...|
|..5|...|6..|
|98.|...|.5.|
+---+---+---+ 4th puzzle

In your list the 5th puzzle has a superfluos 1@r1c1 [would be the single 6@r2c9 in the above 4th puzzle]
Confusingly the non minimal version was the original imam bayildi coined by eleven it seems

The 4th puzzle is minimal - is elev;2;6;22; in the published list of 9 puzzles with SE 11.9

I don't think you should forget about the Easter Monster puzzle where the sk-loop was first found or the Fata_Morgana or the puzzles with high Suexratt
Code: Select all
1.......2.9.4...5...6...7...5.9.3.......7.......85..4.7.....6...3...9.8...2.....1#Easter Monster

Code: Select all
........3..1..56...9..4..7......9.5.7.......8.5.4.2....8..2..9...35..1..6........#Fata_Morgana

Code: Select all
..3......4...8..36..8...1...4..6..73...9..........2..5..4.7..686........7..6..5..#Suexrat9-10364

These may or may not score highly in your program - i am suspecting the suexrat9-10364 will

Hope this clarifies
C
coloin
 
Posts: 2379
Joined: 05 May 2005
Location: Devon

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby eleven » Mon Feb 23, 2015 9:40 pm

tknottet,

I don't know, how SE exactly rates hard puzzles. I saw such a definition in this forum, but can't find it.
Champagne should know better, because he developed skfr, a faster SE-rating program.

In simple chains the length is probably the number of strong links (e.g. 3 for an xy-wing), in forcing chains their sums.
But i don't know, how it is done for nets (multipath chains). The rating must be the higher the more paths/links/cells are used.

If e.g. you try a number and get a contradiction applying your rules, this normally is equivalent to a more or less complex net (when numbers are forced by 2 or more earlier forced numbers).
These nets already have a high rating.

It is more surprise for me, when SE and your rating correlate, than when they differ. I agree with Coloin, that you probably will get a high rating for Suexrat9-10364 (suexrat counts the nodes needed to find a solution with dancing links, this process is repeated then, i think 10000 times).
But just because of the differences it would be nice to have both ratings for a puzzle, because they show different properties.


Thanks to you and Coloin for identifying the Imam Bayildi and its minmal:
Ah yes, i had found and published Imam Bayildi with 23 clues and rating 11.9/11.9./9.9. (The second and third rating stand for the difficulty of finding the first number and the first elimination.)
But the minimal version has the rating 11.9/1.2/1.2, because it starts with a single, which leads to the 23 clue puzzle.
Here of course the ratings must be the same.
Also thanks for identifying Kolk and Patience in champagne's list.

[Added:] PS: I have to opt out here for some time, i am too busy with other things.
eleven
 
Posts: 3094
Joined: 10 February 2008

Re: Puzzles Usage in a Presentation of a New Rating of Puzzl

Postby denis_berthier » Tue Feb 24, 2015 6:43 am

Hi tknottet,

I know the devil is in the details, but let me skip them for a while and concentrate on the potential meaning of your new rating, R.
I'll use the same notations as in my first post.

Starting from some puzzle P and applying repeatedly the rules in T until quiescence, one gets a puzzle (with candidates) P1 = T(P).
I forgot to mention it, but this is meaningful only if T has the confluence property (and I didn't check whether this is the case for the particular T you are using). Otherwise, depending in which order rules in T are applied, you may reach different puzzles P1.

Contrary to most of the known ratings, R is not intended as a rating of the hardest step in a resolution path.
Is it a rating of a full resolution path? Indeed, it can be considered neither as a rating of some resolution path with a particular property (e.g. simplest in some sense) nor as the mean rating of all the possible resolution paths (even modulo T). What is it then?

First, R(P) is defined as R(P1) (and similarly for each recursive step), which could be expressed as "the rating R is defined modulo T". But no candidate is ever eliminated for real from P1. So, what we get is:
R(P) is the mean rating, modulo T, of the first elimination step, wrt the strategy consisting of trying randomly each candidate at each level of T&E [restricted by your * condition (as notated in my first post)].

Do you agree with this definition?
denis_berthier
2010 Supporter
 
Posts: 3970
Joined: 19 June 2007
Location: Paris

Detail of the New Rating of Puzzl is Included

Postby tknottet » Tue Feb 24, 2015 7:21 pm

To coloin, thank you for confirmation about imam bayildi variation and recommendation of 3 puzzles.
coloin wrote:I can confirm that the 5th puzzle is a non-minimal version of the 4th
***tknottet omitted a code.***
In your list the 5th puzzle has a superfluos 1@r1c1 [would be the single 6@r2c9 in the above 4th puzzle]
Confusingly the non minimal version was the original imam bayildi coined by eleven it seems

The 4th puzzle is minimal - is elev;2;6;22; in the published list of 9 puzzles with SE 11.9

I don't think you should forget about the Easter Monster puzzle where the sk-loop was first found or the Fata_Morgana or the puzzles with high Suexratt
***tknottet omitted 3 codes.***
These may or may not score highly in your program - i am suspecting the suexrat9-10364 will

I'll remove the 5th puzzle(non minimal, I named New_Imam_bayildi) from my list.
Very bad news for me: the rating is 22.1194 for the 4th and 22.1706 for 5th.
As I wrote yesterday:
The puzzle obtained by applying T on the minimal Imam_bayildi is same as 23 clue version, in comparison of big numbers after some transposition and number permutation. So their ratings must be same.
I will check it in detail then report here later.
As for 3 puzzles which you recommended, I will rate them one by one.

To eleven(please read later), thank you for information.
eleven wrote:In simple chains the length is probably the number of strong links (e.g. 3 for an xy-wing), in forcing chains their sums.
But i don't know, how it is done for nets (multipath chains). The rating must be the higher the more paths/links/cells are used.

If e.g. you try a number and get a contradiction applying your rules, this normally is equivalent to a more or less complex net (when numbers are forced by 2 or more earlier forced numbers).
These nets already have a high rating.

It is more surprise for me, when SE and your rating correlate, than when they differ. I agree with Coloin, that you probably will get a high rating for Suexrat9-10364 (suexrat counts the nodes needed to find a solution with dancing links, this process is repeated then, i think 10000 times).
But just because of the differences it would be nice to have both ratings for a puzzle, because they show different properties.

I will try to make my rating to be somewhat useful for this forum 's activity.
As a part of it, I will propose the measurement(s) ( for example, summation of numbers of the paths/links/cells which are used when my rating tool solve the puzzle ) to be reported here.

To denis_berthier, thank you very much for your effort to clarify the detail of my rating.
denis_berthier wrote:I know the devil is in the details, but let me skip them for a while and concentrate on the potential meaning of your new rating, R.
I'll use the same notations as in my first post.

Starting from some puzzle P and applying repeatedly the rules in T until quiescence, one gets a puzzle (with candidates) P1 = T(P).
I forgot to mention it, but this is meaningful only if T has the confluence property (and I didn't check whether this is the case for the particular T you are using). Otherwise, depending in which order rules in T are applied, you may reach different puzzles P1.

Contrary to most of the known ratings, R is not intended as a rating of the hardest step in a resolution path.
Is it a rating of a full resolution path? Indeed, it can be considered neither as a rating of some resolution path with a particular property (e.g. simplest in some sense) nor as the mean rating of all the possible resolution paths (even modulo T). What is it then?

First, R(P) is defined as R(P1) (and similarly for each recursive step), which could be expressed as "the rating R is defined modulo T". But no candidate is ever eliminated for real from P1. So, what we get is:
R(P) is the mean rating, modulo T, of the first elimination step, wrt the strategy consisting of trying randomly each candidate at each level of T&E [restricted by your * condition (as notated in my first post)].

Do you agree with this definition?

To use it as supplement of the answers to your questions, I described the detail of my rating in below.
About the order of rules, it depends on P. But once Pis fixed, the order does not vary. So I have no question about uniqueness of P1. I described the order of rules in [3].
As for the relation of resolution paths, my intention is the statistical expectation over all possible paths. I described it in [6].
As the definition which you mentioned at the last line, I cannot understand why you limit to "the first" elimination. My intention is expectation(in other word: mean) of total number of times T is applied until the puzzle is solved. The description in [6] is intended to explain not only "all possible path" but also "until resolved".

detail of the rating

[1]Purpose
This rating Is intended to rate the mean difficulty of a puzzle when one follows a specific resolution model which is described in [4], choosing randomly when the model allows several possibilities.

[2]Notation and Conceptual Definition
(1)P:puzzle:Not only the target puzzle to be rated, but also each of intermediate puzzles is denoted as P.
To distinguish puzzles, variation of notation (for example P1) may be used.
(2)C:candidate:For efficiency reasons, only restricted candidates concern. The restriction rule is described in [5}.
To distinguish candidates, variation of notation (for example C_S or C_C) may be used.
(3)T:a specific set of rules:The application manner on P is described in [3].
(4)R:rating:R(P) and R(C) is defined as follows.
R(P) is defined as statistical expectation of "how many times T is applied from first quiescence until determined solved/unsolvable"
R(C) is statistical expectation of "how many times T is applied from the candidate is assumed as the big number of the cell until determined whether the puzzle P1 is solved/unsolvable".
R(C) is used in the situation where starting from some puzzle P and applying T until quiescence, one gets a puzzle P1, then C in P1 is assumed as the big number of the cell, then T is applied in recursive manner.

[3]Rule Applying Manner
In this rating, T consists 15 rules.
The rules are arranged in a sequence, from r1 to r15.
Each rule is applied in order.
Every time a rule is applied successfully, next application starts from r1.
When r15 ends in failure, it means quiescence.
For your reference, examples of rules I found English name are r1(Full House), r7(Locked Candidates), r8(Naked Pair/Triple/Quad) and r9(Hidden Pair/Triple/Quad).

[4]Resolution Model
Let P be a puzzle
1)apply repeatedly to P all the rules in T until quiescence, thus obtaining a puzzle P1 (with both decided values and candidates)
2)if P1 is solved, the procedure ends.
3)if P1 displays a contradiction, backtrack to 4-3).
4) otherwise,
4-1)randomly select a cell from cells with the smallest number of candidates in P1
4-2)randomly select a candidate (say C) in the cell
4-3)create a new puzzle P1(C) by adding C as a value and then put it into P and start recursively from 1)
4-4)when backtrack occurs in 3) or 4-5), try another candidate (say C) in the same cell then restart from 4-3)
4-5)if all candidate fails, backtrack to 4-3) of the previous recursive procedure

[5]Description of Rating Procedures
Let P be a puzzle
1)apply repeatedly to P all the rules in T until quiescence, thus obtaining a puzzle P1 (with both decided values and candidates)
2)if P1 is solved, define R(P) as 0.
3)if P1 displays a contradiction, define R(P) as 0.
4) otherwise, define R(P) as the weighted average of the R ratings of the (yet undecided) candidates in P1, where the weight for each candidate C in P1 is the inverse of the number of candidates in the same cell as C, (*** for efficiency reasons, this is limited to candidates C in cells with the smallest number of candidates in P1, say n);
5) for each candidate C in P1 [restricted by ***], create a new puzzle P1(C) by adding C as a value and then applying repeatedly all the rules in T until quiescence; define the rating R of C as follows:
6) if P1(C) is solved then R(C) = R(P1(C))+1.
7) if P1(C) displays a contradiction, R(C) is computed each case as follows.
-In case a candidate in the same cell as C is (directly or after applying rules recursively) determined as the big number of the cell (name the candidate as C_S):
R(C) = R(P1(C))+1 + R(C_S) + (1+R(P1(C_C))/2
where C_C is the other candidate in the same cell as C (in case of n=2 no C_C exists, in case of n=3 only one C_C exists)
-In the other case:
R(C) = sum(R(P1(Ci))+1)
whether Ci is every candidate in the same cell as C including C itself, then the summation is calculated over all candidates.
8) otherwise, R(P1(C)) is computed by applying recursively the above procedure.
After the procedure, R(C) is calculated by the equation in 2_5) or 2_6) for each case.

[6}A Simple Example
In case that
there is only one 2-candidate cell (name candidates C1 and C2) in P1
P1(C1) is 'otherwise', then there is only one 2-candidate cell (name candidates C3 and C4) in P1(C1)
(P1(C1) becomes new P1 for C3 and C4)
p1(C3) displays a contradiction
P1(C4) is solved
P1(C2) is 'otherwise', then there is only one 2-candidate cell (name candidates C5and C6) in P1(C2)
(P1(C2) becomes new P1 for C5 and C6)
p1(C5) displays a contradiction
P1(C6) displays a contradiction

Following 6 solving paths are possible
1)P>T>P1>selectC1>T>P1(C1)>selectC3>T>P1(C3)(Failure)>BT>C4remained>T>P1(C4)(Solved)
2)P>T>P1>selectC1>T>P1(C1)>selectC4>T>P1(C4)(Solved)
3)P>T>P1>selectC2>T>P1(C1)>selectC5>T>P1(C5)(Failure)>BT>C6remained>T>P1(C6)(Failure)>
BT>BT>C1remained>T>P1(C1)>selectC3>T>P1(C3)(Failure)>BT>C4remained>T>P1(C4)(Solved)
4)P>T>P1>selectC2>T>P1(C1)>selectC5>T>P1(C5)(Failure)>BT>C6remained>T>P1(C6)(Failure)>
BT>BT>C1remained>T>P1(C1)>selectC4>T>P1(C4)(Solved)
5)P>T>P1>selectC2>T>P1(C1)>selectC6>T>P1(C5)(Failure)>BT>selectC5>T>P1(C6)(Failure)>
BT>BT>C1remained>T>P1(C1)>selectC3>T>P1(C3)(Failure)>BT>selectC4>T>P1(C4)(Solved)
6)P>T>P1>selectC2>T>P1(C1)>selectC6>T>P1(C5)(Failure)>BT>selectC5>T>P1(C6)(Failure)>
BT>BT>C1remained>T>P1(C1)>selectC4>T>P1(C4)(Solved)
(BT denotes backtracking.)
This rating adopts the number of times T is applied as puzzle's difficulty. It is intended to calculate the expectation of that number over all possible paths.
Probability of each path is 1/2**NS, where NS is number of "selectCx".
So 1)1/4, 2)1/4, 3)1/8, 4)1/8, 5)1/8, 6)1/8
Number of T for each (T in P>T>P1 is excluded by definition)
1)3, 2)2, 3)6, 4)5, 5)6, 6)5
Expectation should be 3/4+2/4+6/8+5/8+6/8+5/8=4

According to rating procedure
R(P1(C4))=0 from [5]2)
R(P1(C3))=R(P1(C5))=R(P1(C6))=0 from [5]3)
R(C4)=1 from [5]6)
R(C3)=1+1=2 from [5]7)above case
So R(P1(C1))=(1+2)/2=3/2 from [5]4)weights(1/2for both) are omitted
Then R(C1)=5/2 from [5]6)
R(C5)=R(C6)=1+1=2 from [5]7)the other case
So R(P1(C2))=(2+2)/2=2 from [5]4)weights(1/2for both) are omitted
Then R(C2)=2+1+5/2=11/2 from [5]7)above case
Then R(P)=(R(C1)+R(C2))/2=(5/2+11/2)/2=4 from [5]4)weights(1/2for both) are omitted
tknottet
 
Posts: 24
Joined: 15 February 2015

Re: Detail of the New Rating of Puzzl is Included

Postby denis_berthier » Wed Feb 25, 2015 5:16 am

eleven wrote:In simple chains the length is probably the number of strong links (e.g. 3 for an xy-wing), in forcing chains their sums.
But i don't know, how it is done for nets (multipath chains). The rating must be the higher the more paths/links/cells are used.

Basically, for chains/nets of all types, SER counts the number of "nodes" in a T&E procedure at depth 1 or 2 (and then it applies some highly nonlinear and arbitrary scaling function).
A "node" in SER is an inference step, i.e. an assertion by (Naked or Hidden) Single or an elimination by ECP (direct contradiction).
In bivalue/bilocal chains (the basic AICs), this amounts to counting twice the number of "strong links".
But the idea of taking the number of "strong links" (i.e. the number of CSP variables or 2D-cells, in my approach) per se as the basis for the rating is totally alien to SER.


tknottet wrote:
denis_berthier wrote:Starting from some puzzle P and applying repeatedly the rules in T until quiescence, one gets a puzzle (with candidates) P1 = T(P).
I forgot to mention it, but this is meaningful only if T has the confluence property (and I didn't check whether this is the case for the particular T you are using). Otherwise, depending in which order rules in T are applied, you may reach different puzzles P1.

About the order of rules, it depends on P. But once Pis fixed, the order does not vary. So I have no question about uniqueness of P1. I described the order of rules in [3].
[...]
[3]Rule Applying Manner
In this rating, T consists 15 rules.
The rules are arranged in a sequence, from r1 to r15.
Each rule is applied in order.
Every time a rule is applied successfully, next application starts from r1.
When r15 ends in failure, it means quiescence.
For your reference, examples of rules I found English name are r1(Full House), r7(Locked Candidates), r8(Naked Pair/Triple/Quad) and r9(Hidden Pair/Triple/Quad).

Actually, non confluence problems are much subtler. They can't be solved only by ordering the rules, you'd also need to order rule applications (i.e. in practice the sets of candidates for which each rule is applied). Fortunately (in practice, but very unfortunately for debugging), they don't appear often. The above set of rules has the confluence property, but I don't know for your full set of rules with chains and "vectors"(?). It might be useful to explicit those other rules.
If you find that a puzzle and one of its isomorphs aren't assigned the same rating, it may come from non confluence problems (but there may also be other reasons inherent in your code). I wonder if it might explain the following case:
tknottet wrote:the 5th puzzle(non minimal, I named New_Imam_bayildi) from my list.
Very bad news for me: the rating is 22.1194 for the 4th and 22.1706 for 5th.
As I wrote yesterday:
The puzzle obtained by applying T on the minimal Imam_bayildi is same as 23 clue version, in comparison of big numbers after some transposition and number permutation. So their ratings must be same.

Could you try the following: keep the 23-clue version, apply the same "transposition and number permutation" as you apply to the minimal version and see what you get.


tknottet wrote:my intention is the statistical expectation over all possible paths. I described it in [6].
As the definition which you mentioned at the last line, I cannot understand why you limit to "the first" elimination. My intention is expectation(in other word: mean) of total number of times T is applied until the puzzle is solved. The description in [6] is intended to explain not only "all possible path" but also "until resolved".

OK, your rating is the mean DFS-depth over all the possible DFS(T)-resolution-paths-(restricted by *), where DFS is stopped in a path whenever a solution or a contradiction is found.

I understand that you want to provide users of your website with an indication of how many times they'll have to apply DFS if they use your set of rules T. But I still think that your rating would have a much broader audience if T was limited to Singles and direct contradictions (plus possibly "locked candidates"). It would allow interesting comparisons with other existing ratings.

[edit 2015/03/12 : changed T&E to DFS in order to avoid confusion with what T&E really is]
Last edited by denis_berthier on Thu Mar 12, 2015 5:23 am, edited 1 time in total.
denis_berthier
2010 Supporter
 
Posts: 3970
Joined: 19 June 2007
Location: Paris

Re: Detail of the New Rating of Puzzl is Included

Postby tknottet » Wed Feb 25, 2015 2:57 pm

denis_berthier wrote:
eleven wrote:In simple chains the length is probably the number of strong links (e.g. 3 for an xy-wing), in forcing chains their sums.
But i don't know, how it is done for nets (multipath chains). The rating must be the higher the more paths/links/cells are used.

Basically, for chains/nets of all types, SER counts the number of "nodes" in a T&E procedure at depth 1 or 2 (and then it applies some highly nonlinear and arbitrary scaling function).
A "node" in SER is an inference step, i.e. an assertion by (Naked or Hidden) Single or an elimination by ECP (direct contradiction).
In bivalue/bilocal chains (the basic AICs), this amounts to counting twice the number of "strong links".
But the idea of taking the number of "strong links" (i.e. the number of CSP variables or 2D-cells, in my approach) per se as the basis for the rating is totally alien to SER.

Thank you very much for information.
I'll modify my rating so that number of nodes/links is added into calculation. (the modification will take some months.)
denis_berthier wrote:Actually, non confluence problems are much subtler. They can't be solved only by ordering the rules, you'd also need to order rule applications (i.e. in practice the sets of candidates for which each rule is applied). Fortunately (in practice, but very unfortunately for debugging), they don't appear often. The above set of rules has the confluence property, but I don't know for your full set of rules with chains and "vectors"(?). It might be useful to explicit those other rules.
If you find that a puzzle and one of its isomorphs aren't assigned the same rating, it may come from non confluence problems (but there may also be other reasons inherent in your code). I wonder if it might explain the following case:
tknottet wrote:the 5th puzzle(non minimal, I named New_Imam_bayildi) from my list.
Very bad news for me: the rating is 22.1194 for the 4th and 22.1706 for 5th.
As I wrote yesterday:
The puzzle obtained by applying T on the minimal Imam_bayildi is same as 23 clue version, in comparison of big numbers after some transposition and number permutation. So their ratings must be same.

Could you try the following: keep the 23-clue version, apply the same "transposition and number permutation" as you apply to the minimal version and see what you get.

Thank you very much for your interesting proposal.
I illustrate the relation of the puzzles.

Pa(23-clue)>>>>>>>>(one clue elimination)>>>>>>>>Pb
V V
transposition and number permutation transposition and number permutation
V V
Pc>>>>>>>>>>>>>>>>(one clue eliminated)>>>>>>>>Pd(minimal version)
V
transposition and number permutation
V
(Pe) Pf

Did you propose to check Pf?
I misunderstood that you proposed to clarify which causes the difference of rating value, "clue elimination" or "transposition and number permutation". I already started the rating of Pc. It will take a few days. After that I will rate Pf.
In parallel, I'm checking the rating logs of Pa and Pd.
As for the confluence property of T, I will explicit the other rules later. Describing these rules in English is very hard for me.
denis_berthier wrote:OK, your rating is the mean T&E-depth over all the possible T&E(T)-resolution-paths-(restricted by *), where T&E is stopped in a path whenever a solution or a contradiction is found.

I understand that you want to provide users of your website with an indication of how many times they'll have to apply T&E if they use your set of rules T. But I still think that your rating would have a much broader audience if T was limited to Singles and direct contradictions (plus possibly "locked candidates"). It would allow interesting comparisons with other existing ratings.

Thank you very much for understanding and advice.
Rating by T with smaller set of rules is interested for me also, and not so difficult. I will be able to start rating within a month.
tknottet
 
Posts: 24
Joined: 15 February 2015

PreviousNext

Return to General