The New Sudoku Players' Forum

Website · by **denis_berthier** » Mon May 13, 2013 10:05 am

THE SUDOKU GREY ZONE

1) The T&E rough classification

I've already written this several times, but it is useful to remind it as a background for the sequel.
All the known (but very probably all the) 9x9 Sudoku puzzles can be solved with at most two levels of T&E (according to my definition of T&E). T&E is a very easy to program procedure and the following rough classification of a puzzle can be computed in a very fast way.

- T&E(0) is the set of puzzles that can be solved by Singles; it is devoid of any interest;
- T&E(1) is the set of puzzles that can be solved by braids and also most of the time by whips; their (unbiased) classification according to the W or B ratings is known in full detail;
- the remaining T&E(2) puzzles can be considered as the hardest ones - in a broad meaning of "hardest".

2) The grey zone

What I now want to introduce is the following:
- the T&E(1) puzzles (except maybe those with the highest B rating) can easily be generated by most of the generators;
- the T&E(2) puzzles are very hard to generate (using a kind of top-down generator, Blue reported that he got only 2 in about 70,000,000 - Blue, I can't find where this was exactly, I'm quite confident in my memory as for the order of magnitude, but correct me if you have a more precise estimate of frequency. Except when very specific patterns allow drastic simplifications, these puzzles are out of reach of even advanced players; we have no means of estimating how frequently such simplifications happen (and, of course, this depends on which set of specific patterns we consider - but here I'd say any pattern can be considered).

Strangely enough, very hard puzzles - satisfying conditions much more restrictive than being in T&E(2) and therefore still rarer - have been the topic of much research and have led to substantial lists of "hardest". They have also led to the discovery and study of exotic patterns. But no specific effort has been dedicated to the grey zone; as a result, we have little knowledge about it.

What I'll call the grey zone is the fuzzy border between T&E(1) and T&E(2) - let's say, in terms of SER, between 9.0 and 10.5 (but SER has its own limitations and these bounds are fuzzy also). I call a puzzle in the grey zone a grey puzzle. "Grey puzzle" remains a fuzzy concept.

In the two subsequent sections, I list a few questions relevant to this topic (these are not exhaustive).

3) Generation of puzzles in the grey zone

AFAIK, there's never been any specific attempt to collect or generate puzzles in this grey zone, let alone to generate collections with known statistical bias or, less ambitiously, as random as possible.
Probably many such puzzles have been found in the patterns game, but are they anywhere gathered as a fully fledged collection?
eleven, you said that you do no more Sudoku programming, but, just on a theoretical level, could your approach be used to produce grey puzzles? I can't see any a priori reason why not but I may miss some particulars.

4) Analysis of puzzles in the grey zone

An example of which kinds of analysis I mean is the study of the effect of exotic patterns on these puzzles:
- can one find sk-loops/JExocets/... in the grey zone;
- how frequently;
- are these exotic patterns somehow degenerated for puzzles in this zone;
- how much can the application of an sk-loop/JExocet/... modify the rating of a puzzle; can it make it solvable by a "normal" player (e.g. does it make its SER sufficiently small); I have tackled something similar with a few sk-loop examples in PBCS and shown how their B?B classification can change (more or less radically), but these were beyond the grey zone
- ...

I've already felt the lack of much knowledge about this grey zone as a hindrance in several occasions. One of the consequences is, as extreme puzzles are better known than grey ones, results about the frequency of some patterns tend to be strongly biased and probably largely over-estimated.

This topic may lead to nowhere, but it will at least be a marker for a series of open problems in case anyone is looking for something (hard) to gnaw.

by **champagne** » Mon May 13, 2013 1:08 pm

may be, to see what we are talking about, some statistics from the current pattern game 206)

In that game, I generated about 265 millions ED puzzles.
about 150K puzzles (0.56/000) passed the dynamic level in serate (in the range SER 9.3 to 11.0 in that game)
out of these 150K puzzles, less than 300 are eligible to the data base of potential hardest (which is already exceptional for a pattern)

From the data I see, I would say that the number of puzzles requiring the dynamic level (generally in the range 8.7 to 9.2-9.5) are at least 10 times the puzzles passing the dynamic level.

So we can expect a minimum ratio 1 to 10.000 between the "potential hardest" and the so-called grey area;

We have currently close to 900k puzzles in the "potential hardest" file, anybody willing to investigate the grey area must be prepared to manage files of tens billions of puzzles.

I throw daily in the bin millions of such puzzles in my search for "potential hardest"

On the positive side, having a precise goal, it would be easy to regenerate some puzzles in the grey area starting from the file of "potential hardest" or from the seeds already processed.

Website · by **denis_berthier** » Mon May 13, 2013 1:36 pm

champagne wrote:having a precise goal, it would be easy to regenerate some puzzles in the grey area starting from the file of "potential hardest" or from the seeds already processed.

Starting with "potential hardest" would lead to very biased results.
But can your generator start from a set of random seeds and produce puzzles in the 9.0 - 10.5 area without any other filter?

by **champagne** » Mon May 13, 2013 1:59 pm

denis_berthier wrote:
champagne wrote:having a precise goal, it would be easy to regenerate some puzzles in the grey area starting from the file of "potential hardest" or from the seeds already processed.

Starting with "potential hardest" would lead to very biased results.
But can your generator start from a set of random seeds and produce puzzles in the 9.0 - 10.5 area without any other filter?

I have no code for random generation, but I am sure others have some.
I have in fact 3 kinds of generation mode

vicinity (the main one)
full scan of a pattern
symmetry of given

Unbiased generation would surely require more power (and more time) to produce puzzles in the expected area.
In the pattern game, the full scan has a poor productivity and I used that process only to create new seeds.
Most often, I play only through vicinity after a generation with "symmetry of given".

In the search for "potential hardest", the 2 main generation modes on my side are +-3 within a pattern and +-1 outside the pattern, but mladen experienced other modes with success.

Website · by **denis_berthier** » Mon May 13, 2013 2:56 pm

champagne wrote:Unbiased generation would surely require more power (and more time) to produce puzzles in the expected area.

I think unbiased generation is totally unreachable for grey puzzles - it is already unreachable in T&E(1).
Even controlled-bias generation is unreachable for grey puzzles.

champagne wrote:I have no code for random generation,

I was not thinking of fully random generation.
But, starting from a (relatively) random set of seeds (which I could provide), could proximity search (guided by the distance to an SER value in the "grey interval") lead to grey puzzles in a reasonable number of steps?

by **champagne** » Mon May 13, 2013 3:59 pm

denis_berthier wrote:I was not thinking of fully random generation.
But, starting from a (relatively) random set of seeds (which I could provide), could proximity search (guided by the distance to an SER value in the "grey interval") lead to grey puzzles in a reasonable number of steps?

The process is quite trivial and gives quick results: just generate , rate and select puzzles in the searched area.

The main problem in your case would be to filter the final output to keep only ED puzzles. In the game 206, my data base is compacted, but in total, it is already a 6GB file (for one pattern all puzzles).
In a multi pattern search, the data base would be at least 5 times bigger.

The second problem is to keep track of seeds used to avoid double work.

Website · by **denis_berthier** » Mon May 13, 2013 5:39 pm

champagne wrote:
denis_berthier wrote:I was not thinking of fully random generation. But, starting from a (relatively) random set of seeds (which I could provide), could proximity search (guided by the distance to an SER value in the "grey interval") lead to grey puzzles in a reasonable number of steps?

The process is quite trivial and gives quick results: just generate , rate and select puzzles in the searched area.

But how do you "just generate"? Top-down or bottom-up generators almost never produce puzzles with SER > 9.3.

champagne wrote:The main problem in your case would be to filter the final output to keep only ED puzzles.
[...] In a multi pattern search...

I don't know what you call an ED puzzle, but the idea is not to apply any filter (except getting closer to the selected SER value) or to use any pattern(s).

I was thinking of a process similar to eleven's tamagotchi, but with the target defined by a fixed SER value instead of the highest one.

by **champagne** » Mon May 13, 2013 7:02 pm

denis_berthier wrote:
champagne wrote:
denis_berthier wrote:I was not thinking of fully random generation. But, starting from a (relatively) random set of seeds (which I could provide), could proximity search (guided by the distance to an SER value in the "grey interval") lead to grey puzzles in a reasonable number of steps?

The process is quite trivial and gives quick results: just generate , rate and select puzzles in the searched area.

But how do you "just generate"? Top-down or bottom-up generators almost never produce puzzles with SER > 9.3.

It has been proven for long that the most efficient process is a vicinity generation.
Depending on what you intend to do, the vicinity can be

within the pattern +-n changes what I call in short +-n "in"
within the same number of clues +-n changes outside the pattern what I call in short +-n "out"
with a change in the number of clues -n+p

This gives if you have more than 23 clues a huge number of new puzzles, but the process can be highly time consuming if you pass "+-3 in" or "+-1 out".

several variations around that basic idea are possible.

denis_berthier wrote:
champagne wrote:The main problem in your case would be to filter the final output to keep only ED puzzles.
[...] In a multi pattern search...

I don't know what you call an ED puzzle, but the idea is not to apply any filter (except getting closer to the selected SER value) or to use any pattern(s).

ED puzzles is the agreed short in that forum for "essentially different puzzles", clearing all morphs of seen puzzles.

denis_berthier wrote:I was thinking of a process similar to eleven's tamagotchi, but with the target defined by a fixed SER value instead of the highest one.

I know nothing about that tamagotchi process, but I am sure that eleven did more or less the same as me. The differences are mainly in the filters used to select puzzles fitting with the target.

Website · by **denis_berthier** » Tue May 14, 2013 3:23 am

champagne wrote:It has been proven for long that the most efficient process is a vicinity generation.

It depends on one's goal. Proximity search cannot produce unbiased or controlled-bias stats. But here having such goals would be unrealistic. We can only try to avoid systematic bias due to the initial set of puzzles or to predefined patterns.

champagne wrote:within the pattern +-n changes what I call in short +-n "in"
within the same number of clues +-n changes outside the pattern what I call in short +-n "out"
with a change in the number of clues -n+p

As there's no pattern, it can only be -n+p, which is essentially what I meant from the start by proximity search. However, details of this process should be fixed before starting any computations.

champagne wrote:
denis_berthier wrote:I was thinking of a process similar to eleven's tamagotchi, but with the target defined by a fixed SER value instead of the highest one

I know nothing about that tamagotchi process, but I am sure that eleven did more or less the same as me. The differences are mainly in the filters used to select puzzles fitting with the target.

The only filter here should be "closer to the SER target".

Independent computations would have to be done for each SER value in the grey zone, starting from the same set P0 of random puzzles.
Maybe the computation time should be fixed, the same for all the SER values, in order to get a vague idea of how easy it is to reach each SER by this method.

by **champagne** » Tue May 14, 2013 7:22 am

denis_berthier wrote:The only filter here should be "closer to the SER target".

Independent computations would have to be done for each SER value in the grey zone, starting from the same set P0 of random puzzles.
Maybe the computation time should be fixed, the same for all the SER values, in order to get a vague idea of how easy it is to reach each SER by this method.

If I get it, you are considering a kind of "monte carlo" approach.
I can understand that if the goal is to give an unbiased ratio of appearance of a certain pattern.

In that field I have no experience to share.

I had in view to build a data base of puzzles in the target and to study them in a second run.

by **m_b_metcalf** » Tue May 14, 2013 8:12 am

denis_berthier wrote:Maybe the computation time should be fixed, the same for all the SER values, in order to get a vague idea of how easy it is to reach each SER by this method.

Well, we certainly have more than vague ideas already, by any approach. There is a cliff at 9.4 and another at 10.4, for instance.

Regards,

Mike Metalf

P.S. Why don't the two of you exchange telephone numbers by PM, sort out a strategy by phone, and tell us what you conclude!

Website · by **denis_berthier** » Tue May 14, 2013 9:54 am

champagne wrote:If I get it, you are considering a kind of "monte carlo" approach.

yes

champagne wrote:if the goal is to give an unbiased ratio of appearance of a certain pattern.

it is. Not really unbiased because this is unreachable, but as unbiased as possible.

Website · by **denis_berthier** » Tue May 14, 2013 9:59 am

m_b_metcalf wrote:There is a cliff at 9.4 and another at 10.4

9.4 is probably related to the T&E(1) / T&E(2) boundary
10.4 may be related to the gT&E(1) / gT&E(2) boundary
But, considering the way SER is computed, this can't be an exact correspondence.

I've also noted a discontinuity between 10.8 and 10.9 (mainly due to arbitrary choices in SE).

by **eleven** » Wed May 22, 2013 9:36 am

denis_berthier wrote:
blue wrote:
champagne wrote:I started the generation of puzzles with no filter using the data base of potential hardest. It could be interesting to give the criteria for our "grey zone";
[...]
Denis may have some ideas.
...
I also wrote that it is currently impossible to generate unbiased collections of puzzles in this zone - or even collections with known bias (as with the controlled-bias generator).
Any collection assembled from the Patterns Game or starting from the "potential hardest" would probably be extremely biased.
I think the best approach would be eleven's tamagotchi, but nobody (including me) seems to be interested in investing much (computer and brain) time in generating puzzles in this zone.

No, the tamagochi method would produce the same bias, because basically it uses the same method of neighbourhood search with filtering "easy" puzzles (it was just useful, that the criteria for expanding/filtering or adding new sets could be dynamically changed during the generation process).
The bias then comes from the fact, that puzzles in big "clouds" of hard puzzles with similar properties (!) will be over-represented, while others with a small number of toughies in their neigbourhood are not found. So its not the best choice to start with the hardest set to generate grey zone puzzles, if you want to know the commonness of Exocet patterns (because we know they are the more common, the harder the puzzles are).
Probably better would be a bottom-up generation with a limited number of neighbourhood extensions, but then the effort to get a set of the same size would be a multiple higher.

by **champagne** » Wed May 22, 2013 10:02 am

eleven wrote:No, the tamagochi method would produce the same bias, because basically it uses the same method of neighbourhood search with filtering "easy" puzzles (it was just useful, that the criteria for expanding/filtering or adding new sets could be dynamically changed during the generation process).
The bias then comes from the fact, that puzzles in big "clouds" of hard puzzles with similar properties (!) will be over-represented, while others with a small number of toughies in their neigbourhood are not found. So its not the best choice to start with the hardest set to generate grey zone puzzles, if you want to know the commonness of Exocet patterns (because we know they are the more common, the harder the puzzles are).
Probably better would be a bottom-up generation with a limited number of neighbourhood extensions, but then the effort to get a set of the same size would be a multiple higher.

it is somehow strange to read at the same momenb that some would know more about the grey zone but are not prepared to spend a penny (nor one cycle) to do it.

I fully agree with eleven comments on the biased results coming out of the potential hardest file, but that file has a wide variety of patterns and can give quickly some results.

It is surely enough to learn more about the evolution of different types of exocets even if it does not answer to the question of the overall frequency.

The New Sudoku Players' Forum

The Sudoku grey zone

The Sudoku grey zone

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: Pattern-based classification of (hard) puzzles

Re: The Sudoku grey zone

Re: The Sudoku grey zone