The New Sudoku Players' Forum

by **ghfick** » Wed Feb 22, 2023 4:05 am

I have no interest in the very elementary statistics you use in Sudoku.

www.mathgenealogy.org/id.php?id=16002

You are clearly an adult bully. Many people on the forum see you that way. Some bullies never grow up.

by **denis_berthier** » Wed Feb 22, 2023 4:26 am

.
As in your previous posts: no rational argument.

by **ghfick** » Wed Feb 22, 2023 5:00 am

denis_berthier wrote:.
I do have: see chapter 6 of [PBCS].

I now have no interest in ANY of your self-published non-peer-reviewed writing. I will not be reading PBCS or HLOS or any of the brutal impenetrable posts about SudoRules.
You have now thoroughly alienated me from pursuing any of your work. I am joining a very long list. I guess that you really do not care if most of the Sudoku world abandons you.

You repeatedly claim you are such a rational thinking person. You claim you are worthy of careful rational argument. Nonsense.

You have been silent about peer review. I suspect you would have a terrible time receiving a critical peer review of any of your so-called academic work.

I will be reading the posts of mith, shye, eleven and others concerning their very interesting TH work. My view is that detailed knowledge of T&E(whatever) is not necessary. My decision to skip any of the T&E(blah) will not impede my appreciation of the outstanding work of mith, shye, eleven and others.

by **denis_berthier** » Wed Feb 22, 2023 5:22 am

.
My academic and industrial career is not your business.
You read what you like. Indeed, you're totally right: I don't care. Your decisions are based on emotional reactions. You are not the kind of reader I'm interested in.
Via my colleagues, via my publisher, via GitHub and via ResearchGate, I have enough information about how much my publications are read or used.

by **ghfick** » Wed Feb 22, 2023 5:56 am

You cannot have any idea of the reasons behind my decisions. How pompous of you to think that you can.

My academic career is none of your business either.

However, I am a scientist. I am a mathematician. I am a biostatistician. I have contributed to the health research literature. I have recently published in 'Statistics In Medicine', one of the key peer-reviewed journals in my field.

Sudoku is one of my hobbies. I try to contribute to Sudoku as best I can. I still enjoy the challenge of actual human solving. I have quite recently become interested in the 9x9 Calcudoku [aka KenKen]. The study of Calcudoku seems to be just getting started. Contributors have some ideas about difficulty and other matters similar to Sudoku but there appears to be much to be done. I am mentioning this matter because I know there are many people reading this thread.

I now regret wasting so much of my time today. I can hope that others will read some of the material I discussed here and will get in touch with me.

by **eleven** » Wed Feb 22, 2023 10:53 am

denis_berthier wrote:My chains prove the eliminations by (trivial) pure logic arguments. They require no program to do it.
The logical formulæ expressing these rules don't have any variables for the z- and t- candidates.
CSP-Rules, the program implementing my chains - which is not a program at all in the classical sense, but a mere re-writing of logical formulæ in a syntax close to FOL - doesn't provide for any possibility for the chains to "remember" their z- and t- candidates. CSP-Rules is public - you can check this.

The user/reader doesn't have to remember the z- and t- candidates. At any point of resolution, the remaining candidates are on the grid. What the user has to do to check a chain step by step is to note what the next CSP-variable, the next llc and the next rlc are and to check that the other remaining candidates for this CSP-Variable are linked to previous rlcs. The only thing the user has to remember is the chain itself (i.e. the sequence of csps, llcs and rlcs).

You are right, there is no need to remember t- and z-candidates, provided you remember the whole candidates grid after each node (i.e. if a cell is set to a number, you remove that number from all the candidates in its 3 units).
That's what a manual solver never wants to do if she is not sure that the number is true. Instead (without the help of a simple solver that does it for you), a better way to verify the chain is to go from node to node with the original grid and, if necessary, look back to see which of the former nodes verify it. This way you only have to look up the necessary eliminations and not the whole bunch of former nodes.
My criticism therefore was that the chain does not help in this process by marking, for each node, which former nodes were needed for it or, in other words, which ones (apart from the last) you have to remember for it.
I see that for you this is absolutely irrelevant; it never was your goal to write user-friendly chains. The consequence for me is that studying these chains is a long way around if someone wants to become a better manual solver.

by **denis_berthier** » Wed Feb 22, 2023 11:18 am

eleven wrote:
denis_berthier wrote:My chains prove the eliminations by (trivial) pure logic arguments. They require no program to do it.
The logical formulæ expressing these rules don't have any variables for the z- and t- candidates.
CSP-Rules, the program implementing my chains - which is not a program at all in the classical sense, but a mere re-writing of logical formulæ in a syntax close to FOL - doesn't provide for any possibility for the chains to "remember" their z- and t- candidates. CSP-Rules is public - you can check this.

The user/reader doesn't have to remember the z- and t- candidates. At any point of resolution, the remaining candidates are on the grid. What the user has to do to check a chain step by step is to note what the next CSP-variable, the next llc and the next rlc are and to check that the other remaining candidates for this CSP-Variable are linked to previous rlcs. The only thing the user has to remember is the chain itself (i.e. the sequence of csps, llcs and rlcs).

You are right, there is no need to remember t- and z-candidates, provided you remember the whole candidates grid after each node (i.e. if a cell is set to a number, you remove that number from all the candidates in its 3 units).
That's what a manual solver never wants to do if she is not sure that the number is true. Instead (without the help of a simple solver that does it for you), a better way to verify the chain is to go from node to node with the original grid and, if necessary, look back to see which of the former nodes verify it. This way you only have to look up the necessary eliminations and not the whole bunch of former nodes.
My criticism therefore was that the chain does not help in this process by marking, for each node, which former nodes were needed for it or, in other words, which ones (apart from the last) you have to remember for it.
I see that for you this is absolutely irrelevant; it never was your goal to write user-friendly chains. The consequence for me is that studying these chains is a long way around if someone wants to become a better manual solver.

The only resolution state you have to consider when you look for or check a chain is the current resolution state, which is in front of you, be it on your paper grid or your screen. It doesn't change as you observe the chain from left to right. It doesn't even change once you have found the chain. It only changes when you apply the chain and eliminate its target.

by **eleven** » Wed Feb 22, 2023 1:51 pm

Obviously you have no interest in manual solving or verifying chains.
Turn it as you like: in general you can't have a node without (part of) the former nodes, unlike sequential implications or AIC.
The logic behind your chains is (forward chaining) [edit: this makes it clearer]
a=>b, a&b=>c, a&b&c=>d, ..., a&b&c&...&w=>x
while the others use
a=>b, b=>c, c=>d, ..., w=>x
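
The difference between the two schemes can be illustrated with a minimal Python sketch. The function `verify_chain` and the `premises` data are hypothetical and are not taken from any solver; they only model which earlier nodes a step is allowed to rely on.

```python
from typing import Dict, List, Set


def verify_chain(premises: Dict[str, Set[str]], order: List[str],
                 start: str, memoryless: bool) -> bool:
    """Check every step of a chain that begins at `start`.

    premises[x] is the set of earlier nodes needed to derive x
    (illustrative data, not produced by a real solver).
    memoryless=True  : a step may use only the immediately preceding node
                       (a=>b, b=>c, c=>d, ...).
    memoryless=False : a step may use any earlier node
                       (a=>b, a&b=>c, a&b&c=>d, ...).
    """
    seen = [start]
    for node in order:
        allowed = {seen[-1]} if memoryless else set(seen)
        if not premises[node] <= allowed:
            return False
        seen.append(node)
    return True


# Hypothetical chain in which d needs both a and c.
premises = {"b": {"a"}, "c": {"b"}, "d": {"a", "c"}}
order = ["b", "c", "d"]

forward = verify_chain(premises, order, "a", memoryless=False)
sequential = verify_chain(premises, order, "a", memoryless=True)
print(forward, sequential)
```

In this toy example the chain is accepted under the forward-chaining reading and rejected under the sequential one, because the last step looks back past its immediate predecessor.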

by **ghfick** » Wed Feb 22, 2023 2:24 pm

Thank you so much, eleven, for providing clarity around a mystery. In principle, the memory required adds to a human solver's challenge. The notation, while providing symmetry with respect to the four views, does not provide enough information [per se] at any given step. A related project is being advanced by Philip Beeby. He is currently suggesting and illustrating a notation that does record the previous information as it is needed. For many Sudoku solvers, it has the advantage of being an extension to the Eureka notation.

Philip calls these chains "Complex Chains" and they are discussed in Part C of: www.philsfolly.net.au/chain_help.htm

eleven, you may be interested in Philip's steps to implement the "Replacement" technique. He cites you, mith and henrik_monard in:

www.philsfolly.net.au/xyz_help.htm

Philip will acknowledge that his additions are 'works-in-progress'. Perhaps, eleven, you already know of Philip's site and his current work.

by **marek stefanik** » Wed Feb 22, 2023 2:31 pm

I'll also throw my two cents worth in here.

Firstly, I agree with eleven's criticism of the nrczt-notation.
I will try to explain it in a form even Denis can understand, with n denoting the number of strong links and k denoting the number of weak links.
I will only focus on basic chains/nets i.e. those which do not contain other patterns or relations created using other patterns.

Unnecessarily detailed explanation of the complexity of verification of AIC-based nets: [spoiler, collapsed]

You can therefore check each net in O(n+k) time and O(1) space.

Now, how is one supposed to read a chain/net in nrczt-notation?
You can see the rlc and the target in the notation, i.e. O(1) space (if you were to look for one, you would have to remember all of them, but that's not the point here).
With each variable you encounter, you have to check that all candidates but the rlc link to previous rlcs or the target. For whips and simpler chains/nets, you know that the llc links to the previous rlc (or the target if it's the first variable). For braids, however, in the worst case you have to check every previous rlc and the target for the llc and for each z- or t-candidate in order to find the weak links, so the whole validation is done in O(nk) time.
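
The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `linked` predicate, the candidate names and the representation of a step as "(remaining candidates, rlc)" are hypothetical, and the rlc is allowed to be `None` to model a final variable with no candidate left.

```python
from typing import Callable, List, Optional, Set, Tuple

Candidate = str
# One step: (remaining candidates of the CSP variable, its rlc or None).
Step = Tuple[Set[Candidate], Optional[Candidate]]


def validate_braid(target: Candidate, steps: List[Step],
                   linked: Callable[[Candidate, Candidate], bool]
                   ) -> Tuple[bool, int]:
    """Return (valid, number of link checks performed)."""
    anchors = [target]          # the target plus all previous rlcs
    checks = 0
    for candidates, rlc in steps:
        for cand in sorted(candidates):
            if cand == rlc:
                continue
            hit = False
            # Worst case: every anchor is tried for every candidate.
            for anchor in anchors:
                checks += 1
                if linked(cand, anchor):
                    hit = True
                    break
            if not hit:
                return False, checks
        if rlc is not None:
            anchors.append(rlc)
    return True, checks


# Toy link relation (assumed data, not a real puzzle).
LINKS = {frozenset(p) for p in [("x1", "T"), ("x2", "r1"), ("z2", "T"),
                                ("x3", "r2"), ("t3", "r1")]}


def linked(a: Candidate, b: Candidate) -> bool:
    return frozenset((a, b)) in LINKS


steps: List[Step] = [({"x1", "r1"}, "r1"),
                     ({"x2", "z2", "r2"}, "r2"),
                     ({"x3", "t3"}, None)]
ok, checks = validate_braid("T", steps, linked)
print(ok, checks)
```

The nested loop over `anchors` is where the O(nk) bound comes from: each non-rlc candidate may have to be compared with the target and every earlier rlc.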

Conclusion: the nrczt-notation is shorter and easier to write, but poses an unnecessary burden on the reader.

It is of little or no surprise that its author is blissfully unaware of (if not completely indifferent to) other people's feelings.

denis_berthier wrote:You [Gordon] claim to have a PhD in stats. Do you have any statistical results about Sudoku? No.
I do have

Note that these statistics are a bit misleading (see Denis' publication from 2009).

What the methods used do is that they allow the creation of unbiased statistics over the generator's pool, i.e. the set of puzzles appearing in their grid's minlex form.
For many people, this is a very unusual set of puzzles.
While the metrics considered in the paper would probably not change much, one could modify the calculations to produce unbiased statistics over more standard sets of puzzles.

With N denoting the number of ways a puzzle could be morphed, each ED puzzle has N/no_of_puzzle_automorphisms morphs in total, equally spread among N/no_of_grid_automorphisms grids, meaning that no_of_grid_automorphisms/no_of_puzzle_automorphisms of them are present in the generator's pool (with no_of_puzzle_automorphisms and no_of_grid_automorphisms standing for the number of the puzzle's and its solution grid's automorphisms, respectively, including identity).

Dividing each puzzle's weight by the number of its grid's automorphisms would produce unbiased statistics over the set of all puzzles (for morph-dependent metrics one would also have to morph the puzzles randomly). If we also multiply the weight by the number of the puzzle's automorphisms, we get unbiased statistics over the set of all ED puzzles. Denis does none of that, leaving the reader with no clue whether these statistics would actually differ and by how much.
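
The two reweightings can be written out as a small Python sketch. The function names and the example automorphism counts are hypothetical; only the arithmetic follows the description above.

```python
from fractions import Fraction


def pool_copies(grid_aut: int, puzzle_aut: int) -> Fraction:
    """Morphs of one ED puzzle present in the generator's pool:
    no_of_grid_automorphisms / no_of_puzzle_automorphisms."""
    return Fraction(grid_aut, puzzle_aut)


def weight_all_puzzles(weight: Fraction, grid_aut: int) -> Fraction:
    """Weight for statistics over the set of all puzzles."""
    return Fraction(weight) / grid_aut


def weight_ed_puzzles(weight: Fraction, grid_aut: int,
                      puzzle_aut: int) -> Fraction:
    """Weight for statistics over the set of all ED puzzles."""
    return Fraction(weight) * puzzle_aut / grid_aut


# Assumed example: a grid with 6 automorphisms, a puzzle with 2.
copies = pool_copies(6, 2)
w_all = weight_all_puzzles(Fraction(1), 6)
w_ed = weight_ed_puzzles(Fraction(1), 6, 2)
print(copies, w_all, w_ed)
```

With the ED weighting, the copies of a puzzle in the pool together count exactly once (`copies * w_ed == 1`). When both counts are 1, which is the usual case, all three quantities reduce to the unadjusted weight.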

He then even proceeds to estimate the number of all puzzles and the number of all ED puzzles from the same figure, without even mentioning that this only comes close to the correct values because automorphisms are incredibly rare (in fact I don't recall seeing the word 'automorphism' once throughout the entire paper).

There was a time when he would use his statistics to show how a technique (JExocet) which can be used in many hard puzzles (not in the most recent ones with TH) is basically completely useless, as it simplifies the resolution of no puzzle from his controlled* bias collection, without mentioning how many of those puzzles are solved with techniques simpler than JE, so that it wouldn't help even if it were present. I am glad to see that he seems to have changed his ways at least on this one occasion, though the progress is slow.

Marek

by **denis_berthier** » Wed Feb 22, 2023 3:12 pm

eleven wrote:Obviously you have no interest in manual solving or verifying chains.
Turn it as you like: in general you can't have a node without (part of) the former nodes, unlike sequential implications or AIC.
The logic behind your chains is (forward chaining) [edit: this makes it clearer]
a=>b, a&b=>c, a&b&c=>d, ..., a&b&c&...&w=>x
while the others use
a=>b, b=>c, c=>d, ..., w=>x

t-whips and whips build on the previous right-linking candidates - there's no mystery here. Bivalue-chains (equivalent to basic AICs) don't. As a result, the complexity and resolution power of whips are greater. There's no mystery.
It is never necessary to remember the z- and t-candidates.

by **denis_berthier** » Wed Feb 22, 2023 3:15 pm

marek, your point about automorphisms is correct and was discussed with gsf at the time I worked on this. Automorphisms are so rare that they play no role in the stats. So, nothing new on the horizon.

by **ghfick** » Wed Feb 22, 2023 3:17 pm

marek stefanik wrote:Note that these statistics are a bit misleading

In the Epidemiology and Biostatistics literature, bias is typically the biggest issue. The terms 'confounding' and 'modification' have received massive research attention.
The non-peer-reviewed paper you mention uses the most elementary of statistical methods. The real statistical literature contains so much more. In addition, one needs to pay close attention to the population. It is often very difficult to identify the actual population to which inferences are being offered. Your points are so important. Real unbiasedness is a very slippery and elusive matter.

You note the insulting "claim to have a PhD in stats". Dennis The Menace [DTM], indeed. I cited the genealogy page earlier. Curiously, DTM is not listed there. Of course, his doctorate [if it exists] would not be in Mathematics. He does, oddly, claim to be a Mathematician, at various places. Most of his writing would be rejected by the real Mathematical and Statistical literature referees. DTM has no peer-reviewed work.

by **denis_berthier** » Wed Feb 22, 2023 3:25 pm

.
Trying to submerge me with pages of rantings will lead nowhere. I have serious occupations. If any of you has any real and precise point to make, you should be able to state it in a few sentences.

Criticising the nrc-notation is absurd. It's the best notation we currently have:
- it is universal (meaningful for any CSP)
- it considers rc-bivalue and bilocal as the same thing (bivalue), unlike AIC notation, which is totally inconsistent.
- it puts the stress on the backbone of the chain instead of on insipid details (z- and t- candidates).

My paper on stats does indeed use the most elementary tools of stats. I've never claimed otherwise. No need to try to mislead people with questions of "populations". The population is perfectly defined as the full set of Sudokus. My sample was created by the controlled-bias generator from the full set of complete grids (provided by gsf).

by **ghfick** » Wed Feb 22, 2023 3:38 pm

It is now even clearer to me that DTM does not understand the distinction between a population and a sample drawn from a population. Unbiasedness is a property of the process one uses. There is so much more here. Advanced textbooks on finite populations and surveys give the real details.

Figure 1.3 from "The Logic Of Sudoku" by Andrew Stuart