@denis_berthier I know that beyond a certain difficulty it's necessary to use candidate grids to solve problems. I decided when I started that I (as a human) was not going to do that, or give the net access to the same information. It's just a constraint I put on what I was allowed to do. (I did let myself write down some limited information, but not the full candidate grid.) One reason for this was that what was interesting from a cognitive perspective was what techniques a human would come up with to work around the 'limited memory'. That's often a key factor shaping cognitive problems of all kinds. I'm sure the problems I solved are very easy indeed by your standards (the hardest puzzle in the 'Original Sudoku' book is an SE 3.4), but some of them were hard for me when I was coming to it without knowing any Sudoku techniques or, indeed, ever having solved a Sudoku problem before!
Incidentally, this is why I asked @eleven whether there was an SE cutoff above which you needed candidate grids to progress.
How do you choose the next step?
Whatever I spot? I don't follow a predetermined algorithm, if that's what you mean. As I worked I built up a collection of heuristics for where to look next. E.g. early on, look for pairs of rows (in the same group of 3 rows) that contain the same number. Let's say you spot a pair of 7s; that tells you that the third 7 in that group must be in one of three squares: the ones where the remaining row crosses the remaining block. Another heuristic: if you spot a 3x3 block like this...
xy.
z..
...
Then it's worth looking in the row and column that have '...' in; if any number that is not one of xyz occurs in both that row and that column (outside the block), you know where it must go: the centre square, the only empty one left that lies in neither.
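
For anyone who wants these two heuristics spelled out, here is a rough Python sketch. It is only an illustration, not the code used for the net: the function names `band_pair_cells` and `block_single`, and the grid representation (a 9x9 list of lists with 0 for an empty square), are assumptions made for the example. The second function is a slight generalisation of the xy./z../... pattern: it reports a square whenever a number has exactly one place left in a block once occupied rows and columns are ruled out.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid, 0 marks an empty square (assumed representation)


def band_pair_cells(grid: Grid, digit: int, band: int) -> List[Tuple[int, int]]:
    """If `digit` already appears in exactly two rows of a band (group of
    3 rows), return the empty squares where the third copy must go: the
    remaining row crossed with the remaining block."""
    rows = range(3 * band, 3 * band + 3)
    found = [(r, grid[r].index(digit)) for r in rows if digit in grid[r]]
    if len(found) != 2:
        return []
    used_rows = {r for r, _ in found}
    used_blocks = {c // 3 for _, c in found}
    if len(used_blocks) != 2:
        return []  # both copies in one block: the grid is inconsistent
    free_row = next(r for r in rows if r not in used_rows)
    free_block = ({0, 1, 2} - used_blocks).pop()
    cols = range(3 * free_block, 3 * free_block + 3)
    return [(free_row, c) for c in cols if grid[free_row][c] == 0]


def block_single(grid: Grid, block_row: int, block_col: int,
                 digit: int) -> Optional[Tuple[int, int]]:
    """Return the only square in the block that can hold `digit`, or None
    if the digit is already in the block or several squares remain."""
    cells = [(r, c)
             for r in range(3 * block_row, 3 * block_row + 3)
             for c in range(3 * block_col, 3 * block_col + 3)]
    if any(grid[r][c] == digit for r, c in cells):
        return None
    open_cells = [
        (r, c) for r, c in cells
        if grid[r][c] == 0
        and digit not in grid[r]                        # row is free of digit
        and all(grid[i][c] != digit for i in range(9))  # column is free too
    ]
    return open_cells[0] if len(open_cells) == 1 else None


# The xy./z../... pattern with x, y, z = 1, 2, 3 and a 7 in the bottom
# row and the right-hand column of the block (both outside the block).
example = [[0] * 9 for _ in range(9)]
example[0][0], example[0][1], example[1][0] = 1, 2, 3
example[2][5] = 7
example[6][2] = 7
print(block_single(example, 0, 0, 7))  # (1, 1), the centre square

# A pair of 7s in the top two rows leaves three squares for the third.
pair = [[0] * 9 for _ in range(9)]
pair[0][1] = 7
pair[1][4] = 7
print(band_pair_cells(pair, 7, 0))  # [(2, 6), (2, 7), (2, 8)]
```

Neither function needs a candidate grid: each looks only at the numbers already placed, which is the constraint described above.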
But even this is not enough to eliminate inconsistent choices; so you must somehow guess what's right or wrong.
I don't guess as a human; I only fill in a number when I can give you a proof of why it must be there. The net does guess, in that it doesn't generate a proof. There is a specific technical sense in which I penalise the net for guessing, though. One of the things I was interested in was the degree to which the reasoning and the net's approach converged.
BTW, it would indeed have been more interesting to allow the net to store data between stages -- but I would not want to feed in a candidate grid; instead I'd want it to be able to mark arbitrary 'annotations' on the grid, and have those fed from stage to stage. Seeing what information the net chose to pass would have been fascinating. Such an approach would have needed reinforcement learning, though, and I didn't have the computational power for that. So I went with the more tractable problem.
---
I've thought of an analogy which might help. The way a human does algebra is completely unlike the way a computer algebra system does algebra -- the human just spots things and tries whatever seems best next, whereas the CAS uses a predetermined procedure. When you talk about Sudoku, you sound like you're using an approach more like the CAS. But that's not what I'm interested in studying -- I'm interested in the more naive, untrained, fumbling human approach, because it tells us more about cognition.