The New Sudoku Players' Forum

by **David P Bird** » Thu Jan 03, 2013 8:03 pm

This is new territory for me, but I guess it’s old hat for many here. Is there an easy to use utility that will strip out redundant clues from a non-minimal puzzle in our armoury? If so, some basic instructions or pointers to relevant posts would be appreciated.

I presume that often there will be more than one way of pruning the clue set perhaps even ending with different numbers of clues. I'd be grateful for any sage words on that subject too.

by **enxio27** » Thu Jan 03, 2013 8:21 pm

Ruud's Sudocue has an Optimizer (in the Extras menu) that I think will do what you want. There are several options you can choose from based on the difficulty you want to end up with.

by **champagne** » Thu Jan 03, 2013 8:39 pm

David P Bird wrote:This is new territory for me, but I guess it’s old hat for many here. Is there an easy to use utility that will strip out redundant clues from a non-minimal puzzle in our armoury? If so, some basic instructions or pointers to relevant posts would be appreciated.

I presume that often there will be more than one way of pruning the clue set perhaps even ending with different numbers of clues. I'd be grateful for any sage words on that subject too.

Hi David,

I have the process to split puzzles minimal and all n-1 clues puzzles in skmpp, but as far as i know, the best process to fill your request is in gridchecker.

Mladen dobritchev seems occupied these days, but you should ask him through PM to have the right command line.

My process is very bad when the minimal size is far below the given puzzle.

by **dobrichev** » Fri Jan 04, 2013 9:31 am

Hi,

Directly answering to David's question - gridchecker has no reliable code for automatic removal of the redundant clues.
gridchecker --solve --minimals < infile > outfile
outputs the solution for minimal puzzles or first redundant clue for non-minimal puzzles.

As far as i know redundant clues is unexplored territory.

A discussion on this problem would be interesting for me.

Take a valid solution grid and run redundant clue removal algorithm against it. You will achieve all valid minimal puzzles within this grid, which number is huge.

Take a multiple solution puzzle and check whether some of the clues are redundant. Why not?

From some perspective this task is the opposite of generating puzzles by adding clues one by one in a fixed solution grid until reaching uniqueness.
There is theory how to add clues efficiently - McGuire's approach that any new clue should cover some still uncovered unavoidable set.
Obvious rule for removal is that it is safe to remove any clue living in a house/1-rookery with 9 givens. UA sets covered by a single clue fix this clue as non-redundant. Further?

Some additional limitations to the algorithm could make it practical, such as
- limiting the source to a puzzle with unique solution
- limiting the number of redundant clues in the source subgrid (and search if some combinations of them result in unique puzzle too)
- limiting the depth of removal (search for any combination of up to 2 redundant clues)
- limiting the clues to play with (don't touch the clues from a given mask)

For a unique puzzle, gridchecker has an option to generate all puzzles of size up to N, taking as a source data the solution grid, and the forced non-givens. This is efficient only when both N is low and the number of redundant clues is high. Maybe champagne has this in mind referring gridchecker.

by **champagne** » Fri Jan 04, 2013 10:34 am

dobrichev wrote:Hi,
For a unique puzzle, gridchecker has an option to generate all puzzles of size up to N, taking as a source data the solution grid, and the forced non-givens. This is efficient only when both N is low and the number of redundant clues is high. Maybe champagne has this in mind referring gridchecker.

Right because this is the situation we used to compare several processes. (using Mauricio's non minimal puzzle you just listed in the hardest puzzles thread)

What you describe earlier is much closer to what I do with skmpp, splitting the file in 2 lots :

a) puzzles minimal
b) all valid puzzles of n-1 clues if the original puzzle is not minimal.

The second lot has to be processed again to split the file in 2 lots

b1) puzzles minimal with n-1 clues
b2) puzzles valid n-2 clues

......

by **David P Bird** » Fri Jan 04, 2013 11:16 am

Thanks for the reponses guys.

FWIW my thoughts were for a puzzle with N clues remove 1 clue at a time and see if the N-1 puzzle is solvable. If not class the removed clue as essential to the clues set and preserve it from then on.

Now test N-2 puzzles removing all possible pairings of non-essential clues at a time. This will identify any 'conjugate' pairs where one or other of them must be provided, and also those clues that remain non-essential even at the N-2 level.

A simple colouring approach can now be used for the conjugate clue cells to identify any fish patterns, but is unlikely to be conclusive.

Extending this approach at the N-3 level the clue triples removed can't contain both members of a conjugate pair. This will identify triples where two clues have to be provided plus any clues that were still non-essential at the N-3 level.

At this level my limited amateur mathematical skills went into overload!

The idea behind this approach was that where UAs with more than 3 digits existed their influence would be exposed fairly early on but I'm far from sure about this.

Switching horses, if the UAs containing up to 4 digits are located, any clue that is unique to one of them will be essential. That UA can now be removed from the data being analysed. The N-2 level then checks for UAs containing two clues, and so on.

This approach could be simpler but would rely on all UAs with more than 4 digits being satisfied by one of the essential clues or both members of a conjugate pair and therefore poses a risk.

by **dobrichev** » Fri Jan 04, 2013 1:05 pm

Hi David,

I can't get you point.
Do you simply want to collect all unique minimal puzzles which clues are subsets of a given unique non-minimal one?
Or you want to classify interchangeable clues? (1 of 2, 2 of 3, etc.) If so, the result is valid only within the strict context of the rest of the clues and I have doubt that this classification couldn't lead to any generalization like "for grids with xxx propery/pattern the clues at positions abcde are interchangeable in yyy way".

======

Redundant removal algorithm can exploit the property that if at stage 0 some clue is essential, it remains essential at all stages.
So, the non-minimal puzzle could be scanned once for redundant clues, and later only for combinations of the same clues.

Requiring the source puzzle to be unique, the criterion for redundancy is N-1 to have unique solution.
Making a pass over all givens, removing them one at a time and checking for uniqueness, gives a list of X redundant clues.
We can compose a binary word of X bits and investigate which bit combinations are redundant. Let 1 means the clue is cleared, and 0 that the clue is not cleared.
We can compose a boolean array A of 2^X elements. For 8 redundant clues its size is 2^8=256 elements. Each element of the array will hold whether the clue combination of its index has unique solution (1) or non-unique/unknown (0). Index of the array will correspond to the binary combination of the givens.
Trivially, the A[0] is unique (all redundant clues are in place).
Also each element corresponding to a single bit set in the index (1, 2, 4, 8, 16, 32 ...) is unique too.
On the second pass we have to check all pairs, i.e. all indices where the count of the bits is 2.
If a pair is non-unique, then set respective A to 1. Also set to 1 all other A elements that have the same bits in the index set. (removing more clues from non-unique puzzle results in non-unique puzzle too).
On third pass do the same with triplets, skipping already marked as non-unique ones.
On fourth pass process quadruplets, etc.

Now we have a list of clue combinations that are safe to remove, but some of them are possibly still non-minimal.
A puzzle is minimal if there is no sub-puzzle which is minimal.
So make one pass over each element in A marked as unique and check whether there is other unique element which index is superset (i.e. is still unique but with more clues removed). If found, ignore the element, else output it.

============

Technically, removal of clues, one by one, can be done using gridchecker and windows console find command as follows.
Assume we have the source puzzles stored in file x and want to store results in file r.
gridchecker --similar --minus1 < x | gridchecker --solve --minimals > xm1
After this step xm1 file contains puzzles that have one of
- solution if both minimal and unique (no letter "u" in the row, append them to the results)
- "multile solutions" string if non-unique (too many clues removed, ignore)
- "redundant clue" if non-minimal (continue minimizing)
Possible commands for filtering are:
find /V "u" < xm1 >> r
find "red" < xm1 > x
Repeat while x has records.

There will be duplicate records in both xm1 and r files, so they have to be removed periodically or at the end.
The fields of file r are puzzle<tab>size<tab>solution.

by **David P Bird** » Fri Jan 04, 2013 3:02 pm

Hi Mladen,

Although you don't get my explanation and so far I don't get all of yours, it's clear that we are considering processes based on a very similar series of steps.

What aroused my interest is the puzzles being posted each day by ArkieTech in the Puzzles Forum are symmetrical and so may have redundant clues which should usually make them easier. I was then wondering if the different minimal forms that could be extracted from the clue set would share some common key solving deduction(s).

I have used some solving terms in my post such as "conjugate pairs" and "colouring" which may have caused you problems. These are techniques that determine what combinations of digits are possible in the solution to a puzzle. Surprisingly there are strong similarities with determining what combinations of clues are necessary to provide a unique solution. For example when two out of three clues are needed, this is equivalent to an Almost Naked Set in solving terms.

As you know, working on a spreadsheet I can only achieve a fraction of what you can in number-crunching operations, so I try to find ways of working that you might not consider.

While I was eating lunch, it struck me that if we partially solved the puzzle using just the essential clues, any cell that could be resolved that appears in the original clue set will always be redundant. This method could also be applied to shorten the search if we start adding different subsets of non-essential clues.

I have a feeling that using multiple solving runs, any clue that is required to satisfy a UA with > 3 digits should be identified fairly early on as being either essential or one of a pair, but did no trials before I posted my query. If that is right, then it might be productive to switch checking what 2-and 3-digit UAs still had to be satisfied for the N-3 step and beyond.

I appreciate that there is a big knowledge gap between us, but hope perhaps you may find something of interest in these ramblings.

I must now study your response more thoroughly.

by **daj95376** » Fri Jan 04, 2013 4:03 pm

FWIW: gsf's solver may be of help. Here's an input puzzle and the output from his solver. He also reports difficulty based on his solver.

Code: Select all: +-----------------------+ | . . 4 | 5 . 6 | . 9 8 | | . . 2 | 4 . . | . . 7 | | 6 7 5 | . . . | . . 4 | |-------+-------+-------| | 9 5 . | 6 8 . | . . 3 | | . . . | 7 3 . | . . . | | 7 . . | . . 9 | . . . | |-------+-------+-------| | . . . | . . . | 9 . 2 | | 3 . . | . . . | . 4 . | | 5 2 9 | 3 . . | 8 . . | +-----------------------+ ..45.6.9...2.....767........5.68...3...73....7....9.........9.23......4.5293..8.. # 818 FNBTXY C25.m ..45.6.9...24....767.......95.68...3...73....7....9.........9.23......4.52.3..8.. # 550 FNBTXY C26.m ..45.6.98..2.....767.......95.68...3...73....7....9.........9.23......4.52.3..8.. # 276 FNBY C26.m ..45.6.98..24....767.......95.68...3...73....7....9.........9.23......4.52.3..8.. # 546 FNBTXY C27.M/S2.d

Here's the command-line to use in Windows.

sudoku.exe -m input.txt > minimal.txt

by **dobrichev** » Fri Jan 04, 2013 10:28 pm

David P Bird wrote:I have used some solving terms in my post such as "conjugate pairs" and "colouring" which may have caused you problems. These are techniques that determine what combinations of digits are possible in the solution to a puzzle. Surprisingly there are strong similarities with determining what combinations of clues are necessary to provide a unique solution. For example when two out of three clues are needed, this is equivalent to an Almost Naked Set in solving terms.

These terms are not the problem, even if I don't understand them precisely.
I hope you are going to search and eventually explain some hidden logic or some well known logic but from the point of view of redundancy. That would be interesting for me. Is it applicable if some author intentionally adds clues to reach some degree of symmetry?
Searching the vicinity of some puzzle can be done by removal of n clues and addition of m clues, {-n+m}. Intermediate puzzles have multiple solutions. If there is no additional logic what to remove and what and where to add, it is equivalent to do the addition first, and then remove the redundant clues. But you can add clues by applying some logic which targeting something, and then remove the unnecessary ones. You can repeat this process several times, being still in the non-minimal area.

David P Bird wrote:While I was eating lunch, it struck me that if we partially solved the puzzle using just the essential clues, any cell that could be resolved that appears in the original clue set will always be redundant. This method could also be applied to shorten the search if we start adding different subsets of non-essential clues.

The effort to determine potential redundant clues in a unique puzzle is solving the puzzle the number of givens times. More interesting case is when working with multiple solution puzzles. There is some discussion on that matter here.

David P Bird wrote:I have a feeling that using multiple solving runs, any clue that is required to satisfy a UA with > 3 digits should be identified fairly early on as being either essential or one of a pair, but did no trials before I posted my query. If that is right, then it might be productive to switch checking what 2-and 3-digit UAs still had to be satisfied for the N-3 step and beyond.

I am not sure if it is necessary to involve UA. Also I don't understand how the number of digits forming UA is related. Hitting UA with more than 1 clue doesn't require the rest of the clues to contain different digit.

daj95376 wrote:sudoku.exe -m input.txt > minimal.txt

Thank you! This should work well for David.

by **David P Bird** » Sat Jan 05, 2013 12:54 am

Hi again Mladen, thanks for being so patient with me and also providing the gridchecker command lines.

There was a terminology difference between us. The terms I used were:
Essential clues must always be in the clue set.
Redundant clues are those that are never needed.
Non-essential clues are those that are needed in some combinations but not in others.

A better name for the non-essential clues would be floating clues as is used by some solvers. Anyway I now believe I have a better understanding of your previous post.

My idea before was to use N-1 and N-2 solving runs to determine the essential clues and strongly linked pairs of floating clues. After that a switch would be made to what you call a puzzle generating mode. This would ensure that the UAs still to be satisfied were covered using only clues that were in the original puzzle. I can easily produce all the clue sets that give unique solutions, but as yet I can't think of a good way using solving techniques to ensure they are all minimal. If I do I'll let you know.

Sorry, I should have explained my thinking on the UAs with 4+ digits better: any that contain just one or two clues will be picked up in the N-1 or N-2 solving passes. The others that contain 3+ clues are then highly unlikely to be left completely uncovered by the reductions made in the following steps just based on the 2 and 3-digit UAs. If by remote chance that did occur it would be revealed by a final solving pass.

dobrichev wrote:[
David P Bird wrote:While I was eating lunch, it struck me that if we partially solved the puzzle using just the essential clues, any cell that could be resolved that appears in the original clue set will always be redundant. This method could also be applied to shorten the search if we start adding different subsets of non-essential clues.

The effort to determine potential redundant clues in a unique puzzle is solving the puzzle the number of givens times.

I'm not sure that you understood my point, and I'm afraid I can't understand your reply. I'll rephrase and perhaps you can too.
If we attempt to solve the puzzle using just the essential clues we will manage to solve some cells and be left with multiple ways of solving the others. If one of the cells we solve contains a floating clue then that clue will never be needed and will be redundant.

Yes, thank you daj for info – it was what I was originally trying to find before I got embroiled in the theory.

by **David P Bird** » Sat Jan 05, 2013 12:44 pm

Here's my best effort so far for determining subsets of clues that give a minimal unique solution.

Determine whether each clue is essential or floating by test solving the puzzle with one clue removed at a time (N-1).
Determine strongly linked pairs of floating pairs of clues by test solving with every pair of floating clues removed (N-2).

A logic table is now required.
The rows are the 2 & 3 digit UAs that don't contain an essential clue
The columns are the cell addresses of the floating clues.

Identify any UAs that contain just a pair of floating clues and move them to the top rows of the table
If any strongly linked pair is missing it will be contained in a UA with >3 digits. The identity of this UA is not important but a row should be inserted to represent it.

The remaining rows are then sorted so that the UAs containing the lowest number of floating clues are at the top.

A nested loop is now started
Repeat
.. Find the first UA that has no given set in it so far.
.. Loop through the floating clues it contains setting one at a time.
.. This will then satisfy some other UAs below it in the table
.. If it sets a further clue in an UA that appears above it in the table, flag the other set clue(s) in that UA for later checking
Until all UAs contain a set clue or there are no more floating clues in the first UA.
The set clues will nearly always produce a unique solution but it may not be minimal

When a solution isn't minimal one or more of the flagged clues will cover UAs that all have at least one other clue set (a fairly quick test to make). When that happens any such clue isn't needed and can be removed.

Depending on how that logic is followed, it's possible that this procedure could remove all the set clues in a UA. This should be checked and when it happens, there will be more than one minimal solution depending on which clue is added back in the affected UA.
Without writing the code and testing it, this the area that I fear could prove tricky when there is more than one UA affected. This may mean the system may only be suitable for starting clue sets that are reasonably close to being minimal.

A final check with a solver should be made with the set clues to see if there is a unique solution.
If the solution isn't unique one or more of the cells that can't be solved will be a clue that has been un-set by the minimality check.

This doesn't use any specific solving technique though!

by **dobrichev** » Sat Jan 05, 2013 9:16 pm

Hi David,

dobrichev wrote:The effort to determine potential redundant clues in a unique puzzle is solving the puzzle the number of givens times.

Applying {-1} to a puzzle with N givens results in N sub-puzzles.
If the initial puzzle is unique, then solve each of these N sub-puzzles once and determine whether it has unique solution or not. Those with unique solution point that the removed clue is redundant. Those with multiple solutions point that the removed clue is essential.
If we are doing minimization, then we have nothing more to do with the essential clues. The rest of the work is to check if there exist combinations of the previously obtained redundant clues which result in unique solution too.

In order to find all minimal puzzles which clues are subset of the original, I described a possible method by removal single clues, then pairs, then triplets, etc. This is somehow top-down process where we are playing with unique solution (always the same as the solution of the original puzzle) and stopping immediately when moving to multiple solutions.
There is bottom-up process where we can simultaneously remove all redundant clues and then add combinations of them until uniqueness is reached.
In your latest post you describe some hybrid process that follows top-down up to pairs. At this point I am losing the track. For example I can't understand how many times you are creating "a logic table" and how many times you are doing the nested loops.
Do you remove all redundant clues at once and then with the help of UA you are finding all combinations of non-essential clues which result in unique puzzles? If so, then your approach is very similar to this implemented in gridchecker.
In --scan mode, gridchecker has --cluemask option where 81-char mask is given with 0 for forced non-given, 1 for forced given, and anything else for "floating". The solution grid must be given too. Unfortunately, the number of clues in the generated puzzles must be given too, so you must run the process several times for increasing number of givens.
If the number of forced givens is large enough to result in a pseudo-puzzle with less than 1 million solutions, than it is reasonable to find all unhit UA. Else some subset of them is used and this is controlled by other command line parameters (unless I preffered to hardcode some parameters for this case, which is quite possible).
Note that in this process only N-1 pass must be done to obtain the "floating" clues in the cluemask. Esential ones are set to 1, and non-givens are set to 0.
I added this feature to gridchecker when processing this non-minimal puzzle.
I can look at the source code, explain the details, and possibly do some modifications if there is interest.

by **David P Bird** » Sun Jan 06, 2013 12:42 am

Hi Mladen,

Up to the N-2 solving pass my method is the same as yours.

Thereafter I switch to a 'generating mode' listing the 2 and 3-digit UAs that have to be satisfied by floating clues.
In the nested loops after the first way of covering them has been found and logged, the routine goes back to the previous UA where a there was a choice of clue to set.
The previous clue choice is un-set together with any knock-on effects that choice caused, the next clue option is set instead, and the looping system continued to find the next possible clue set.

Now if I understand your method correctly, it repeatedly runs a solver as it progressively tests smaller and smaller subsets of clues picking only from those that were viable at the previous size. If any of the larger clue sets then contains a smaller one as a subset it is then discarded as being non-minimal, which is a nice and easy way to handle that problem.

Using my 'generating' mode, there no control over the size of the clue set that will be found in any pass.
It therefore has to use a simple minimality check is to see if any clue is always accompanied by another one in the UAs it occupies. If so that that clue can be removed.
When there are several such clues occurring at the same time, there will be a sub-problem about what different combinations of them can be removed to produce different minimal clue sets. I haven't investigated this, but as I said, making this water-tight could be tricky to code. However this can only happen when the original puzzle is far from being minimal.

Your procedure should therefore be more robust when many clues can be removed, but I think mine would be quicker when the starting clue set is reasonably close to minimal.

From what you write, I now have a vague understanding of your –scan mode with the –cluemask option and can appreciate that the N-1 pass is the only one that is required to find any essential clues that would be forced by the clue mask.

The only reason I want to run a N-2 pass as well is to detect any multi-digit UAs that contain two givens. This means I can then limit the UAs to be listed to those containing 2 and 3 digits with reasonable confidence.

As I wrote before in a thread you gave a link to, 'In the same way as there are theoretical physicists, consider me as a theoretical Sudokoist – I consider these things but if I can't formulate them in a spreadsheet that’s where they stop.' I guess that's where I am now!

by **David P Bird** » Sun Jan 06, 2013 11:19 pm

Mladen, here is another idea using recursion for dealing with clue sets that are far away from being minimal. I hope that this outline gives you enough information to understand the general principles.

[Edit] The original post has been replaced by this re-write as a result of some further analysis.

No solving passes are needed. Instead a static logic table is composed for all the 2-digit UAs as previously described.
When these are sorted in order of the number of potential clues they contain, the first ones in the list will be those that only contain one clue which will identify the essential clues.
These will automatically be covered first by the generating algorithm so require no special treatment*.

A nested loop generating algorithm systematically finds all the ways each UA can be assigned a clue from the original clue set.
Each clue set found is tested to see if it produces a unique solution.

If a unique solution is found, a redundancy check is made for any redundant clues. These are the ones that only appear in UAs together with some other clue.
If there are no redundant clues, the current clue set is logged as being minimal. Otherwise the result is ignored as any minimal subsets it contains will be found in later passes.

If multiple solutions are found the current clue set is too small because one or more UAs with more than 2 digits have been left uncovered. In this case a virtual UA consisting of all the unused clues is inserted into the static logic table. The clues selected in the UAs above this in the table are retained and the ones below it are blanked. This will cause the nested loop system to cover the new virtual next before it proceeds to cover the remaining ones as it did before. This system recurses until a unique solution is found when it is handled normally, and control is passed back to the nested loops to look for further possible clue sets.
---------------

The affect of adding virtual UAs to the static UA table is interesting because it means that the system progressively learns from previous failures. The first virtual UA will probably not exist in practice, but each time this system recurses the virtual UAs get smaller until the critical ones are identified.

One way to mentally picture what this process does is to consider that a UA exists that contains just one essential clue which isn't covered at first. When the multiple solutions are found this clue will be added back together with all the other unused clues in the virtual UA. Every time this clue is left uncovered, the virtual clue set that is added will get smaller until the final one will reduce to that single clue.
------------------

Anyone who read the original version of this post will know that at first I thought that recursion could be used to find the minimal subsets that exist when a non-minimal cover set was found. On reflection however I think that all this would do would be to produce duplicates of solutions that would be found later.
------------------

*As and when essential clues are located it's possible to prune the table so that any UAs containing them are removed. This would also eliminate any clues that only existed together with one of these essential clues (and hence would be always be redundant). Although this wouldn't effect the results, it would reduce the loop count in the generating code and make it run faster.

Although it would also appear worthwhile to prune redundant virtual UAs too, the time saving would be minimal. This is because by the time it's clear they are in fact redundant, they will never trigger the addition of a clue to the current clue set.
------------------

I believe this system should handle clue sets that are a long way from being minimal but the number of alternative clue sets it will produce will be enormous. These could be limited by using a version of your clue masking system
------------------

Out of this analysis I have realised that this theorem;
'In a minimal puzzle, each clue will be the only one in at least one UA'
is blindingly obvious but I've never seen it mentioned before.

However I'm not yet sure if the corollary
'if every clue in a puzzle is unique to at least one UA, that puzzle is minimal'
holds. If not, a final check is needed to eliminate any clue set that completely contains a smaller one.

The New Sudoku Players' Forum

Stripping Out Redundant Clues

Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues

Re: Stripping Out Redundant Clues