Stripping Out Redundant Clues

Everything about Sudoku that doesn't fit in one of the other sections

Re: Stripping Out Redundant Clues

Postby dobrichev » Mon Jan 07, 2013 11:00 am

David P Bird wrote:No solving passes are needed. Instead a static logic table is composed for all the 2-digit UAs as previously described.

The previously described 2-digits UA are formed after N-1 and N-2 solving passes, aren't they?
Actually I have no idea how exactly you will obtain them.
David P Bird wrote:When these are sorted in order of the number of potential clues they contain, the first ones in the list will be those that only contain one clue which will identify the essential clues.
These will automatically be covered first by the generating algorithm so require no special treatment*.
----
As and when essential clues are located it's possible to prune the table so that any UAs containing them are removed. This would also eliminate any clues that only existed together with one of these essential clues (and hence would be always be redundant). Although this wouldn't effect the results, it would reduce the loop count in the generating code and make it run faster.

You might want to remove the forbidden clues (those not in the initial puzzle) from all UA. This results in less branches later. This also results that "the number of the potential clues they contain" is the UA size.
David P Bird wrote:If a unique solution is found, a redundancy check is made for any redundant clues.

There is no way to achieve redundant clue if each of the added clues hits a UA not previously hit by other clue. [Edit: this is wrong]
David P Bird wrote:If multiple solutions are found the current clue set is too small because one or more UAs with more than 2 digits have been left uncovered. In this case a virtual UA consisting of all the unused clues is inserted into the static logic table.

Such virtual UA are nothing more than non-minimal UA not previously found (such as 4-digits ones). I am not sure they would help. Moreover, why you want to add them to the static table being sure you are hitting them immediately so they must disappear from the same table(or somehow marked as already hit)? Maybe an effort to achieve more minimal UA at the start will be effective.
David P Bird wrote:I believe this system should handle clue sets that are a long way from being minimal but the number of alternative clue sets it will produce will be enormous. These could be limited by using a version of your clue masking system.

The masking system does exactly the same - never remove forced givens (essential clues) and never add forced non-givens (the non-givens in the initial non-minimal puzzle).

Now some theorems from me :lol:
1) Starting from a fixed solution grid and set of forced non-givens, every essential clue will hit a UA of size 1 if the forced non-givens are removed from the UA too.

2) The rules for MCN (maximal clique number, the number of mutually disjoint UA) are applicable to the reduced-size UA too.

Essentially you approach is a puzzle generator with fixed solution grid.
You might want to read carefully McGuire's article about "no 16-s exist". All these problems are discussed and somehow solved in his work.
The concept of "dead clues" is very powerful too. It avoids adding clues AB twice but in different order. It avoids any duplicates up to this part of the logic where known UA have to be hit.
For large search space (less "essential" clues, less forced non-givens, and more "floating" clues), all short UA for the grid could be found in the milliseconds by using McGuire's method, and later pruned and truncated.
For efficient algorithm of hitting the UA you may see Oracle's concept of bitmap indexes, used by McGuire (and gridchecker).
Last edited by dobrichev on Mon Jan 07, 2013 3:41 pm, edited 1 time in total.
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby David P Bird » Mon Jan 07, 2013 2:26 pm

Mladen, many thanks for such a comprehensive response.

The starting point is a solution grid for the puzzle together with the clue set originally provided.

In my latest scheme I make no solving passes at all, but simply locate all the 2-digit UAs in the solution grid (a fairly trivial task).
As these are copied into the UA table, they are masked to remove the contents of non-clue cells in the original puzzle.
The sorting this table by occupied cell count brings the essential clues to the top of the table.
As this stage removing any UA containing an essential clue plus any other clue would speed up loop generating code.

dobrichev wrote:There is no way to achieve redundant clue if each of the added clues hits a UA not previously hit by other clue.

What happens under my scheme is that the clues added to cover later UAs will also cover earlier ones, so that it is possible that a clue designed to hit just one UA is eventually joined by others in every UA containing it. In that situation it is then redundant. Perhaps McGuire manages to avoid this situation which is something I have been wondering about.

dobrichev wrote:Such virtual UA are nothing more than non-minimal UA not previously found (such as 4-digits ones). I am not sure they would help. Moreover, why you want to add them to the static table being sure you are hitting them immediately so they must disappear from the same table(or somehow marked as already hit)? Maybe an effort to achieve more minimal UA at the start will be effective.


A virtual UA once created is never deleted. There may be more than one critical UA that hasn't been covered and the first virtual UA may cover all of them. As later passes cause further virtual UAs to be inserted higher up in the table and so be given higher priority by the generator. When all the clues in a virtual UA are also covered by higher priority ones, that UA will be redundant because the solver will have it marked as already satisfied whenever it reaches it. The only time that is saved by removing it is the cycles it takes to mark it and suspect it would cost a lot more to keep running checks to see if it can be deleted.

When a virtual UA is created the digits it contains can be determined, from which perhaps you (but not I) could find the actual UAs involved. I agree that extending the starting UAs up to say those with 4 digits would save running time, but at some point there must be a cross-over between the preparation time spent and the running time saved. That must also depend on the how away from being minimal your average puzzle would be. You would gain a lot more from this than I would I think.

dobrichev wrote:Now some theorems from me
1) Starting from a fixed solution grid and set of forced non-givens, every essential clue will hit a UA of size 1 if the forced non-givens are removed from the UA too.

That’s the same as mine, but what about the corollary?

I vaguely remember trying to follow a description of maximum clique number but it lost me. I'll try to find McGuire's article, as the technique to avoid duplicating a clue pair sounds interesting – thanks.
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby coloin » Mon Jan 07, 2013 3:35 pm

Hi David and Mladen.

I had trouble understanding the "floating" clues ....... as i understood all the clues were "essential"

However what David must be refering to is a term we gave to clues as untouchable and mutable clues.

Thus a puzzle which has no mutable clues within {-1+1} was termed an untouchable. A puzzle was eventually found where all the clues were mutable [a chameleon]

A mutable clue is one where another clue value can be inserted in that position which gives rise to a valid [minimal or non-minimal] different or same puzzle.

A clue which can me removed and another valid puzzle found by inserting a clue somewhere else might well be a "floating" clue ?

David - i have the only original copy of Havards software.

I cant think after 4 years he would mind and i can email you a copy.

Its a GUI and it displays minimal puzzles [untouchable clues in red,mutable clues in green and floating clues in blue. If its non minimal all essential clues are in blue and redundant clues in black.

A {-1+1} function is lightning fast and rids isomorphs on the fly.

Oh and performing a {-1+0} on a few thousand puzzles strips out the redundants in seconds.

pm if you want it

C
Last edited by coloin on Mon Jan 07, 2013 4:32 pm, edited 1 time in total.
coloin
 
Posts: 1638
Joined: 05 May 2005

Re: Stripping Out Redundant Clues

Postby dobrichev » Mon Jan 07, 2013 3:37 pm

David P Bird wrote:The starting point is a solution grid for the puzzle together with the clue set originally provided.

This has still something in common with the title of the topic. To me it is a puzzle generation with forced non-givens. Yes, this is a kind of minimization, maybe focused on plenty of redundant clues.
David P Bird wrote:
dobrichev wrote:There is no way to achieve redundant clue if each of the added clues hits a UA not previously hit by other clue.

What happens under my scheme is that the clues added to cover later UAs will also cover earlier ones, so that it is possible that a clue designed to hit just one UA is eventually joined by others in every UA containing it. In that situation it is then redundant. Perhaps McGuire manages to avoid this situation which is something I have been wondering about.

Agree. You are right, it was my mistake.
McGuire hunts low-clue puzzles where redundancy is not a problem.

David P Bird wrote:That’s the same as mine, but what about the corollary?

Work with short sized UA sets. They are lighthouses pointing to unavoidable givens.

David P Bird wrote:I vaguely remember trying to follow a description of maximum clique number but it lost me. I'll try to find McGuire's article, as the technique to avoid duplicating a clue pair sounds interesting – thanks.

McGuire demonstrated that MCN doesn't help. Calculating MCN is slow process and could serve as estimation of the lower limit of the clues or the required processing time. He starts from by hitting, one clue at a time, the shortest UA. Marks all UA hit by this clue. Continues with the shortest unhit UA, etc. When backtracking, it adds the tested clue to the forbidden ones (such are forced non-givens in the context of our discussion) and never touches it again. That are "dead clues". What he doesn't do is to clear the dead clues from the UA when checking which is shortest.
Aggregates of mutually disjoint UA that require respectively 2, 3, 4, etc. clues couldn't help in puzzle minimization where we have no upper limit of the number of the clues. For fixed number of clues their optimal usage depends of many parameters which McGuire fine tuned spending millions of CPU hours. Obtained parameters are applicable for 16-clue search only. The purpose of the aggregates is an early stop when we have upper limit of the number of the clues and see we have no chance to hit all UA.
This is a link to the article.

How to minimize a UA set? Compose a puzzle with everything but UA given. Obtain all solutions and compare them to each other. Each difference between 2 solutions is UA, but not necessary minimal. On a second pass check each UA whether there exist another one which positions are subset of the current. Remove the current if smaller UA is found. Done. Of course many optimizations are possible.
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby dobrichev » Mon Jan 07, 2013 3:57 pm

coloin wrote:I had trouble understanding the "floating" clues ....... as i understood all the clues were "essential"

Hi coloin,
We are discussing minimization of a given single-solution puzzle. This means that the solution is fixed so we can't replace any clue value. Also we can't add clues outside of the given non-minimal puzzle.
It is possible some of the clues to be "essential", that is when only one of them is removed from the original, the puzzle becomes multiple-solution. So we have nothing to do with these clues in the minimization process, but left them as is. The others, which, when removed, leave the puzzle still unique, could be removed either one by one or in combinations. They are "floating". I don't pretend this is precise terminology and not persist to use it. It arose during this discussion.
After minimization all the minimal puzzles consist of only "essential" clues, as you said. But out of the context of the original non-minimal puzzle.
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby coloin » Mon Jan 07, 2013 5:27 pm

I understand now

.......l a non-minimal puzzle may well have 2 floating clues each of which becomes essential when the other is removed.
potentially the non-minimal puzzles will be harder than the non-minimal "mother" puzzle.
one reason i feel that searching for ultra hard non-minimal puzzles - might just give us an even harder minimal.

dukuso wrote a program to add clues to a subpuzzle within a single grid solution.
http://magictour.free.fr/suexmult.exe
with a mask on top of a text string. unfort it doesnt work in 64 bit computors.

C
coloin
 
Posts: 1638
Joined: 05 May 2005

Re: Stripping Out Redundant Clues

Postby David P Bird » Mon Jan 07, 2013 10:50 pm

dobrichev wrote:When backtracking, it [McGuire's approach] adds the tested clue to the forbidden ones (such are forced non-givens in the context of our discussion) and never touches it again. That are "dead clues". What he doesn't do is to clear the dead clues from the UA when checking which is shortest.

Here's some quick responsed to these points which might change after I've slept on them:

I understand the mechanisms and think the first one would definitely be appropriate for me too if the virtual UAs as weren't being added as and when they were needed. First thoughts are that it should be OK, but I'm not completely sure yet if it's risk-free. I need to consider that with a clear head.

I think the reason he doesn't change the order the UAs are covered as dead clues are removed is to avoid having to re-run some clue sets he's already tried. That's the effect of resetting all the loop contols for the UAs coming after it when it's promoted in the order. It's what happens when a virtual clue set is inserted into the table in my scheme where the penalty is small, but for his work it could be considerable.

That reminds me, I forgot to include a final duplicate solution check to allow for that in my description.

dobrichev wrote:How to minimize a UA set? Compose a puzzle with everything but UA given. Obtain all solutions and compare them to each other. Each difference between 2 solutions is UA, but not necessary minimal. On a second pass check each UA whether there exist another one which positions are subset of the current. Remove the current if smaller UA is found. Done. Of course many optimizations are possible.

Although I wasn't aware of this procedure. I take it that you are considering that as an option with good propects here
David P Bird wrote:When a virtual UA is created the digits it contains can be determined, from which perhaps you (but not I) could find the actual UAs involved.

BTW In English 'floating voters' are those who don't support any political party and who may switch the way they vote.

coloin Thanks for your input and software offer, I'll be in touch. Mladen's response to you covers everything I would want to say.
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby David P Bird » Thu Jan 10, 2013 11:22 pm

Finally I'm ready to describe my latest approach to the problem of finding minimum subsets of clues.

Turning to the issue that ideally no clue set should be found twice by the generator, the dead clue concept can now be explored. In McGuire's approach he considers the unselected clues in covered UAs to be dead which denies their selection in any subsequent UA. However his search was for 16 clue puzzles, not for minimal clue sets, where the dead clue concept must be reversed. Therefore a dead clue is one where once all solutions including it have been found so it should not be used again. (I believe this is what you were saying too MD.)

UAs are sorted in ascending order of the number of clue cells they contain and any that contain the same clues as another as a subset are deleted.
Clues are ranked in descending order of the number of times they occur in the UAs, with a secondary ranking criterion being the minimum number of clues in any UA they occupy in ascending order.
The highest ranked clue becomes the target dead clue as this will reduce the search space the most when it can be eliminated. This is therefore set first.
All possible combinations for covering the remaining uncovered UAs are explored and extra virtual UAs added as necessary until every clue set that provides a unique solution has been found. Only those with no redundant clues are recorded.
When this search is complete, the top ranked clue that was set first can be eliminated from the starting clue set, the UAs re-ordered and the process repeated using the next highest ranked clue.
This is a valid approach as in an exhaustive search every way each clue can be included in a minimum set must be explored.
The search terminates when the remaining active clues don't provide a unique solution when checked.

As long as there are sufficient clues to produce a unique solution, every search must eventually produce a successful result. Once one is found, that branch is terminated as adding further clues to the set will only produce non-minimal sets. If a branch has covered all the listed UAs without success then Virtual UAs (VUAs) are progressively added until a successful clue set is found.

The first VUA to be added will simply consist of all the unused clues. These are added to the partial clue set one at a time to see if that will produce a solution. If that fails then the at least two additional clues are needed and so on. To handle this, extra VUAs are created consisting of all the clues in the previous VUA less the first one. To prevent the same combination of clues being tried twice each each VUA index value starts at the position one greater in the unused clue order than the previous one. Sooner or later this will find a solution.

Now each of the VUAs will contain different minimal UAs as subsets. The clues tested before the successful one in each of these can't belong the contained UA otherwise the solution would have been found earlier and so they can be removed. The clues following the successful one may also be in it's contained UA and may even be common with those in the other contained UAs. Working with each VUA in turn, the other clues it contains are tested to see if they produce a unique solution in combination with the successful clues in the other VUAs. Those that don't aren't in the same minimal UA and can be deleted. Each VUA is therefore converted to a minimal UA which can be sorted into order and added to the end of the UA table. In all but the last one the loop index is set to the first of the clue options, while in the last one it is set to the final option.

Control is then returned to the generator loop control to index on the penultimate UA and continue the search. This may then may need further VUAs to be generated.

Adding the new minimal UAs to the bottom of the table rather than sorting into position prevents needlessly repeating the same subset of clues. As and when the final selection for a UA has been made and the next higher loop will be indexed, the opportunity can be taken to re-order all the lower ranked UAs to steadily bring any new UAs into their proper positions.

Although using the dead clue concept will prevent most sets from being tested more than once, it won't stop all of them. Consider two clues 'a' and 'b' that exist in a UA. At the start the generator may select 'a' to cover that UA but further down the covering order 'b' may also be selected to cover a lower-ranked UA, so both of them are used together. It's then possible that when the generator indexes to select 'b' in the first UA that 'a' is also selected later to cover another lower ranked UA, so they occur together again. This is not a problem if the other clues selected are different but that can't be guaranteed. So, once all the possible clue sets have been listed, a final check is required to see if any are duplicated or contain any smaller one as a subset.
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby dobrichev » Fri Jan 11, 2013 7:46 am

Hi David,

There are few things looking incorrect which I found at first sight.
David P Bird wrote:Turning to the issue that ideally no clue set should be found twice by the generator, the dead clue concept can now be explored. In McGuire's approach he considers the unselected clues in covered UAs to be dead which denies their selection in any subsequent UA. However his search was for 16 clue puzzles, not for minimal clue sets, where the dead clue concept must be reversed. Therefore a dead clue is one where once all solutions including it have been found so it should not be used again. (I believe this is what you were saying too MD.)

No. McGuire's approach is exactly as yours.
Rephrasing:
1) Create context with previously givens, non-givens, and dead clues.
2) Choose a non-given cell.
3) Investigate all possible puzzles within the context with this cell as a given.
4) Investigate all possible puzzles within the context with this cell as non-given.

After step 3, we can safely forbid the chosen cell in all operations in step 4. Steps 3 + 4 do the exhaustive search within the context.

Apply this algorithm recursively.
You can start with empty givens and dead clues and choose, one by one, all clues from a short UA. The first step is largest. On the second step you have already the clue from the first step (and only this clue) marked as "dead". On third step you have 2 dead clues. If the chosen UA size is 4, then on 4-th step (with 3 dead clues) you done the whole search.

What misleads you is maybe the fact that the clue is "dead" only within the context (and all inherited contexts while recursing in depth). At backtracking, simply the "parent" dead clues took effect.

David P Bird wrote:UAs are sorted in ascending order of the number of clue cells they contain and any that contain the same clues as another as a subset are deleted.
Clues are ranked in descending order of the number of times they occur in the UAs, with a secondary ranking criterion being the minimum number of clues in any UA they occupy in ascending order.
The highest ranked clue becomes the target dead clue as this will reduce the search space the most when it can be eliminated. This is therefore set first.
All possible combinations for covering the remaining uncovered UAs are explored and extra virtual UAs added as necessary until every clue set that provides a unique solution has been found. Only those with no redundant clues are recorded.
When this search is complete, the top ranked clue that was set first can be eliminated from the starting clue set, the UAs re-ordered and the process repeated using the next highest ranked clue.
This is a valid approach as in an exhaustive search every way each clue can be included in a minimum set must be explored.

The (reduced) size of the unhit UA should take precedence over the number of UA that include the respective cell. Large sized UA don't help.
Large number of UA flatness the distribution of UA per clue. In low-clue puzzles, the small differences in UA per clue don't promote the clue to be part of the puzzle.
IMO you should forget about counting how many UA have the particular clue.

David P Bird wrote:As long as there are sufficient clues to produce a unique solution, every search must eventually produce a successful result. Once one is found, that branch is terminated as adding further clues to the set will only produce non-minimal sets. If a branch has covered all the listed UAs without success then Virtual UAs (VUAs) are progressively added until a successful clue set is found.

Working with several hundreds of shortest UA sets on an empty grid truncates the search space sufficiently.
The still unhit UA would add value probably only within this thin context.
Since the number of clues is not limited, it wouldn't be efficient to run a solver over all clue combinations on "live" clues and so some help with "virtual UA" is needed. Dealing with the newly found UA outside of the context probably wouldn't help so I recommend keeping them locally and forgetting about them at backtracking.

David P Bird wrote:Although using the dead clue concept will prevent most sets from being tested more than once, it won't stop all of them. Consider two clues 'a' and 'b' that exist in a UA. At the start the generator may select 'a' to cover that UA but further down the covering order 'b' may also be selected to cover a lower-ranked UA, so both of them are used together. It's then possible that when the generator indexes to select 'b' in the first UA that 'a' is also selected later to cover another lower ranked UA, so they occur together again. This is not a problem if the other clues selected are different but that can't be guaranteed. So, once all the possible clue sets have been listed, a final check is required to see if any are duplicated or contain any smaller one as a subset.

Duplication shouldn't happen. Please reconsider after understanding the rephrase at the beginning of this post.
A possible duplication could arose if at the inner iterations some other algorithm, which doesn't take into account the dead clues, is used.

What are the differences between minimization task and McGuire's search?
1) There are no upper limit for the number of givens.
2) There are forced non-givens so the initial context contains "dead clues".
3) Optionally, there are forced givens so the initial context contains givens (if N-1 is done as a preliminary step).
4) If by some reason "dead clues" concept is broken somewhere in the searching process, a check for minimality must be done at some stage.

Reading McGuire's article again with this in mind could answer many questions.

Cheers,
MD
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby David P Bird » Fri Jan 11, 2013 11:49 am

Hi Mladen, many thanks for your critique.

On dead clues, you caused me to re-read McGuire's description:

'The basic idea is that whenever we add a clue to the hitting set from an unavoidable set, we consider all smaller clues from that set dead, i.e., we exclude these smaller clues from the search (in the respective branch of the search tree only).'

I interpreted smaller clues as other clues, but what he meant was any previously selected clue in the same UA as he equates a clue's size to be its cell number(!). As he covers each un-hit UA, he extends the list of clues that can't be selected in all the later UAs in the table. This would cause him to back-track as soon as he found an un-hit UA composed only of dead clues.

Now to explain my thinking:

Remembering the aim is to find all minimal clues sets contained in a non-minimal one, I never want to backtrack until I've found a clue set that provides a unique solution. This forces the creation of virtual UAs which are subsequently converted to minimal UAs. The system therefore identifies the critical higher order UAs out of the multitudes that exist.

Progress would be slow at first but would accelerate nicely as the higher order UAs involved are identified and added to the table. Setting the clue that appears in most UAs as the first dead clue target I thought would cause these UAs to be identified early in the process, but perhaps you're right and I should let them be found as and when necessary.

As the eventual list of minimal clue sets would be relatively small, I thought a final duplicate solution check would be a minor price to pay in comparison to maintaining multiple dead clue records. Now I properly understand McGuires dead clue approach perhaps I should re-examine that.

Re-reading your post, I get the impression that you keep forgetting the title of this thread.
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby David P Bird » Sat Jan 12, 2013 12:46 pm

dobrichev wrote:Dealing with the newly found UA outside of the context probably wouldn't help so I recommend keeping them locally and forgetting about them at backtracking.

I have difficulty in believing that every new UA found this way is will only play a role in the context it was found necessary. What you suggest may apply later on in the process when they are only found one at a time though.

In thinking this over, I realised I had missed a trick that could have a big part to play. This is when the clues in newly found UA are a subset of those in a previous UA and the larger UA can be deleted as soon as it isn't one of the set being indexed by the generator. This presents a slight risk of duplicating a result if one of the clues in that UA turns out to be a member of a second minimal UA also contained by that UA.

My thinking about maintaining different sets of dead clues at each sub-branch remains that the cost of ensuring that situations with a small risk of occurring never happen is probably higher than checking for any cases where they had. (I take a similar view in the case of deleting redundant UAs)

The reason
I wrote: Re-reading your post, I get the impression that you keep forgetting the title of this thread.
was because of this
dobrichev wrote:Since the number of clues is not limited, ...
Yesterday I thought you were back on finding a clue set for a new solution grid, as when we start with a non-minimal clue set that establishes the limit. However on re-reading today, I appreciate you meant that that unlike McGuire's approach which would stop at 16 clues, the approach here could stop at any number of clues up to the number in the original set. But as I said, each branch terminates as soon as a successful set has been found which will always happen and will identify the missing crucial UAs which I'd like to find as early on as possible.

I accept that when all the UAs that have been collected are covered, the method you outlined may be better than my 'virtual UAs' to discover what further UAs should be added to the list, and could be used instead.

This has been an interesting learning experience for me, and I'm very grateful to you for providing me with the brain food I needed for my thought experiment. I'm sorry that I've been unable to contribute anything that you might find useful.
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby dobrichev » Sat May 18, 2013 12:09 pm

Hi,

I coded a redundant clues removal tool.

input: list of puzzles, possibly having redundant clues
output: list of unique minimal puzzles

I am applying 2 different methods:
- top-down, for "floating" givens <= 9, 0-5 seconds/puzzle. Probably could be extended to ~20 floating givens.
- bottom-up, a generation based on hitting the UA sets, 2-15 minutes/puzzle for the few tests I have done.

Non-floating (or forced) givens are those which when removed from the initial puzzle, one at a time, result in non-unique puzzle and therefore don't participate in the later removal process.

Essentially the methods are based on the discussion in this thread.

If somebody is interested, I can share the source code and the binary (next release of gridchecker).

Cheers,
Mladen
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby David P Bird » Sat May 18, 2013 2:36 pm

Hi Mladen,

This sounds very good!

How many floating givens can it handle at the moment?

Thinking about how any more than that could be handled, one idea might be to allow the user to identify digits whose givens should be considered as forced.

I hope you'll include it soon in Gridchecker v 1.16!

David
David P Bird
2010 Supporter
 
Posts: 960
Joined: 16 September 2008
Location: Middle England

Re: Stripping Out Redundant Clues

Postby dobrichev » Thu May 30, 2013 1:27 pm

The latest Win32 executable is at http://sites.google.com/site/dobrichev/.

gridchecker --similar --removeredundant [--verbose] < infile > outfile

Timings.
Machine Intel Core i3 CPU 550 @ 3.20GHz, 4GB RAM, Windows 64-bit, Application is 32-bit.

Solution grid is the one with most puzzles (294) in the recent version of the hardest collection.
123456789457189236689372154268794315391265478574813692732948561815627943946531827

The initial non-minimal puzzle consist of union of the 40 givens within all puzzles from the collection.
1.3.5678..5..8..366..3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.8..
The rest non-minimals are composed by adding or removing one random given at a time.
Code: Select all
input non-minimal puzzle                                                         givens=fixed+floating method  seconds    seconds  minimals
                                                                                                              1 thread  4 threads     found
1.3.5.78..5..8..36...3.2....6..9.3.5.9..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  34     17     17      mark        2                  128
1.3.5.78..5..8..36...3.2....6..9.3.539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  35     17     18      mark        4                  174
1.3.5.78..5..8..36...3.2....6..9.3.539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  35     17     18   UA hitting   (2)                  174
1.3.5678..5..8..36...3.2....6..9.3.539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  36     16     20      mark        9          3       353
1.3.5678..5..8..36...3.2....6..9.3.539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  36     16     20   UA hitting   (2)                  353
1.3.5678..5..8..36...3.2....6..9.31539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  37     15     22      mark       17          7       690
1.3.5678..5..8..36...3.2....6..9.31539..65..85.4..3.9..3...8.6.8.56....39.6.3.8..  37     15     22   UA hitting   (3)                  690
1.3.5678..5..8..36...3.2....6..9.31539..65..85.4..369..3...8.6.8.56....39.6.3.8..  38     14     24      mark       46         18      1264
1.3.5678..5..8..36...3.2....6..9.31539..65..85.4..369..3...8.6.8.56....39.6.3.8..  38     14     24   UA hitting    11(4)              1264
1.3.5678..5..8..36...3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.8..  39     11     28      mark      121         66      3712
1.3.5678..5..8..36...3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.8..  39     11     28   UA hitting    22(13)             3712
1.3.5678..5..8..366..3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.8..  40      9     31      mark      499        372      8144
1.3.5678..5..8..366..3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.8..  40      9     31   UA hitting    52(43)             8144
1.3.5678..5..8..366..3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.82.  41      8     33   UA hitting   198(189)           36756
123.5678..5..8..366..3.2....6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.82.  42      8     34   UA hitting   296(309)           77458
123.5678..5..8..366..3.2.5..6..9431539..65..85.4..369..3...8.6.8.56....39.6.3.82.  43      8     35   UA hitting   424               119439
123.5678..5..8..366..3.2.5..6..94315391.65..85.4..369..3...8.6.8.56....39.6.3.82.  44      7     37   UA hitting  1096               290855
123.5678..5..8..366..3.2.54.6..94315391.65..85.4..369..3...8.6.8.56....39.6.3.82.  45      6     39   UA hitting  3345(3602)        1110409
123.5678..5..8..366..3.2.54.6..94315391.65..8574..369..3...8.6.8.56....39.6.3.82.  46      6     40   UA hitting  6339(6472)        2250531
123.5678..5..8..366..3.2.54.6..94315391.65..8574..369..3...8.6.8.56....39.6.3.827  47      4     43   UA hitting 27551             10130441

Mark method (top down) is optimized for minimal number of solver calls. It doesn't use UA set collection. I tried using parallel code, only when solving, but the scalability is poor.
UA hitting method currently has no parallel code. The secondary timings (in braces) are for smaller UA collection used.
Subsequent runs for UA hitting method differ in speed in +/-10% due to the some random factor in the UA generation process.
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

Re: Stripping Out Redundant Clues

Postby dobrichev » Thu May 30, 2013 1:29 pm

David P Bird wrote:Thinking about how any more than that could be handled, one idea might be to allow the user to identify digits whose givens should be considered as forced.

Yes, it is technically easy to add such user input.
dobrichev
2016 Supporter
 
Posts: 1316
Joined: 24 May 2010

PreviousNext

Return to General