The New Sudoku Players' Forum

by **JasonLion** » Sat Jan 30, 2016 7:30 pm

Good catch zhouyundong_2012! There are five other instances of very similar issues (shifts followed by comparing to a constant) in the Update routine. Fixing all of them speeds things up by less than 1/2 of a percent.

by **Afmob** » Sun Jan 31, 2016 9:12 am

Guess I must be one of the few people for whom JCZSolve is actually slower (by about 6-7% on the subset method) than ZSolve (Compiler: TDM/MinGW 4.7.0, CPU: Core i5 2400, Win 7).

On another note, I tried adding a routine to InitSudoku to remove candidates (useful for checking minimality) like here. The code addition I use for ZSolve is:

Code: Select all: B = TblMult3[num-1] + TblBoard_Block[field]; %num from 1..9, field from 0...80 F[B] &= ~(1 << (field%27));

I tried porting this to JCZSolve but with no success. Do you've got any hints on how to do this?

by **zhouyundong_2012** » Sun Jan 31, 2016 1:29 pm

Hi Jason,
Why do you use T1[v >> 9] .. T2[v >> 18] .. T3[v >> 27] to calculate Shrink?
I think to cut v to 2 sections will be more effecient, although the table is large. see it in my code, T1[the high 15 bits of v]..T2[the low 15 bits of v]

Maybe we should make it the same way to calculate Shrink when we test speed.

by **JasonLion** » Sun Jan 31, 2016 3:35 pm

Afmob,

Speed depends rather a lot on what kinds of puzzles you are trying to solve. JCZSolve scarifies some speed on very "easy" puzzles in order to detect invalid puzzles more quickly. There are other related but more difficult to describe tradeoffs as well.

I believe that the code you want is:

Code: Select all: int band = digitToBaseBand[field] + cellToSubBand[field]; g->bands[band] &= ~TblSelfMask[field];

by **JasonLion** » Sun Jan 31, 2016 3:57 pm

zhouyundong_2012,

Using two lookups into a larger shrink table uses fewer instructions, but results in more cache misses. Cache misses are expensive, and almost exactly cancel out the advantages of fewer instructions. Given they are about the same speed, using the smaller table saves memory and leaves more of the cache available to whatever other code is running that is calling the solver.

by **Afmob** » Sun Jan 31, 2016 4:58 pm

Thanks for the help! Though the code should probably read:

Code: Select all: int band = digitToBaseBand[zahl-1] + cellToSubBand[field]; g->bands[band] &= ~(1 << mod27[field]);

Still slower than ZSolve.

by **JasonLion** » Sun Jan 31, 2016 5:13 pm

Afmob,

I was wrong when I said TblSelfMask. Still, a table lookup is faster and there is already a suitable table to use. cellToMask is the one you actually want. Of course your code will work and is only trivially slower.

by **Afmob** » Sun Jan 31, 2016 6:00 pm

You're right about "cellToMask". I missed that table. You can also use it to speed up the "SetSolvedDigit" routine by replacing:

Code: Select all: g->unsolvedRows[rowBit/27] &= ~(1<<(mod27[rowBit]));

with

Code: Select all: g->unsolvedRows[rowBit/27] &= ~cellToMask[rowBit];

Maybe one should also use a lookup table for the division?

Edit: Note that this makes the "mod27" table useless.

by **JasonLion** » Sun Jan 31, 2016 6:36 pm

Division is often a hair better than a table lookup, though it can come out either way (depending on instruction scheduling and cache pressure).

by **Afmob** » Sun Jan 31, 2016 7:04 pm

I see. Last question from me today regarding the "UPDN(I,J,K,L)" routine:

Maybe the compiler already does this but isn't it faster to replace every "I*3" multiplication just with "I" by calling "UPDN(3*I,J,K,L)"? E.g. for digit 2a: "UPDN(6, 0, 1, 2)" instead of "UPDN(2, 0, 1, 2)"?

by **JasonLion** » Sun Jan 31, 2016 7:27 pm

The entire I*3+x calculation happens at compile time. Actually it is even slightly better than that, as the implied *4 for subscripting the array also gets folded into the compile time optimization of that expression. Just about every compiler is very good at calculating constant expressions at compile time.

The form of the arguments to UPDN() were chosen for maximum readability.

by **champagne** » Mon Feb 01, 2016 2:25 am

JasonLion wrote:zhouyundong_2012,

Using two lookups into a larger shrink table uses fewer instructions, but results in more cache misses. Cache misses are expensive, and almost exactly cancel out the advantages of fewer instructions. Given they are about the same speed, using the smaller table saves memory and leaves more of the cache available to whatever other code is running that is calling the solver.

Hi Jason,

To go in your direction, there are several parameters on which I am not clear at all.

Is it really worth to use tables instead of some instructions (mod 27 for example)
Is it better to have a [512] table defined as integer (32 bits) or to use smaller items as you can do in many of the tables (skrink, shrinkshrink ...

As did zhouyundong_2012 I wrote the code neglecting the cache problem, in the mood that a table load or index goes faster than some arithmetic intructions and that modern computers have a relatively big cache size, but ...

I am not expert in computer design and performances, so I can have made stupid things and the final result is surely affected by the hardware (cache size, bus speed...)

by **JasonLion** » Mon Feb 01, 2016 3:38 am

Not everything is perfectly tuned in JCZSolve 1.0. Don't be surprised if some of my comments are not reflected in the code. There will be an update soon when I have gone through and done some more speed nitpicking and minor touch ups. But all of the touch ups are going to make perhaps a 1% difference, 2% at most, so not a huge deal. I stopped finding large improvements a while before the release, and despite trying several things, I can't see any more large improvements to make. I will leave that to the rest of you

With current processors, a multiply is always faster than a table lookup, a divide can go either way (faster or slower), while a modulo is almost always slower than a table lookup. The big wins are when a table lookup can replace several operations in one lookup. Memory access typically costs several simple arithmetic operations. If statements (conditional branches) can often be quite bad, depending on how well they fit the branch prediction routines in the CPU. Making tables smaller by using char instead of int is sometimes worth it, at least when the values all fit in a char. You lower the cache pressure, but usually need to pay for an extend byte to dword instruction. We only have low to medium cache pressure, so sometimes one way, sometimes the other. Memory access is least expensive when you have some register only instructions that can overlap the memory reference delay, most expensive when memory accesses come in tight succession and all depend in some way on the previous accesses result.

Things get weird sometimes. Two days ago I found four adjacent lines of code that could simply be removed, with no other changes. However, removing those four lines slowed the program down! At the other extreme, the loop unrolling I did in ApplySingleOrEmptyCells() sped things up by more than 1%, despite doing essentially the same logical steps. Instructions can run out of order and overlap each other when the conditions are right, which makes predicting this kind of thing tricky.

You almost always make the big differences by improving the algorithm so it is doing significantly less, rather than the kind of code nitpicking I have been talking about here. Using cells with two pencil marks for guessing saves a noticeable percentage of the passes through the Update() procedure, which is one of the core reasons that JCZSolve is faster than ZSolve.

by **champagne** » Sun Apr 03, 2016 9:15 am

I am running currently a process where the final step is a brute force limited to 2 bands.

The brute force is included in another process, so it's not easy to give accurate figures, however, the solving speed seems to be around 150 000 grids per second.

This is part of the 27 + X + Y search and in that sample X=5 and Y=3.
The file is biased in the sense that the band 3 is always valid in specific conditions.

The final test has about 5% grids not valid and the rest made of grids with multiple solutions. (the count of valid grids is close to nil)
The sample has 28x10^9 grids.

Here the brute force is in line with the above process, but "optimized" for a 2 bands

Edit : done with an i7 3.4 Ghz

by **zhouyundong_2012** » Fri Sep 02, 2016 11:04 am

Open source:
#ifndef __Object_h__
#define __Object_h__

#include <boost/function.hpp>

//鼠标部分行为回调
#define MP_MOUSEENTER 0
#define MP_MOUSELEAVE 1
#define MP_LCLICK 2
#define MP_RCLICK 3
#define MP_MCLICK 4
#define MP_MOUSEPICK 5
#define MP_MOUSEDRAG 6
#define MP_MOUSEDROP 7
#define MP_MAX 8

//鼠标支持的扩展行为
#define MA_LDBLCLK 0x00000001
#define MA_RDBLCLK 0x00000002
#define MA_KEEPLDOWN 0x00000004
#define MA_KEEPRDOWN 0x00000008
#define MA_MOUSEWHEEL 0x00000010

//在父窗口的居中对齐方式
#define ADC_X_DIRECTION 0x01
#define ADC_Y_DIRECTION 0x02

class CObject;
typedef boost::function<void(CObject *, UINT nFlags, CPoint point)> FN_MouseProc;

class CImage;
class CObject : public CWater {
public:
CObject();
CObject(CPoint xDrawPos, CSize xSize, int nDepth);
virtual ~CObject();

public:
virtual void Rebuild();
virtual void SetObjId(int nObjId);
virtual int GetObjId() const;
virtual CPoint GetOffset() const;
virtual CPoint GetDrawPos() const;
virtual void SetSize(CSize xSize);
virtual void SetSize(int cx, int cy);
virtual CSize GetSize() const;
virtual void GetRect(CRect *rect) const;
virtual BOOL PtIn(CPoint point);
virtual void SetDepth(int nDepth);
virtual int GetDepth() const;
virtual void SetCease(bool bCease = true);
virtual void Show(bool bShow = true);
virtual bool IsShow() const;
virtual void SetHome(CObject *pHome);
virtual CObject *GetHome() const;
virtual void SetInteract(bool bInteract);
virtual bool IsInteract() const;
void RevDraw(CImage *pImage);
void RevDraw(CImage *pImage, const CRect& rect);
void EnableAction(int nMouseAction);
bool IsEnableAction(int nMouseAction) const;
void TrackMouseEvent(int nEventTrack);
void SetMouseProc(int nType, FN_MouseProc MouseProc);
FN_MouseProc GetMouseProc(int nType) const;

public:
virtual void SetDrawPos(CPoint xDrawPos);
virtual void SetDrawPos(int x, int y);
virtual void Offset(CPoint point);
virtual void Offset(int x, int y);
virtual void _UpdateOffset(CPoint point);
virtual void AdjustAnchor(CPoint point, const CRect& rect);
virtual void AdjustCenter(const CRect& rect, byte byFlag);
virtual void SetImage(CImage *pImage);
virtual CImage *GetImage() const;
virtual void SetSrcPos(CPoint xSrcPos);
virtual void SetSrcPos(int x, int y);
virtual CPoint GetSrcPos() const;
virtual void Draw(CImage *pImage);
virtual void Draw(CImage *pImage, const CRect& rect);

public:
virtual bool IsExist() const;
virtual CObject *FindObject(const CPoint& point);
virtual CObject *FindWheel(const CPoint& point);
virtual CObject *FindExcept(const CPoint& point, CObject *pExcept);

public:
virtual void OnLoseFocus();
virtual void OnMouseMove(UINT nFlags, CPoint point);
virtual void OnMouseWheel(UINT nFlags, short zDelta, CPoint point);
virtual bool OnMouseEnter(UINT nFlags, CPoint point);
virtual bool OnMouseLeave(UINT nFlags, CPoint point);
virtual void OnLButtonDown(UINT nFlags, CPoint point);
virtual void OnLButtonUp(UINT nFlags, CPoint point);
virtual void OnRButtonDown(UINT nFlags, CPoint point);
virtual void OnRButtonUp(UINT nFlags, CPoint point);
virtual void OnMButtonDown(UINT nFlags, CPoint point);
virtual void OnMButtonUp(UINT nFlags, CPoint point);
virtual void OnLClick(UINT nFlags, CPoint point);
virtual void OnRClick(UINT nFlags, CPoint point);
virtual void OnMClick(UINT nFlags, CPoint point);
virtual void OnLDblClk(UINT nFlags, CPoint point);
virtual void OnRDblClk(UINT nFlags, CPoint point);
virtual void OnMDblClk(UINT nFlags, CPoint point);
virtual void OnTrackLeave(UINT nFlags, CPoint point);
virtual void OnTrackHover(UINT nFlags, CPoint point);

public:
virtual void OnDragInteract(CObject *pObject);
virtual void OnDropInteract(CObject *pObject);

public:
virtual void SetDragMode(byte byDragMode);
virtual byte GetDragMode() const;
virtual bool CheckDragMode(UINT nFlags);
virtual void SetDrawMode(byte byDrawMode);
virtual byte GetDrawMode() const;
virtual bool CheckDrawMode(UINT nFlags);

public:
virtual int GetDrawThrs();
virtual void OnDrawRect(UINT nFlags, const CRect& rect);
virtual void OnDrawOver(UINT nFlags, const CRect& rect);

public:
virtual int GetDragThrs() const;
virtual void OnMousePick(UINT nFlags, CPoint point);
virtual void OnMouseDrag(UINT nFlags, CPoint point);
virtual void OnMouseDrop(UINT nFlags, CPoint point);

protected:
int m_nObjId;
CPoint m_xOffset;
CPoint m_xDrawPos;
CSize m_xSize;
int m_nDepth;
bool m_bCease : 1;
bool m_bShow : 1;
bool m_bExist : 1;
bool m_bInteract : 1;
byte m_byDragMode;
byte m_byDrawMode;
int m_nMouseAction;
int m_nEventTrack;
FN_MouseProc OnMouseProc[MP_MAX];

protected:
static CPoint DragOffset;

public:
CObject *m_pNext;
CObject *m_pHome;
CObject *m_pParent;
};

inline void CObject::SetObjId(int nObjId)
{
m_nObjId = nObjId;
}

inline int CObject::GetObjId() const
{
return m_nObjId;
}

inline void CObject::SetDrawPos(CPoint xDrawPos)
{
SetDrawPos(xDrawPos.x, xDrawPos.y);
}

inline void CObject::Offset(CPoint point)
{
Offset(point.x, point.y);
}

inline CPoint CObject::GetOffset() const
{
return m_xOffset;
}

inline CPoint CObject::GetDrawPos() const
{
return m_xDrawPos;
}

inline void CObject::SetSize(CSize xSize)
{
SetSize(xSize.cx, xSize.cy);
}

inline CSize CObject::GetSize() const
{
return m_xSize;
}

inline void CObject::SetImage(CImage *pImage)
{
}

inline CImage* CObject::GetImage() const
{
return NULL;
}

inline void CObject::SetSrcPos(CPoint xSrcPos)
{
}

inline void CObject::SetSrcPos(int x, int y)
{
}

inline CPoint CObject::GetSrcPos() const
{
return CPoint(0, 0);
}

inline void CObject::GetRect(CRect *rect) const
{
rect->left = m_xOffset.x + m_xDrawPos.x;
rect->top = m_xOffset.y + m_xDrawPos.y;
rect->right = rect->left + m_xSize.cx;
rect->bottom = rect->top + m_xSize.cy;
}

inline BOOL CObject::PtIn(CPoint point)
{
CRect rect;
GetRect(&rect);
return rect.PtInRect(point);
}

inline void CObject::SetDepth(int nDepth)
{
m_nDepth = nDepth;
}

inline int CObject::GetDepth() const
{
return m_nDepth;
}

inline void CObject::SetCease(bool bCease)
{
m_bCease = bCease;
}

inline bool CObject::IsShow() const
{
return m_bShow;
}

inline void CObject::SetHome(CObject *pHome)
{
m_pHome = pHome;
}

inline CObject *CObject::GetHome() const
{
return m_pHome;
}

inline void CObject::SetInteract(bool bInteract)
{
m_bInteract = bInteract;
}

inline bool CObject::IsInteract() const
{
return m_bInteract;
}

inline void CObject::SetMouseProc(int nType, FN_MouseProc MouseProc)
{
OnMouseProc[nType] = MouseProc;
}

inline FN_MouseProc CObject::GetMouseProc(int nType) const
{
return OnMouseProc[nType];
}

inline bool CObject::IsExist() const
{
return m_bShow;
}

inline void CObject::_UpdateOffset(CPoint point)
{
m_xOffset = point;
}

inline void CObject::SetDrawMode(byte byDrawMode)
{
m_byDrawMode = byDrawMode;
}

inline byte CObject::GetDrawMode() const
{
return m_byDrawMode;
}

inline bool CObject::CheckDrawMode(UINT nFlags)
{
return m_byDrawMode == nFlags;
}

inline void CObject::SetDragMode(byte byDragMode)
{
m_byDragMode = byDragMode;
}

inline byte CObject::GetDragMode() const
{
return m_byDragMode;
}

inline bool CObject::CheckDragMode(UINT nFlags)
{
return m_byDragMode == nFlags;
}

inline void CObject::EnableAction(int nMouseAction)
{
m_nMouseAction |= nMouseAction;
}

inline bool CObject::IsEnableAction(int nMouseAction) const
{
return (m_nMouseAction & nMouseAction) != 0;
}

inline void CObject::TrackMouseEvent(int nEventTrack)
{
m_nEventTrack = nEventTrack;
}

#endif

The New Sudoku Players' Forum

3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Removing candidates

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)