# ZSolver: Core Idea — Band-Centered 27-Integer State Representation and Pipeline-Friendly Loop Unrolling
**Author**: Zhou Yundong
## Abstract
This paper presents the core implementation ideas behind ZSolver. The key lies in a compact, band-centered state representation and a unified elimination framework built upon it. For any given digit, its state within a single band requires only 27 bits to describe. With 9 digits and 3 bands, just 27 integers are sufficient to capture all core state information needed during solving. On top of this representation, intra-band elimination, inter-band elimination, and box elimination can each be compressed into just a few lines of code. Furthermore, by fully unrolling the 27-iteration loop, relative index accesses such as F[i] are transformed into fixed absolute address accesses such as F[5], which makes extremely effective use of the CPU pipeline. This results in a significant speedup from approximately 8 microseconds to just over 4 microseconds. This paper explains this implementation path from three perspectives: state representation, elimination code structure, and execution efficiency.
**Keywords**: Sudoku solving; ZSolver; band representation; bitwise operations; loop unrolling; CPU pipeline; high-performance implementation
## 1. Introduction
The performance of a Sudoku solver is often assumed to depend primarily on its search strategy, pruning rules, and constraint system. In practice, however, the underlying state representation and code organization often have a far greater impact on the final speed. When pursuing microsecond-level performance in particular, whether a program can complete state updates, constraint propagation, and contradiction detection in the shortest possible path matters more than high-level strategy.
ZSolver's core experience demonstrates that when the state representation is compact enough, constraint propagation is unified enough, and the code structure is well-suited to the execution characteristics of modern CPUs, overall performance can improve dramatically. The real key is not to accumulate complex logic, but to compress the core problem into a small number of integers and a small amount of short code — and then to convert that structural advantage into an execution advantage at the processor level.
## 2. Band-Centered State Representation
The central foundation of ZSolver is a band-centered state representation.
For any given digit, its state within a single band can be represented using just 27 bits. Since a full Sudoku board has 9 digits and 3 bands, only 27 integers in total are needed to describe all core state. This representation is extremely compact and, crucially, completely uniform.
The significance here goes beyond simply "saving memory." More importantly:
1. The state structure is identical for every digit;
2. The processing method is identical for every band;
3. All subsequent elimination operations can be built on the same integer representation;
4. Data is highly concentrated, making it easier for compilers to optimize and for CPUs to execute efficiently.
In typical implementations, state information is scattered across rows, columns, boxes, and cells, requiring constant switching between different data views and structures. Under this representation, many originally dispersed relationships are folded into a fixed set of integers, allowing subsequent operations to be performed directly as bitwise computations.
## 3. Simplification of Intra-Band Elimination
On the basis of the above representation, intra-band elimination for any given digit requires only a single line of code.
This is critically important. It means that the primary constraint relationships within a band have been heavily absorbed by the state encoding itself. What the program actually needs to do is no longer a complex traversal with many conditional checks — it is just one very short operation.
This shows that the representation is not passively storing state, but actively participating in the solving process. It is precisely because the representation is so tightly aligned with the problem structure that the elimination logic can be written so concisely.
## 4. Unified Implementation of Inter-Band Elimination
On the same basis, inter-band elimination requires only two or three lines of code.
This shows that the representation is not only suitable for local updates within a single band, but also naturally supports constraint propagation between bands. In other words, once the state representation is established, eliminations at different structural levels no longer require separate large blocks of code — they can all be handled within the same unified framework.
For high-performance programs, this kind of uniformity is extremely valuable. Uniformity means:
1. Shorter code;
2. More stable logic;
3. More predictable execution paths;
4. Easier and more efficient processing by the machine.
## 5. Box Elimination as a Natural Extension
Box elimination, building on what has already been established, also requires only two or three lines of code.
This further demonstrates that this representation does not sacrifice the 3×3 box constraint — the most fundamental structural element of Sudoku — but instead incorporates it naturally into the same framework. As a result, intra-band elimination, inter-band elimination, and box elimination are not three separate implementations, but three direct operations all built on the same underlying state structure.
From an engineering perspective, this inheritance and uniformity is extremely valuable. It allows the program to avoid relying on a large layered rule system, depending instead on a single representation that is correct, compact, and consistent.
## 6. Each Full Elimination Pass Requires Only About Ten Lines of Code
Taking all the elimination types together, the total core code for one complete elimination pass is no more than around ten lines.
This is one of ZSolver's most distinctive characteristics. Its speed is not achieved through "vast amounts of code," but through "high-density execution of a small amount of code." The true complexity is no longer visible on the surface of the code — it has been compressed into the state representation and operational structure.
This short-code approach has two direct benefits:
1. The implementation is clear and does not require many intermediate layers;
2. It is well-suited for subsequent unrolling, scheduling, and low-level optimization.
## 7. Unrolling the 27-Iteration Loop
Given this foundation, fully unrolling the 27-iteration loop produces a notable change in the execution characteristics of the program, even though the total code length does not grow very large.
Before unrolling, the program typically accesses state via expressions like F[i] — an indexed access with a variable subscript.
After unrolling, those accesses become fixed expressions like F[5] — absolute address accesses.
Although this may appear to be merely a change in array index notation, its impact at the execution level is significant. Fixed address access means:
1. No repeated updating of loop variables;
2. No frequent boundary checks;
3. Easier register allocation by the compiler;
4. Greater freedom for instruction scheduling;
5. More direct and predictable data access paths.
This step is not simply about "removing the loop." It is about transforming the core code into a fixed, short instruction sequence that is as close as possible to the ideal execution form for the processor.
## 8. Effective Utilization of the CPU Pipeline
This unrolled implementation makes extremely effective use of the CPU's pipeline execution capability.
Based on practical experience, after adopting this style of loop unrolling and fixed address access, program speed improved significantly from approximately 8 microseconds to just over 4 microseconds. The magnitude of this improvement shows that what determines performance is not only "what the logic does," but also "how the code is executed by the machine."
Once code exhibits the following characteristics:
1. Stable execution paths;
2. Fixed memory access patterns;
3. Very few branches;
4. Tight, repetitive core operations;
the pipeline advantages of modern CPUs can be fully utilized. ZSolver's implementation path is precisely the result of unifying state representation, short code structure, and processor execution characteristics.
## 9. Natural Convergence of the Implementation
Looking at the implementation process as a whole, the initial method provided the direction, and the F[27] representation allowed the entire system to converge into a coherent whole.
Once this 27-integer representation was in place:
1. Intra-band elimination could be written in one line;
2. Inter-band elimination could be written in two or three lines;
3. Box elimination could likewise be written in two or three lines;
4. All eliminations combined amounted to only about ten lines of core code;
5. After 27-iteration unrolling, accesses became absolute addresses;
6. The result naturally became a pipeline-friendly, high-performance implementation.
ZSolver's core is therefore not any single isolated technique, but a continuous engineering path: first compress the state, then unify the operations, then unroll the core code, and finally convert the algorithmic advantage into an execution advantage at the machine level.
## 10. Conclusion
ZSolver's core ideas can be summarized as follows: represent all core state using 27 integers organized around bands; complete constraint propagation using extremely short and uniform elimination code; and, through 27-iteration loop unrolling, organize the program into an execution form that is highly friendly to the CPU pipeline.
This implementation path demonstrates that in Sudoku solving and similar constraint-satisfaction problems, the true performance ceiling is determined not only by the solving strategy itself, but also by whether the state representation is compact enough, whether the elimination code is uniform enough, and whether the implementation form is sufficiently aligned with the working characteristics of modern processors. ZSolver's performance improvement is the direct result of this unified design.
**Note**
ZSolver's public source code is available in the software section of www.enjoysoduku.com. The publicly released version is named "3.88-microsecond solver."

