I incorporated naked and hidden singles before the brute-force search and compared the performance with JSolve. Here are the results:

TestCase: 10 test cases with Top 1465 Sudokus * 64 = 93760 puzzles each

JSolve : 27765 puzzles/sec

My solver: 1177 puzzles/sec
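As a quick check on the two throughput figures above, the ratio can be computed directly. The variable names are only for illustration:

```python
# Throughput figures reported above, in puzzles per second.
jsolve_rate = 27765
my_solver_rate = 1177

# How many times faster JSolve is on this test set.
slowdown = jsolve_rate / my_solver_rate
print(f"JSolve is about {slowdown:.1f}x faster")
```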

Looks like my solver is about 24 times slower than JSolve (27765 / 1177 ≈ 23.6). Not sure whether that counts as "remotely competitive" in the sense of:

No one has yet developed a solver based on templates which is remotely competitive in speed with other approaches.

Regarding your point about using the branching factor as a measure of efficiency:

Another issue is the percentage of the search tree that you can eliminate at each recursion. For your approach that corresponds to number of potential templates eliminated at each recursion. If there are say 200 possible templates for a digit, your current approach only eliminates one of them at a time or 0.5% of the possible solutions. It is potentially much more efficient if you could somehow eliminate a number of them at one time, ideally something approaching 50% of the possible solutions per recursion. The current cell by cell brute force solvers are able to eliminate a large portion of the search tree at each recursion, which is one of the keys to their efficiency.

I think the number of nodes searched is a better indicator, because a search tree with a smaller branching factor but greater depth can contain as many nodes as a shallow tree with a large branching factor.
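To make the point about node counts concrete, here is a small sketch that counts the nodes in an idealized, fully expanded tree. It assumes a uniform branching factor at every level, which real Sudoku search trees do not have, and the function name `full_tree_nodes` is made up for this illustration:

```python
def full_tree_nodes(branching: int, depth: int) -> int:
    """Count nodes in a complete tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))

# A narrow but deep tree versus a wide but shallow one.
narrow_deep = full_tree_nodes(branching=2, depth=10)   # 2047 nodes
wide_shallow = full_tree_nodes(branching=10, depth=3)  # 1111 nodes

print(narrow_deep, wide_shallow)
```

In this example the tree with branching factor 2 ends up with more nodes than the one with branching factor 10, so the branching factor alone does not say how much work the search does.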

I am now planning to incorporate some sort of pruning technique to make the brute-force search faster. Unlike JSolve, where the brute-force search is augmented with singles and intersection strategies, I can't think of a way to apply these in the middle of my brute-force search, since the number of templates to manage at each step is high. It has to be some other strategy.
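For readers unfamiliar with the terms, here is a minimal sketch of a naked and hidden singles pass run before a search. It is not the code from my repository or from JSolve. The flat 81-element grid (0 meaning empty) and the names `candidates` and `apply_singles` are assumptions made for this illustration, and the candidate sets are recomputed naively rather than maintained incrementally.

```python
from typing import List, Set

def _peers(i: int) -> Set[int]:
    """Indices sharing a row, column, or 3x3 box with cell i."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    ps = {r * 9 + k for k in range(9)} | {k * 9 + c for k in range(9)}
    ps |= {(br + dr) * 9 + (bc + dc) for dr in range(3) for dc in range(3)}
    ps.discard(i)
    return ps

PEERS = [_peers(i) for i in range(81)]
ROWS = [[r * 9 + c for c in range(9)] for r in range(9)]
COLS = [[r * 9 + c for r in range(9)] for c in range(9)]
BOXES = [[(br + dr) * 9 + (bc + dc) for dr in range(3) for dc in range(3)]
         for br in (0, 3, 6) for bc in (0, 3, 6)]
UNITS = ROWS + COLS + BOXES

def candidates(grid: List[int], i: int) -> Set[int]:
    """Digits not yet used by any peer of cell i."""
    used = {grid[p] for p in PEERS[i]}
    return {d for d in range(1, 10) if d not in used}

def apply_singles(grid: List[int]) -> List[int]:
    """Fill naked and hidden singles repeatedly until nothing changes."""
    grid = list(grid)  # work on a copy
    progress = True
    while progress:
        progress = False
        # Naked single: an empty cell with exactly one candidate.
        for i in range(81):
            if grid[i] == 0:
                cands = candidates(grid, i)
                if len(cands) == 1:
                    grid[i] = cands.pop()
                    progress = True
        # Hidden single: a digit with exactly one possible cell in a unit.
        for unit in UNITS:
            for d in range(1, 10):
                if any(grid[i] == d for i in unit):
                    continue
                spots = [i for i in unit
                         if grid[i] == 0 and d in candidates(grid, i)]
                if len(spots) == 1:
                    grid[spots[0]] = d
                    progress = True
    return grid
```

Calling `apply_singles(puzzle)` returns a new grid with every cell filled that these two rules can determine. Whatever is still 0 afterwards would be left to the search.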

Any suggestions/comments/questions are welcome. Posting the source code link again for quick reference:

https://github.com/singaurav/fast-sudoku-solver.