Friday, March 14, 2014

Bit Gathering and base-2-to-base-3 conversion for Othello

The previous article, Bit Gathering Via Multiplication, showed how to extract bit patterns from 64 bit integers which represent 8x8 board game positions (a.k.a. bitboards). Note, in the chess programming community the technique has been known for quite a while as kindergarten bitboards or magic multiply, but I prefer the term bit gathering.

This write-up presents an optimization to bit gathering which pertains to tri-state games such as Othello/Reversi.

Problem


Leading Othello engines such as NTest or Zebra use pattern tables to statically evaluate a position (see references). For instance, each row, by itself, will produce such a pattern. Symmetrically, every column will also produce a pattern. Diagonals and certain regions around the corners create patterns as well.

Each of these patterns have an associated "goodness" value depending on the stage of the game (certain patterns in fact have very different values in the beginning of the game versus the end of the game). The engines extract all the relevant patterns from the board, and then sum up their corresponding "goodness" values from the pattern database*.

The following figure shows how a column pattern and a diagonal pattern slice through the position (note: this position, with White to move, is a relatively balanced one). Both patterns are bolded. Dots indicate empty positions in the patterns.


B
BB.
. BBWW
BWBWWW
WWBBWB
WBWBB
BBWB
B..

The highlighted secondary diagonal pattern is .WBWW.,with the dot representing an empty space. Similarly, the highlighted column pattern is B.WWWBW.. Because each of the 64 board position in Othello can be Empty (.), White (W), or Black (B), a pattern is essentially a base 3 number, with one digit per board position. A row has 8 positions, thus there are at most 3 to power 8  = 6561 possible patterns. I am saying at most because some patterns are clearly not possible (e.g., a center column, D or E, with the middle positions, 4 or 5, empty). Nonetheless, the pattern space is sufficiently dense that the base 3 representation provides an excellent space/encoding overhead trade-off.

Let's assume that within the pattern database, 0 means empty, 1 White and 2 Black **. The diagonal pattern will be 012110(3) or 1 * 3 ^ 4 + 2 * 3 ^ 3 + 1 * 3 ^ 2 + 1 * 3 ^ 1 .

So how would an engine produce the pattern for the 6-long secondary diagonal?

Let's assume that White and Black are each represented by a bitboard. 

Using the bit gather approach:
  1. Extract the 6 bit quantities from both the Black and White bitboards. Cost:  2 x (1 mask, 1 multiply, 1 shift). This step is described in the previous article.
  2. Convert the base 2 representation to a base 3 representation with a look-up table.

    The table essentially converts abcdefgh(2) to abcdefgh(3), where each of a,b,c,d,e,f,g,h is a binary digit - either 0 or 1.

    E.g., the array entry 27, which is:

    11011(2) or
     2 ^ 4 + 2 ^ 3 + 2 ^ 1 + 2 ^ 0,

    contains

    11011(3), which is:
     3 ^ 4 + 3 ^ 3 + 3 ^ 1 + 3 ^ 0 == 112.

    Cost: 2 loads (from a small table of 512 bytes, which is unlikely to cause caching problems)
  3. Return 2 * base_3_Black + base_3_White : one or two basic operations. On x86, the LEA instruction can compute 2 * A + B in one shot.
To be clear, this is approach is already highly efficient on a modern processor. But, can we do any better?  As it turns out, in some cases we can eliminate the need for the table look-ups altogether.

Folding bit gathering and base 2 to base 3 conversion

Let's take another look at the secondary diagonal that we'd like to extract, gather, and convert to base 3:


a
b
c
d
e
f

Much like the regular bit gather, the first step is a bitwise AND to clean up anything else. We thus end up with a00000000b00000000c00000000d00000000e00000000f00(2). Whereas the bit gather step would produce abcdef(2) via a multiply and shift, this technique will produce directly abcdef(3) via a multiply and shift as well.

If you recall from the first article about bit gathering, multiplication makes transposed (shifted) copies of the first operand wherever the second operand has a bit of 1, and adds all the transposed copies together. To extract and base-2-to-base-3 convert a pattern at the same time, the first operand will be an array of powers of 3. The second operand will be the cleaned-up bit pattern. Because the diagonal bit pattern is made up of bits that are 9 positions apart, the powers of three in the first operand will also be 9 positions apart.

The following are the powers of 3 up to and including 5, in binary:

power decimal binary alias
0 1 1 p
1 3 11 qq
2 9 1001 rrrr
3 27 11011 sssss
4 81 1010001 ttttttt
5 243 11110011 uuuuuuuu
Total:365101101100

Without further ado, the multiplication diagram, which, for simplicity, assumes that all a, b, c, d, e, and f are 1:

                  p0000000qq00000rrrr0000sssss00ttttttt0uuuuuuuu *
                  a00000000b00000000c00000000d00000000e00000000f
                ------------------------------------------------  
                  p0000000qq00000rrrr0000sssss00ttttttt0uuuuuuuu
         p0000000qq00000rrrr0000sssss00ttttttt0uuuuuuuu
  000000qq00000rrrr0000sssss00ttttttt0uuuuuuuu
  0000rrrr0000sssss00ttttttt0uuuuuuuu
  000sssss00ttttttt0uuuuuuuu
  0ttttttt0uuuuuuuu
  --------------------------------------------------------------
  ________XXXXXXXXX_____________________________________________

The highlighted  portion, XXXXXXXXX, shall contain a * uuuuuuuu + b * ttttttt + c * sssss + d * rrrr + e * qq + f * p, or abcdef(3). The maximum result value shall not exceed 365, which can be represented on 9 bits. By placing the powers of 3 with exactly the same "pitch" as the bits in our pattern (i.e., 9 bits), we ensured the XXXXXXXXX area in the result will have all the powers of 3 precisely lined up. Of course, uuuuuuuu, ttttttt, sssss, rrrr, qq, and are just aliases for the powers of 3. The multiplication factor is 0x1018243651.

This technique works as long as there is no carry between the multiplication groups. In other words, the sum of powers of 3 should be representable on the number of bits between two elements (9 bits). It should now be clear why a 6-long NWSE diagonal was chosen. The NWSE direction provides a 9-bit separation between elements. The NESW diagonal provides only 7 bits of separation. Also, if we chose a 7 or 8-long NWSE diagonal, the powers of 3 would exceed 9 bits, as the sum of powers of 3 up to 6 is 1091 (10 bits).

However, for 5-long diagonals the technique works in both directions, as 7 bits are sufficient to sum up all the powers of 3 up to 4.

Acknowledgements

I would like to thank Chris Welty, the main author of NTest, and Won Chun for independently giving me the idea of combining magic multiplies and base 2 to base 3 conversion.

References

Ntest: Main source tree: https://github.com/weltyc/ntest
My clone of ntest's source tree: https://github.com/vladpetric/ntest

Zebra: http://radagast.se/othello/download.html

"Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello" (1997) by Michael Buro describes additive, pattern-based Othello

Notes

* NTest uses a total of 42 patterns: 16 for all the rows and columns (8 + 8), 14 for diagonals (size 5 to size 8), and 12 for corner regions (3 per corner). In addition to summing up pattern values, NTest also uses mobility, potential mobility and parity to statically evaluate a position.

** NTest uses a homologous representation - 0 for the side to move, 1 for empty and 2 for the opposing side. For the purpose of this post, the different convention makes little difference.