it's a simpler implementation, the implementation is not material to the encoding except for that first word and 1 bit out of 8 for a simpler encoding that runs an iterative word-wise division and multiplication series is simpler shorter code
i thought about drawing 256 bits in a line and creating a scheme to encode the bitshifts required for each group of bytes but - it's just more complex and error prone, it will be a longer piece of code compared to a big integer division and multiplication function
mainly, it requires a another table, which i estimate would be about 96 bytes long to hard-encode all the decisions required to do it by bit shifting
because our divisor is a power of two, the CPU is also optimizing this heavily as it knows that it's a shift of 11 in either direction