i have spent my afternoon tonight working on making a variable length integer encoding

the standard zig-zag varint scheme uses the 8th bit to indicate a further byte is part of the number, and it goes backwards, which confuses me

it took me ages to figure it out, but i have a similar encoding that instead of the extra bit being the sign of continue, instead it means the end, and the number sequence is reversed

so each byte in the buffer represents the smallest to largest places of what is effectively a base 128 encoding

when it finds a number that is bigger than that, with the 8th bit set, it removes the 8th bit and then stops and the sum of each base 128 digit multiplied by the power of the place, produces the result

i'm not sure why the standard method is done the exact opposite because it seems to me like this is simpler

to encode, i just AND the last 7 bits of each byte, then divide the number by 128, and once that gives a zero, the result of the last operation has 128 added

to decode, i first put the first byte value in the return value, and then each subsequent value i multiply by 128 (bitshift left by 7 bits) and then add that to the result, and when it finds a value over 128, it subtracts 128 and does the same operation and terminates

gotta formalize it into an actual algorithm now, but it's mostly written

i'm obsessed with encoding data and i wanted an encoding that can be read one byte at a time to scan the length values, without needing to have a second buffer to decode the variable integer numbers used for the length prefixes of fields

this is how i enjoy myself. fuck you, math is awesome.

Reply to this note

Please Login to reply.

Discussion

Makes sense, but, how do you handle negative integers without significant overhead when compared with zigzag?