I tested around a bit and had a setup that worked, until I started testing with bigger packets and suddenly, the checksums wouldn’t match anymore. As it turns out, the fact that the library worked in the first place was down to chance: it returns a signed 32 bit integer, and my initial test setup simply produced a checksum that didn’t have the sign bit set. In parallel, I verified the results with two tools: the Ruby zlib bindings (part of the stdlib) and the crc32 command line tool that comes with OS X. Both return unsigned integers.
So I asked the maintainer of the library if this behaviour is intentional and it turns out, yes, it is.
For CRC32, it’s the resulting bitfield that matters, so the sign is not really important. But there’s a good reason for simply keeping the number within the bounds of 32 bit signed integers: performance. V8 (and probably other engines as well) keeps numbers as signed 32 bit integers if you let it, and this lets it optimize a lot of calculations. Given that so many operators specifically return signed ints, this makes sense, I guess.
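To make the "so many operators return signed ints" point concrete, here is a small sketch of how JavaScript's bitwise operators behave. The variable names are only illustrative; the behaviour follows from the operators converting their operands to 32 bit signed integers.

```javascript
// Bitwise operators convert their operands to 32 bit integers,
// and almost all of them hand back a *signed* result.
const allBitsSet = 0xffffffff;   // 4294967295 as an ordinary number

const viaOr = allBitsSet | 0;    // same 32 bits, read as signed
const viaShift = 1 << 31;        // only the sign bit set
const viaNot = ~0;               // all bits set

console.log(viaOr);    // -1
console.log(viaShift); // -2147483648
console.log(viaNot);   // -1
```

So any checksum whose top bit is set will come out negative as soon as it passes through one of these operators.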
The fix for my app was simple. Turns out that there is one bitwise operator that returns 32 bit unsigned integers. It’s called the unsigned right shift, written >>>, and it works exactly like its signed counterpart, >>, but fills up with 0 bits instead of sign bits. If you have never seen that before: Welcome to the club. Here’s the fix: crc32(stuff) >>> 0. Looks and feels ridiculous, but works.
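For illustration, here is a minimal table-based CRC32 sketch that shows the same effect. This is not the library I was using; crc32Signed and CRC_TABLE are hypothetical names, and the code assumes the standard reflected polynomial 0xEDB88320 that zlib uses. The input "123456789" is the usual check string, whose CRC32 is 0xCBF43926.

```javascript
// Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
const CRC_TABLE = new Int32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = (c & 1) ? (0xedb88320 ^ (c >>> 1)) : (c >>> 1);
  }
  CRC_TABLE[n] = c;
}

// Hypothetical helper: returns the CRC32 as a signed 32 bit integer,
// because every intermediate value goes through ^ and ~.
function crc32Signed(bytes) {
  let crc = -1; // all 32 bits set
  for (const byte of bytes) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return ~crc; // final inversion of all bits
}

const bytes = Array.from("123456789", (ch) => ch.charCodeAt(0));
const signed = crc32Signed(bytes);
const unsigned = signed >>> 0; // reinterpret the same bits as unsigned

console.log(signed);                // -873187034
console.log(unsigned);              // 3421780262
console.log(unsigned.toString(16)); // cbf43926
```

The two numbers describe the same 32 bits. Only the unsigned one matches what Ruby and the crc32 command line tool print.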
My short interaction with the library maintainer also revealed a rather interesting tidbit: As I mentioned earlier, the zlib bindings of Ruby return the checksum as an unsigned value. The reason is probably that the bindings cast the result to the standard integer type (Fixnum) in Ruby, which, on my machine, is a 64 bit value. If I were on a 32 bit platform (I haven’t tested it), I would assume that it would be cast to a Bignum, because that’s how Ruby usually handles numbers that don’t fit. Python 2.x, on the other hand, as mentioned by the library maintainer, is less consistent. Before 2.6, zlib.crc32 would return signed or unsigned values, depending on your platform. From 2.6 on, it always returns a signed 32 bit integer, and in the 3.x series, it always returns an unsigned integer. The docs state that you should use crc32(data) & 0xffffffff to get consistent behaviour over all platforms. Sure.
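One caveat if you are tempted to carry the Python idiom over to JavaScript: as far as I can tell, the mask does nothing useful there, because & is one of the operators that returns a signed result. A small sketch, using the signed form of the check value 0xCBF43926 as an example input:

```javascript
// The bits of 0xcbf43926, read as a signed 32 bit integer.
const signedCrc = 0xcbf43926 | 0;

const masked = signedCrc & 0xffffffff; // Python-style mask: still signed in JS
const shifted = signedCrc >>> 0;       // unsigned right shift by zero

console.log(signedCrc); // -873187034
console.log(masked);    // -873187034
console.log(shifted);   // 3421780262
```

So in JavaScript, >>> 0 plays the role that & 0xffffffff plays in Python.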
It is one thing to know about the 32 bit signed integer optimisations in V8 and a completely different thing to correctly design your code to stay within those bounds and prevent performance degradation over a large codebase.
As you can tell, this is not a well thought out analysis of the additional benefits of type safety. It is more of a snapshot of my current thoughts on this, but I wanted to share it anyway, not least in the spirit of my “you need to write more in 2016” initiative. Please let me know what you think.
I couldn’t find a good post head for this article in my own collection of photos, so I’ve looked on Flickr, for the first time in years. You can find the original by Marco Ooi here. It is licensed under a BY-SA Creative Commons license, which in turn means that the post head image is shared by me under the same license as well.