Time to revive this thread with an addition of my own. I managed to write a more optimized version of the binary addition that was inlined into this function.
I've tested it several times on different values and have not found any issues so far. I also checked the output against several hashes to make sure it works as intended.
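For readers unfamiliar with the term: "binary addition" in SHA-256 means adding 32-bit words modulo 2^32. The snippet below is only a sketch in Python, not the code discussed in this thread. It shows one common way to do that addition in 16-bit halves, which is useful when the host language cannot be trusted with full 32-bit unsigned arithmetic. The name add32 and the constant MASK16 are made up for illustration, and both inputs are assumed to be in the range 0 to 2^32 - 1.

```python
MASK16 = 0xFFFF  # hypothetical helper constant: low 16 bits


def add32(a, b):
    """Add two 32-bit words modulo 2**32, working on 16-bit halves.

    Illustrative only: assumes 0 <= a, b < 2**32.
    """
    # Add the low halves; the result may carry into bit 16.
    low = (a & MASK16) + (b & MASK16)
    # Add the high halves plus the carry out of the low half.
    high = (a >> 16) + (b >> 16) + (low >> 16)
    # Drop any carry out of bit 31 and recombine the halves.
    return ((high & MASK16) << 16) | (low & MASK16)


if __name__ == "__main__":
    # Overflow wraps around instead of growing past 32 bits.
    print(hex(add32(0xFFFFFFFF, 1)))
    # Agrees with plain masked addition on ordinary values.
    print(add32(0x6A09E667, 0xBB67AE85) == (0x6A09E667 + 0xBB67AE85) & 0xFFFFFFFF)
```

Comparing a routine like this against plain masked addition on many values is a cheap sanity check before comparing whole hashes.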
Benchmark:
Running sha1("") 999999 times: 1408 ms, or ~1.4 µs per call
Running Port's sha256("") 9999 times: 74384 ms, or ~7.4 ms per call
Running McTwist's sha256("") 9999 times: 44368 ms, or ~4.4 ms per call
This makes it take roughly 40% less time per call than the previous version, or about 1.7x as fast.
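The percentage can be read two ways, so here is the arithmetic behind it as a small Python sketch. It uses only the totals from the benchmark above; the variable names are made up for illustration.

```python
# Totals taken from the benchmark above (9999 runs each).
runs = 9999
port_ms = 74384
mctwist_ms = 44368

# Average time per call, in milliseconds.
port_per_call = port_ms / runs
mctwist_per_call = mctwist_ms / runs

# Fraction of the old running time that was saved.
time_saved = 1 - mctwist_ms / port_ms
# How many times faster the new version is (old time / new time).
speedup = port_ms / mctwist_ms

if __name__ == "__main__":
    print(f"Port:    {port_per_call:.2f} ms per call")
    print(f"McTwist: {mctwist_per_call:.2f} ms per call")
    print(f"Time saved: {time_saved:.1%}")
    print(f"Speedup:    {speedup:.2f}x")
```

So "40%" here is the reduction in running time; expressed as a speedup factor it comes out closer to 1.7x.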