Wish I didn't have to triple post here...
An update on everything.
I've made a bunch more progress with things, fixed all incorrect results and added in two remainder methods, one for remainder to a constant and remainder from any number.
The regular, two argument one takes around 20ms to calculate, and that probably isn't going to get much faster unless I really improve on multiplication somehow.
However, the one argument one only takes 7ms. This is because it is taking the modulus of a number to a constant. In this context that means that it uses a precomputed array of multiplications, 1-9999 to be exact, to eliminate the need for multiplication altogether.
The precomputing only takes a couple of seconds, and at over twice the speed it's definitely worth it for our purposes. I'll also be making multiplication by a constant, which will be used as well.
After all that is done, I can start implementing ECC.
For the record, multiplying two RSA sized numbers (2048 bit) takes a third of a second rather than 10ms. That won't work. We need to use ECC otherwise it's infeasible.