Why would my quotes be meaningless if they were from a lot of different threads? I'm confused.
Because your sample is inconsistent.
You can't work out an average per thread if you've sampled from a kajillion different threads.
It's like testing the efficiency of a product and then suddenly changing the ingredients in the middle of the test without recording it.