A comment I have seen in quite a few copy-pasted scripts mentions that image state data is sent to the clients.
There are functions to set the image state, to set whether the image has ammo, and to set whether it is loaded (probably meaning a clip is inserted).
Therefore, if a script changes the image state, or sets the ammo/loaded flags, immediately before a state that plays a sound, the clients won't be notified of the change until (latency) ms later, so they can briefly start playing the wrong state and its sounds. A script that branches on ammo and loaded, but sets those flags as early as possible, can communicate the change to the clients before they reach the point where they would start playing the wrong state.
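To make the timing argument concrete, here is a small Python model of the race. It is only a sketch under stated assumptions, not engine code: it assumes, as the post implies, that each client runs its own copy of the image state machine and only learns about a flag change when the update arrives, and that one-way latency is constant. The names FlagUpdate, client_flag_at_branch, plays_wrong_state and latest_safe_send_ms are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlagUpdate:
    """Hypothetical model of a server-side change to an image flag (e.g. ammo or loaded)."""
    name: str
    value: bool
    sent_at_ms: int  # server time at which the script changed the flag


def arrival_time_ms(update: FlagUpdate, latency_ms: int) -> int:
    # Assumption: constant one-way latency from server to client.
    return update.sent_at_ms + latency_ms


def client_flag_at_branch(update: FlagUpdate, old_value: bool,
                          branch_at_ms: int, latency_ms: int) -> bool:
    """Flag value the client sees when its local state machine branches at branch_at_ms."""
    if arrival_time_ms(update, latency_ms) <= branch_at_ms:
        return update.value
    # The update is still in flight, so the client branches on stale data.
    return old_value


def plays_wrong_state(update: FlagUpdate, old_value: bool,
                      branch_at_ms: int, latency_ms: int) -> bool:
    """True if the client briefly enters the wrong state (and plays its sound)."""
    return client_flag_at_branch(update, old_value, branch_at_ms, latency_ms) != update.value


def latest_safe_send_ms(branch_at_ms: int, latency_ms: int) -> int:
    # The flag must be set at least one latency before the client reaches the branch.
    return branch_at_ms - latency_ms


if __name__ == "__main__":
    LATENCY_MS = 100
    BRANCH_AT_MS = 500  # client reaches the state that branches on ammo and plays a sound

    late = FlagUpdate("ammo", False, sent_at_ms=490)   # set just before the sound state
    early = FlagUpdate("ammo", False, sent_at_ms=100)  # set as early as possible

    print(plays_wrong_state(late, True, BRANCH_AT_MS, LATENCY_MS))   # True
    print(plays_wrong_state(early, True, BRANCH_AT_MS, LATENCY_MS))  # False
    print(latest_safe_send_ms(BRANCH_AT_MS, LATENCY_MS))             # 400
```

In this model the "late" script sets the flag 10 ms before the branch, so with 100 ms of latency the client still sees the old value and starts the wrong state; the "early" script's update arrives 300 ms before the branch and the client chooses correctly.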