No, you want to make a datablock for every possible combination. I'm saying to make an image for every part, and mount that image to the player's hand while they are using it. If a player has a normal gun with a scope and silencer, they'll have 3 images on their hand: one for the scope, one for the base, and one for the silencer.
I think that would actually increase datablock use - a gun with 3 possible scopes will still need about 7 or 8 datablocks (just for the scope functionality).
EDIT: Here's a detailed breakdown of roughly how many datablocks will be needed for one of his guns and its attachments (assuming he uses around 3 attachments).
Base Gun Item (x1)
Base Gun Image (x1)
Base Gun Zoom Image (x1)
Base Gun Fire Sound (x1)
Scope Images (x3)
Scope Zoom Images (x3)
Silencer Images (x3)
Silenced Fire Sounds (x3)
Stock Images (x3) (assuming that he will use custom scripted recoil; if he doesn't, he will need about 6 more datablocks)
TOTAL: 19
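To double-check the arithmetic, here is a quick tally of the list above in Python (not game script, just a calculator). The per-part counts are copied straight from the breakdown. The combination figure at the end is only a rough comparison and rests on an assumption that is not in the posts: 3 attachment slots, each holding one of 3 variants or nothing.

```python
# Per-part datablock counts, copied from the breakdown above.
per_part = {
    "Base Gun Item": 1,
    "Base Gun Image": 1,
    "Base Gun Zoom Image": 1,
    "Base Gun Fire Sound": 1,
    "Scope Images": 3,
    "Scope Zoom Images": 3,
    "Silencer Images": 3,
    "Silenced Fire Sounds": 3,
    "Stock Images": 3,
}

# Total for the one-image-per-part approach.
total_per_part = sum(per_part.values())

# ASSUMPTION (hypothetical, not from the thread): 3 slots
# (scope, silencer, stock), each with 3 variants or left empty.
variants_per_slot = 3
slots = 3
combinations = (variants_per_slot + 1) ** slots

print("per-part datablocks:", total_per_part)
print("distinct loadouts under the assumption:", combinations)
```

The first number matches the total above. The second only shows how quickly the count grows if every loadout gets its own image, and it would change with different slot or variant counts.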
Yeah, the general concept is good, but making a major system will use too many datablocks and be impractical.